National Academies Press: OpenBook

A Multivariate Analysis of Crash and Naturalistic Driving Data in Relation to Highway Factors (2013)

Chapter: Chapter 5 - Statistical Analysis: A Unified Approach to the Analysis of Rates for Crashes and Crash Surrogates

Suggested Citation:"Chapter 5 - Statistical Analysis: A Unified Approach to the Analysis of Rates for Crashes and Crash Surrogates." National Academies of Sciences, Engineering, and Medicine. 2013. A Multivariate Analysis of Crash and Naturalistic Driving Data in Relation to Highway Factors. Washington, DC: The National Academies Press. doi: 10.17226/22849.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

Statistical Analysis: A Unified Approach to the Analysis of Rates for Crashes and Crash Surrogates

Traditional Analysis of Crash Data

In transportation-related safety studies, various data analytic methodologies have been used to investigate associations between crashes and various risk factors. Historically, depending on the application under investigation, Poisson, negative binomial, random effects, and hierarchical Bayesian data models, among others, have been used to analyze data collected from historical crash databases. The response variable under these models is typically the number of crashes, which can be cross-classified into a contingency table according to certain explanatory factors hypothesized to be associated with the response variable (Table 5.1). Exposure data, such as vehicle miles traveled (VMT) or annual average daily traffic (AADT), can also be cross-classified by the explanatory factors for the analysis of rates. As shown in Table 5.1, explanatory factors can include those related to the driver, the environment, the vehicle, and the highway. After an appropriate model is fitted to the data, relative risks (RRs) of various combinations of the explanatory variables, including interactions, can be calculated from the estimated model parameters to determine which risk factors are most associated with the occurrence of crashes.

This project involves an additional component not generally considered in standard data analysis problems. The study focuses on the statistical relationship between surrogate measures of collisions and actual collisions, and on the formulation of exposure-based risk measures using these surrogate measures. The surrogate measures of collisions are generally collected from naturalistic driving data (NDD), which rarely provide sufficient data resulting from actual collisions, while data for crash outcomes are derived from historical crash databases.
Therefore, it is necessary to consider the attributes of crash data and naturalistic driving data simultaneously in order to provide a link between crashes and crash surrogates (Table 5.2). The variables collected from NDD are generally richer than those available in historical crash databases in the sense that NDD are derived from instrumented vehicles capable of making precise measurements with respect to certain roadway factors, driver behavior, and vehicle factors. Crash data, on the other hand, are derived mainly from the information available in police accident reports and cannot capture the level of detail contained in NDD. While certain variables such as weather condition, light condition, and road condition are recorded in both data sources, many variables recorded in naturalistic driving experiments are not recorded in crash databases. Therefore, it is desirable to develop data analytic methods that consider crash data and NDD in a unifying framework, one that can account for and possibly adjust for inherent differences in the types of variables available from the two different sources of data.

In the analysis that follows, the response variables are actually defined as rates (crashes or surrogate events per unit exposure). Thus, exposure is included in the definition of the response variable in each case. However, exposure is also included in the explanatory variable set, so it is simply a matter of adjusting the model coefficients to convert from rates to counts and vice versa. In general, the relationship between crash (or surrogate) counts and exposure is expected to be nonlinear. However, on a logarithmic scale this relationship commonly becomes linear (indicating a power law relationship in the underlying variables). Figures 5.1 and 5.2 show a plausible linear relationship on logarithmic scales.
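The conversion between count and rate models mentioned above can be made explicit. The following is a sketch under the assumption of a single exposure term; the symbols $a$, $\gamma$, and $E$ (exposure) are introduced here for illustration and are not the report's notation.

```latex
\begin{align*}
\text{count} = a\,E^{\gamma}
  \quad &\Longrightarrow \quad
  \log(\text{count}) = \log a + \gamma \log E, \\
\text{rate} = \frac{\text{count}}{E} = a\,E^{\gamma - 1}
  \quad &\Longrightarrow \quad
  \log(\text{rate}) = \log a + (\gamma - 1)\log E.
\end{align*}
```

Under this assumption, moving between the count and rate formulations changes only the coefficient on log exposure (by exactly 1); a value of $\gamma = 1$ would mean that the rate does not depend on exposure.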
Table 5.1. Common Response and Explanatory Factors Used in Crash Data Analysis

Response: Crashes (Exposure)
Explanatory Factor: Driver, Environment, Vehicle, Highway

Table 5.2. Simultaneous Consideration of Crash and Field Operational Test Data

| | Crash Data | Field Operational Test Data |
|---|---|---|
| Response | Crashes (Exposure) | Surrogates (Exposure) |
| Explanatory Factor | Driver, Environment, Vehicle, Highway | Driver, Environment, Vehicle, Highway |

[Figure 5.1. Relationship between crash numbers and exposure on logarithmic scales. Axes: Log Crash Exposure (horizontal), Log Crashes (vertical).]

[Figure 5.2. Relationship between surrogate crash numbers and exposure on logarithmic scales. Axes: Log LD Alert Exposure (horizontal), Log LD Alerts (vertical).]

Seemingly Unrelated Regression Model

Based on the discussion above, a model is proposed that extends the usual univariate response model for crashes to a model that treats crashes and crash surrogates as a bivariate response variable. Instead of fitting one model for crashes and independently fitting a separate model for a crash surrogate, the idea is to fit one model that accommodates both responses in a unifying model. The model is based on the method of seemingly unrelated regressions (SURs) proposed by Zellner (1962). SUR is developed in a normal theory framework and incorporates a correlation structure between crashes and crash surrogates. It allows formal tests of hypotheses to be conducted to test whether the risks associated with explanatory factors, or more importantly subsets of explanatory factors, are the same or different for crashes and crash surrogates. The model used here takes the form shown in Equation 5.1:

$$
Y_1 = X_1 \beta_1 + \varepsilon_1, \qquad Y_2 = X_2 \beta_2 + \varepsilon_2 \tag{5.1}
$$

where the subscript 1 refers to the crash model and the subscript 2 refers to the surrogate model. The equations resemble ordinary regression equations where $Y_1$ and $Y_2$ are the response variables, $X_1$ and $X_2$ are data matrices of explanatory variables, $\beta_1$ and $\beta_2$ are regression parameters, and $\varepsilon_1$ and $\varepsilon_2$ are error terms with normal distributions. In the SUR framework, the crash data are stacked on top of the surrogate data to form a system of equations (Equation 5.2):

$$
\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}
=
\begin{pmatrix} X_1 & 0 \\ 0 & X_2 \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix} \tag{5.2}
$$

Note that the $X$ matrices of explanatory variables are not required to be the same, either in terms of variables or in terms of dimension. Therefore, variables collected from NDD can be different from those collected in the crash data set. Since crash data are stacked on top of surrogate data, the system of equations satisfies a linear model of the form shown in Equation 5.3:

$$
Y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \Sigma), \qquad \text{where } \operatorname{Var}(\varepsilon) = \Sigma =
\begin{pmatrix} \sigma_{11} I & \sigma_{12} I \\ \sigma_{21} I & \sigma_{22} I \end{pmatrix} \tag{5.3}
$$

and $I$ is an identity matrix. Suppose $Y_1$ has dimension $N_1 \times 1$ and $Y_2$ has dimension $N_2 \times 1$ so that $Y$ has length $N_1 + N_2 = N$.

Then, the matrix $\Sigma$ has dimension $N \times N$. Since this model satisfies the properties of a linear model with a defined covariance matrix, the parameters can be estimated by weighted least squares (WLS) as shown in Equation 5.4:

$$
\hat{\beta} = \left( X^T \Sigma^{-1} X \right)^{-1} X^T \Sigma^{-1} Y \tag{5.4}
$$

The parameters $\sigma_{11}$ and $\sigma_{22}$ represent the variances in the crash and surrogate regressions, respectively. The parameter $\sigma_{12} = \sigma_{21}$ is the covariance between $Y_1$ and $Y_2$. These parameters are estimated by fitting separate independent regressions for the crash data and the surrogate data and using the usual residual sum of squares for the variances, and the sum of the residual cross-product terms for the covariance.

The utility of this unifying framework is that tests of hypotheses of the form

$$
H_0\colon \beta_1 = \beta_2
$$

can be conducted by using the usual F-test in a regression setting. This hypothesis tests whether crash model parameters equal surrogate model parameters. More important, it is possible to test whether only certain crash model parameters equal certain surrogate model parameters. This last point is important to the application of this framework to the simultaneous modeling of crashes and surrogates, because in many cases only a subset of the variables will be common to both.

Poisson Log-Linear Models Estimated by Weighted Least Squares

The Poisson log-linear model is the standard model for the analysis of rates. However, this model has limited use in practice because for a Poisson random variable, the mean is restricted to equal the variance. This has caused researchers to consider more flexible models such as negative binomial, generalized linear mixed models (GLMMs), or Bayesian models.

It is well known that WLS can be used to estimate maximum likelihood parameters in Poisson log-linear models (see, e.g., Agresti 2002). Therefore, the SUR framework can be used to estimate parameters in a log-linear model since parameters in a SUR model can be estimated by WLS.
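The SUR estimation steps described above (separate regressions to estimate the variance and covariance parameters, stacking the two equations, then applying Equation 5.4) can be sketched in Python with NumPy. This is an illustrative sketch, not the team's implementation. The function name `sur_fgls` is hypothetical, both equations are assumed to have the same number of observations so that the covariance matrix can be written as a Kronecker product, and dividing the residual sums by N is an assumption, since the text does not state the divisor.

```python
import numpy as np

def sur_fgls(y1, X1, y2, X2):
    """Two-equation SUR by feasible GLS (sketch; assumes equal row counts)."""
    n = len(y1)
    # Step 1: separate least squares fits give the residuals used to
    # estimate sigma_11, sigma_22 (variances) and sigma_12 (covariance).
    b1, *_ = np.linalg.lstsq(X1, y1, rcond=None)
    b2, *_ = np.linalg.lstsq(X2, y2, rcond=None)
    resid = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
    S = resid.T @ resid / n              # 2 x 2; divisor n is an assumption
    # Step 2: stack the equations with a block-diagonal design (Equation 5.2).
    y = np.concatenate([y1, y2])
    X = np.block([[X1, np.zeros((n, X2.shape[1]))],
                  [np.zeros((n, X1.shape[1])), X2]])
    # Step 3: Sigma = S kron I_n (Equation 5.3), then Equation 5.4.
    sigma_inv = np.kron(np.linalg.inv(S), np.eye(n))
    A = X.T @ sigma_inv @ X
    beta = np.linalg.solve(A, X.T @ sigma_inv @ y)
    cov_beta = np.linalg.inv(A)          # approximate covariance of beta
    return beta, cov_beta, S
```

A useful sanity check on such a sketch is the classical result that, when both equations use identical regressors, the SUR estimate coincides with the equation-by-equation least squares estimates; the gain from SUR arises when the design matrices differ.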
The WLS solution depends on asymptotic theory, so the only restriction is that the data are not too sparse. For example, the number of crashes or the number of surrogate events should not be 0 for many observations. Estimating parameters in a log-linear model by using the normal theory on which the SUR model is based requires a square root transformation of the data. In particular, as shown in Equations 5.5 and 5.6, the dependent and independent variables are

$$
Y' = \sqrt{Y}\,\log(Y), \qquad X' = \sqrt{Y}\,X \tag{5.5, 5.6}
$$

The variable $Y'$ is regressed on $X'$ by using WLS with covariance matrix $\Sigma$. This model does not suffer from the restrictions of the Poisson model. That is, in addition to the mean parameter, the normal model has two parameters for variance and one parameter for the covariance. Therefore, it can handle the extra variability or overdispersion often encountered in observational studies, which the standard Poisson model cannot. One disadvantage, discussed briefly above, is that $Y$ should not be 0. In the rare cases that it is 0, 0.5 can be added. The asymptotics can break down if there are many zeros. Ideally, the $Y$ values should be at least 5.

Bayesian SUR for Log-Linear Models

Now that the model is set up in the context of a normal theory linear model, the extension to a Bayesian model is straightforward. Methods for Bayesian data analysis of normal regression models are well developed. A likelihood function for the data and a prior distribution for the parameters must be specified. The likelihood function and prior distributions are described below in Equations 5.7 through 5.9.

Likelihood:

$$
Y' \mid \mu, \Sigma \sim N(\mu, \Sigma), \qquad \Sigma \text{ fixed} \tag{5.7}
$$

Prior 1:

$$
\mu_i \mid \beta, \tau \sim N(\lambda_i, \tau), \quad i = 1, \ldots, N, \qquad
\lambda_i = \beta_0 x'_{i0} + \beta_1 x'_{i1} + \cdots + \beta_p x'_{ip} \tag{5.8}
$$

Prior 2:

$$
\beta_j \sim N(0, 10^6), \quad j = 0, \ldots, p, \qquad
1/\tau \sim \text{Gamma}(0.001, 0.001) \tag{5.9}
$$

In the likelihood, the matrix $\Sigma$ is assumed to be fixed.
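For intuition about how this likelihood and the first prior combine, consider the simplified scalar case of a single cell $i$, conditioning on $\lambda_i$ and $\tau$ and ignoring the cross-equation covariance. This simplification is an assumption made here for illustration only; $\sigma^2$ stands for the relevant diagonal element of $\Sigma$, and $w$ is a symbol introduced for this sketch. The standard normal-normal conjugate result then gives:

```latex
\begin{align*}
E\left[\mu_i \mid Y'_i, \lambda_i, \tau\right]
  &= w\,Y'_i + (1 - w)\,\lambda_i,
  & w &= \frac{\tau}{\tau + \sigma^2}.
\end{align*}
```

When $\tau$ is large relative to $\sigma^2$, $w$ approaches 1 and the posterior mean moves toward the observed $Y'_i$; when $\tau$ is small, the posterior mean moves toward the regression value $\lambda_i$.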
The matrix $\Sigma$ is the same as described above and contains two parameters for variance, $\sigma_{11}$ and $\sigma_{22}$, and one parameter, $\sigma_{12}$, for covariance. These parameters are estimated by using the residual sum of squares and residual sum of cross-product terms from ordinary independent regression models fit to the crash and surrogate data, respectively. The regression model equation is incorporated into the first prior as the mean of a normal distribution and is designated by $\lambda$, which is a linear combination of the regression parameters $\beta$ and the explanatory variables $X$. The second prior is proper and takes a standard noninformative form. Using proper priors ensures propriety of posterior distributions.

Estimation proceeds by Markov chain Monte Carlo (MCMC) simulation, which is used to generate random variables from the posterior distributions of the parameters $\mu$, $\beta$, and $\tau$. Because calculation of posterior distributions directly is not possible in closed form, the output generated from MCMC simulation is used to estimate characteristics of posterior distributions. These Markov chains are designed to converge in distribution to the desired posterior distributions. To ensure convergence, Markov chains are run with 60,000 iterations, and the first 30,000 are discarded for “burn-in.”

The Bayesian model has an important advantage over the classical model. Because the regression model is specified in the prior, the posterior estimate of $\mu$ will tend to be a weighted average of the data $Y'$ and the regression estimate $\lambda$. The weights depend on the estimates of variance, namely $\Sigma$ and $\tau$. Therefore, if the regression model displays lack of fit, indicated by large $\tau$, the posterior estimate will be smoothed toward the data. Accordingly, in the Bayesian SUR model, interest focuses on the posterior estimates of $\mu$ and not on the regression estimates $\lambda$. The estimates of RR produced by the Bayesian model that are the focus of this analysis depend on $\mu$. This smoothing toward the data in the case of lack of fit was an important property in the models fit by the team. In a classical model, RR would be estimated by the regression equation for $\lambda$ alone.

Because the SUR model is estimated on a scale transformed to normality, it is necessary to transform back to make inference about the RRs. The RR is simply a ratio of rates comparing one combination of explanatory variables in the numerator to another combination in the denominator. Running the Markov chain will produce samples generated from the posterior distributions of $\mu$. The transformation of the dependent variable that was shown in the previous section is:

$$
Y' = \sqrt{Y}\,\log(Y) \tag{5.10}
$$

Therefore, the simulated values should be transformed by the formula

$$
\frac{\mu}{\sqrt{Y}} - \log(\text{exposure}) \tag{5.11}
$$

to calculate a posterior sample for the log rates. Then log RRs can be formed by taking differences of log rates based on combinations of certain explanatory variables.
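The forward transformation and the back-transformation to log rates can be sketched as follows. This sketch reads the transformation as $Y' = \sqrt{Y}\log(Y)$ and the back-transformation as $\mu/\sqrt{Y} - \log(\text{exposure})$, which is an interpretation of the machine-read equations rather than a confirmed form. All function names are hypothetical, and in practice the `mu_draws` argument would hold posterior draws from an MCMC sampler that is not shown here.

```python
import numpy as np

def forward_transform(y, X):
    """Y' = sqrt(Y) * log(Y) and X' = sqrt(Y) * X (as read from Eqs. 5.5-5.6)."""
    y = np.asarray(y, dtype=float)
    y = np.where(y == 0, 0.5, y)      # add 0.5 to rare zero counts, per the text
    w = np.sqrt(y)
    return w * np.log(y), w[:, None] * np.asarray(X, dtype=float), y

def log_rate(mu_draws, y, exposure):
    """Draws of mu for one cell -> draws of the log rate (as read from Eq. 5.11)."""
    return np.asarray(mu_draws) / np.sqrt(y) - np.log(exposure)

def log_rr(mu_a, y_a, e_a, mu_b, y_b, e_b):
    """Log relative risk of cell a versus cell b: a difference of log rates."""
    return log_rate(mu_a, y_a, e_a) - log_rate(mu_b, y_b, e_b)

def middle_95(draws):
    """2.5 and 97.5 percentiles of a sample of posterior draws."""
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return float(lo), float(hi)
```

Applying `middle_95` to the difference between a crash `log_rr` sample and a surrogate `log_rr` sample gives the interval that is checked for whether it contains 0.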
The reason for using the log RR is that the sampling distribution of the RR on the log scale is close to normal. A main hypothesis of interest is whether the difference between a crash log RR and a surrogate log RR is 0 while controlling for certain explanatory variables.

SUR Model Application and Results

Bayesian SUR models were applied to right road departure crashes and three candidate surrogates: right lane deviation (LDEV), right lane departure warning (LDW), and time to right edge crossing (TTEC). Three separate models are therefore presented, one for each crash-surrogate pair. The number of explanatory variables for SUR model application was limited by the data and consisted of four variables reported in the literature to be associated with road departure crashes: area type, road type, horizontal curvature, and shoulder width.

The models are categorical: crashes, surrogate events, and exposure measures are aggregated into the combinations of the four variables in the models (Curve (2), Freeway (2), Area (2), and Right Shoulder (3)), so that there are at most 2 × 2 × 2 × 3 = 24 independent observations. Of the 24 possible cells (combinations of the explanatory variables), only 16 are used as data for the models; six cells were necessarily empty (meaning the specific combinations were not found in the data, for example, rural freeways with shoulder width 0 to 3 ft on curved and on tangent sections), and two cells had very low values for traversals and crashes and were dropped from the analysis. The number of traversals for the cells left in the analysis ranged from 57 to more than 28,000, and the number of crashes in the cells ranged from 52 to 1,879. The exposure for crashes in each case was based on the 5-year traffic volume and segment length, and the exposure for each of the surrogates was based on the number of traversals in the segment and segment length.
The same set of explanatory variables was used in each model:

• Curve (1 = Yes, 2 = No)
• Freeway (1 = Yes, 2 = No)
• Area (1 = Rural, 2 = Urban)
• Right Shoulder (1 = 0 to 3 ft, 2 = 3+ ft to 8 ft, 3 = 8+ ft)

The data used in each of the models for each of the three surrogate candidates are in Appendix C. Results from the three models are shown in subsequent subsections. Posterior estimates of the regression parameters are given, and log RR comparisons between crashes and surrogate measures are shown. The real focus of this analysis is the presentation of the log RR differences, which are used to determine if the RRs of crashes and surrogate events are the same under specified conditions. The regression parameters in the crash and surrogate equations are of secondary concern. The regression parameters in the models are shown to give an indication of the effects of the four variables on crashes and surrogate measures. Often, parameter estimates are in the same direction and are of similar magnitude in the crash and surrogate regression equations. The regression model was included in the prior specification of the Bayesian model to help smooth estimates of RR, the primary exposure-based risk measure used in the study.

In the regression equations, the first level of each independent variable serves as the baseline case in which parameter estimates are constrained to be 0. That is, for the binary variables Curve, Freeway, and Area, the parameter estimates are associated with the second level of the categorical variables. For example, a negative regression coefficient attached to Curve indicates that a crash or surrogate event is more likely on a curve than on a segment of highway defined as not on a curve. The Right Shoulder variable has two regression parameters corresponding to the second level (3+ ft to 8 ft) and third level (8+ ft) of that variable.

The log RR of a crash and of each candidate surrogate were calculated from the three models for a road segment with a curve compared to a road segment without a curve on a nonfreeway rural road with shoulders greater than 3 ft but less than 8 ft. If the log RR of a crash and a candidate surrogate are the same, then it is argued that the candidate is a good surrogate for the crash. Accordingly, for each model a sample is generated from the posterior distribution of the log RR difference by using MCMC simulation. The hypothesis of interest is whether 0 is contained in the middle 95% of this distribution. Note that one expects crash rates to be considerably smaller than rates derived from surrogate measures. This difference makes the RR an attractive exposure-based measure, since rates are not compared on an absolute scale but on a relative scale that compares the risk of an event on a curve to the risk of an event not on a curve.

Lateral Deviation

Table 5.3 shows posterior estimates from the regression parameters in the LDEV model. The table also includes estimates that describe the middle 95% of the distributions, which are indicated by the 2.5 and 97.5 percentiles. Log exposure is fit on the right-hand side of the model equations for both the crash and the surrogate regressions. There is some similarity in the directions and the magnitudes for certain variables. For example, in the crash regression, the posterior mean for the Curve variable is -0.642, while in the LDEV regression the value is -0.558. On the basis of the coding of the Curve variable, these negative values suggest that crashes and LDEV events were more likely on curves. In addition, the Area coefficients are both negative and of similar magnitude. This similarity suggests the protective effects of urban areas relative to rural areas. The shoulder coefficients are both positive, although of somewhat differing magnitudes between the two regressions. The freeway variable is not significant at the 0.05 level in the LDEV regression.

Table 5.3. Posterior Regression Estimates for Bayesian SUR LDEV Model

| Regression | Parameter | Mean | SD | 2.5% | 97.5% |
|---|---|---|---|---|---|
| Crash | Intercept | 2.095 | 0.419 | 1.263 | 2.933 |
| Crash | Log exposure | 0.469 | 0.042 | 0.386 | 0.554 |
| Crash | Curve | -0.642 | 0.072 | -0.782 | -0.500 |
| Crash | Freeway | 0.262 | 0.126 | 0.012 | 0.510 |
| Crash | Area | -0.534 | 0.216 | -0.967 | -0.113 |
| Crash | Shoulder2 | 0.523 | 0.129 | 0.267 | 0.778 |
| Crash | Shoulder3 | 0.327 | 0.145 | 0.040 | 0.615 |
| LDEV | Intercept | 3.981 | 0.203 | 3.569 | 4.374 |
| LDEV | Log exposure | 0.553 | 0.030 | 0.494 | 0.613 |
| LDEV | Curve | -0.558 | 0.062 | -0.680 | -0.433 |
| LDEV | Freeway | -0.153 | 0.079 | -0.306 | 0.005 |
| LDEV | Area | -0.568 | 0.141 | -0.849 | -0.290 |
| LDEV | Shoulder2 | 0.658 | 0.090 | 0.477 | 0.836 |
| LDEV | Shoulder3 | 0.794 | 0.103 | 0.594 | 1.006 |

Note: SD = standard deviation.

At first glance, the signs of the coefficients for the shoulder width seem counterintuitive: apparently crash risk is higher when the shoulders are wider. Care is needed in interpretation. This result does not imply that increasing shoulder width on a particular road segment would increase crash risk. Rather, it indicates that, within the resolution of the statistical model used here, there is a systematic effect that more road departure crashes (as well as more surrogate events) occur under conditions where shoulders are wider than where shoulders are narrower. Note that a single model has been used for both urban and rural areas, and only a limited set of highway variables has been included. In urban areas, with high traffic density, shorter journey distances, and occasional congestion, single-vehicle road departure crashes are relatively rare; curbs typically define the road edges (shoulder width is zero) and risk is low. On rural highways with higher traffic speeds (and shoulders present) the risk is expected to be higher. It is not surprising that the road segments of higher risk for road departure crashes are also areas with shoulders present. And if wider shoulders tend to be associated with limited access highways, higher speeds, and longer journey times, again the association with higher risk is not too surprising. The urban/rural area variable is expected to account for some of this variation (crash risk being lower for urban areas), but if the effect noted is particularly strong and the population-based “area” variable is only partially correlated with road conditions, it is not surprising that the presence of shoulders is associated with higher crash risk.

Clearly it would be fruitful to increase the number of explanatory variables so that the shoulder variable is not confounded with other factors, and in the future, with larger data sets, this is entirely feasible. It will also be beneficial to implement separate models for urban and rural areas. (In this project, given the limited volume of driving data, it was more feasible to combine the two areas in a single model.) Thus, on the basis of the large-scale naturalistic driving study, the team expects confounding effects to be removed, and the shoulder-width coefficients will provide a more direct indicator of relative risk.
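The baseline coding described in this section, in which the first level of each variable is the reference and each remaining level gets its own 0/1 indicator, can be sketched as follows. The function and column names are hypothetical, and the log exposure column that also appears in the regressions is omitted for brevity.

```python
# Number of levels per variable, following the variable list in the text;
# level 1 is the baseline, so it receives no indicator column.
LEVELS = {"Curve": 2, "Freeway": 2, "Area": 2, "Shoulder": 3}

COLUMNS = ["Intercept", "Curve2", "Freeway2", "Area2", "Shoulder2", "Shoulder3"]

def design_row(curve, freeway, area, shoulder):
    """Intercept plus one 0/1 indicator for every non-baseline level."""
    cell = {"Curve": curve, "Freeway": freeway, "Area": area, "Shoulder": shoulder}
    row = [1]
    for name, n_levels in LEVELS.items():
        for level in range(2, n_levels + 1):
            row.append(1 if cell[name] == level else 0)
    return row

# The comparison used in the text: nonfreeway (2), rural (1), shoulder level 2,
# on a curve (Curve = 1) versus not on a curve (Curve = 2).
on_curve = design_row(curve=1, freeway=2, area=1, shoulder=2)
off_curve = design_row(curve=2, freeway=2, area=1, shoulder=2)
```

The two rows differ only in the `Curve2` column, so a negative Curve coefficient lowers the linear predictor for the off-curve cell relative to the on-curve cell, which is why it is read as events being more likely on curves.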

The real focus of this analysis, however, is not on the regression parameters but on the RR measures, which depend on the estimated posterior rates. One of the advantages of this Bayesian model is that the posterior rates tend to a weighted average of the observed rates and those estimated from the regression equation shown in Table 5.3. Therefore, if the regression equation displays lack of fit, the posterior estimates tend to the observed rates. Figure 5.3 shows histograms of samples of size 30,000 drawn from the posterior distributions of the log RRs for crashes and the LDEV surrogate. The comparison is between curved road segments and those that are not curved while holding the freeway, area, and shoulder variables fixed as described above in the section SUR Model Application and Results. The estimate for the log crash RR is 1.15 with a 95% confidence interval of (0.98, 1.33). The estimate for the log LDEV RR is 0.77 with a 95% confidence interval of (0.63, 0.92). Neither confidence interval contains 0, suggesting that the risks of crashes and LDEV events are greater on curves holding the other variables fixed.

[Figure 5.3. Posterior distributions of log RR comparing curve to no curve for crash and LDEV. Panels: Log RR Crash; Log RR LDEV.]

Figure 5.4 shows the posterior distribution of the log RR difference between crashes and the LDEV surrogate comparing curve to no curve while holding other variables in the model fixed. The mean of this distribution is 0.38 with a 95% confidence interval of (0.15, 0.61). Since 0 is not contained in the confidence interval, the conclusion is that lane deviation is a poor surrogate for lane departure crashes.

[Figure 5.4. Posterior distribution of log RR difference between crash and LDEV comparing curve to no curve. Axes: Log RR Difference (horizontal), Density (vertical).]

Lane Departure Warning

Table 5.4 shows posterior estimates from the regression parameters in the LDW model. In this model, the coefficients for the Curve variable are also negative, suggesting that crashes and LDW events are more likely on curved road segments. The shoulder coefficients are also both positive. This model contains an interaction term between the freeway and area variables that is significant in the LDW regression equation and marginally significant in the crash regression. The negative coefficients suggest that the additive effects of freeway and area are somewhat reduced in urban areas when not on a freeway.

Table 5.4. Posterior Regression Estimates for Bayesian SUR LDW Model

| Regression | Parameter | Mean | SD | 2.5% | 97.5% |
|---|---|---|---|---|---|
| Crash | Intercept | 1.918 | 0.420 | 1.044 | 2.720 |
| Crash | Log exposure | 0.463 | 0.039 | 0.391 | 0.546 |
| Crash | Curve | -0.629 | 0.069 | -0.766 | -0.494 |
| Crash | Freeway | 0.580 | 0.213 | 0.169 | 1.018 |
| Crash | Area | -0.240 | 0.234 | -0.729 | 0.194 |
| Crash | Shoulder2 | 0.486 | 0.119 | 0.258 | 0.728 |
| Crash | Shoulder3 | 0.315 | 0.134 | 0.057 | 0.591 |
| Crash | Freeway × area | -0.367 | 0.195 | -0.764 | 0.008 |
| LDW | Intercept | 1.536 | 0.654 | 0.285 | 2.833 |
| LDW | Log exposure | 0.422 | 0.087 | 0.238 | 0.589 |
| LDW | Curve | -0.522 | 0.174 | -0.864 | -0.184 |
| LDW | Freeway | 0.866 | 0.498 | -0.071 | 1.885 |
| LDW | Area | 0.428 | 0.580 | -0.709 | 1.565 |
| LDW | Shoulder2 | 0.388 | 0.250 | -0.109 | 0.870 |
| LDW | Shoulder3 | 0.643 | 0.293 | 0.062 | 1.212 |
| LDW | Freeway × area | -0.964 | 0.485 | -1.919 | -0.045 |

Figure 5.5 shows histograms of samples of size 30,000 drawn from the posterior distributions of the log RRs for crashes and the LDW surrogate. The estimate for the log crash RR is 1.00 with a 95% confidence interval of (0.84, 1.16). The estimate for the log LDW RR is 1.09 with a 95% confidence interval of (0.65, 1.53). Neither confidence interval contains 0, suggesting that the risks of crashes and LDW events are greater on curves holding the other variables fixed.

[Figure 5.5. Posterior distributions of the log RR comparing curve to no curve for crash and LDW. Panels: Log RR Crash; Log RR LDW.]

Figure 5.6 shows the posterior distribution of the log RR difference between crashes and the LDW surrogate comparing curve to no curve while holding other variables in the model fixed. The mean of this distribution is -0.08 with a 95% confidence interval of (-0.51, 0.33). The 95% confidence interval for the log RR difference includes 0, indicating that LDW could be useful as a surrogate for crashes on rural nonfreeway roads.

[Figure 5.6. Posterior distribution of the log RR difference between crash and LDW comparing curve to no curve.]

Time to Edge Crossing

Table 5.5 shows posterior estimates from the regression parameters in the TTEC model. In terms of regression estimates between the crash and surrogate measure, this model shows the best agreement among the three models. The log exposure, curve, area, and shoulder coefficients are not only in the same direction between the crash and the TTEC regressions, but the magnitudes also tend to be reasonably close. Note that the intercept is of no interest because it only captures the difference on an absolute scale of numbers of crashes and TTEC events. All parameters in the crash regression are significant. The freeway and area variables in the TTEC regression equation do not meet the significance criteria at 0.05.

Table 5.5. Posterior Regression Estimates for Bayesian SUR TTEC Model

| Regression | Parameter | Mean | SD | 2.5% | 97.5% |
|---|---|---|---|---|---|
| Crash | Intercept | 2.017 | 0.438 | 1.136 | 2.836 |
| Crash | Log exposure | 0.478 | 0.045 | 0.394 | 0.567 |
| Crash | Curve | -0.638 | 0.077 | -0.787 | -0.488 |
| Crash | Freeway | 0.285 | 0.130 | 0.038 | 0.545 |
| Crash | Area | -0.579 | 0.230 | -1.033 | -0.129 |
| Crash | Shoulder2 | 0.541 | 0.135 | 0.283 | 0.805 |
| Crash | Shoulder3 | 0.351 | 0.152 | 0.063 | 0.653 |
| TTEC | Intercept | 4.557 | 0.341 | 3.843 | 5.213 |
| TTEC | Log exposure | 0.464 | 0.054 | 0.362 | 0.576 |
| TTEC | Curve | -0.594 | 0.100 | -0.788 | -0.399 |
| TTEC | Freeway | 0.072 | 0.150 | -0.218 | 0.368 |
| TTEC | Area | -0.469 | 0.259 | -0.996 | 0.026 |
| TTEC | Shoulder2 | 0.462 | 0.150 | 0.172 | 0.765 |
| TTEC | Shoulder3 | 0.466 | 0.180 | 0.119 | 0.829 |

Figure 5.7 shows histograms of samples of size 30,000 drawn from the posterior distributions of the log RRs for crashes and the TTEC surrogate. The estimate for the log crash RR is 1.00 with a 95% confidence interval of (0.82, 1.18). The estimate for the log TTEC RR is 1.12 with a 95% confidence interval of (0.83, 1.36). Neither confidence interval contains 0, suggesting that the risks of crashes and TTEC events are greater on curves when the other variables are held fixed.

Figure 5.8 shows the posterior distribution of the log RR difference between crashes and the TTEC surrogate comparing curve to no curve while holding other variables in the model fixed. The mean of this distribution is -0.11 with a 95% confidence interval of (-0.40, 0.18). Because this interval includes 0, TTEC could be useful as a surrogate for crashes on rural nonfreeway roads.

Figure 5.7. Posterior distributions of the log RR comparing curve to no curve for crash and TTEC. (Panels: Log RR Crash; Log RR TTEC.)

Figure 5.8. Posterior distribution of the log RR difference between crash and TTEC comparing curve to no curve.

Overall, the three example analyses show that candidate surrogates differ in their fidelity to the crash model. Judging by the log RR differences in the figures above, together with the signs and magnitudes of the regression parameters, LDEV is the worst candidate of the three, TTEC is the best, and LDW is intermediate. The results not only help confirm surrogacy but also provide a possible tool for guiding future studies in reducing risk: given a valid surrogate (one that really mimics RR in crashes), an intervention such as road widening, improving lane markings, changing signage, or adding rumble strips can be evaluated by using the surrogate in a relatively short space of time. The effect on RR then represents a predicted safety benefit. While data still need to be collected to evaluate the effect on the surrogate, this approach is potentially much more useful, sensitive, and repeatable than counting crashes at a single "treated" location.

This chapter explored a statistical approach for testing candidate surrogate measures for road departure crashes with the type of NDD and highway data that will be available from the SHRP 2 Safety projects. The focus was on identifying appropriate analysis methods coupled to mechanisms and control performance, not just on statistical associations. Three alternative surrogates were tested with the SUR approach, each analysis using the same four explanatory variables: area type, road type, horizontal curvature, and shoulder width. These variables were selected a priori for this exploratory study because they are known to be associated with road departure crashes. Furthermore, classification of these variables was limited by available data. The results obtained were similar to what would be obtained from a multivariate response model.

The same variables were used in both the crash and surrogate portions of the model for simplicity while the approach was being tested. The SUR framework developed in this section can accommodate the more general case, in which different variables are used in the crash and surrogate portions of the model, bridging crash data with naturalistic data in a highway context. In particular, variables relating to specific driver behaviors are only available in the naturalistic case, so this flexibility in the SUR approach should prove valuable in the future. The large volume of naturalistic driving data from SHRP 2 should provide a richer selection of explanatory variables and a finer classification of those variables for expanded analysis within this framework.

The team's analysis has shown that the SUR approach is well suited for screening surrogates. Of the three surrogates evaluated, TTEC appears to be the best. However, the team's analysis was exploratory, and better surrogates may exist.
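The screening check applied to each surrogate in this chapter (draw posterior samples of the two log RRs, form their difference, and ask whether the 95% interval covers 0) might be sketched as follows. The draws below are synthetic placeholders generated from independent normal distributions with roughly the location and spread reported for the TTEC comparison; they are not the report's posterior samples. In a real SUR fit the two sets of draws would come jointly from the sampler and would generally be correlated. The function name `summarize` and the seed are hypothetical.

```python
import random


def summarize(draws, level=0.95):
    """Return the mean and an equal-tailed percentile interval of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * n)]
    upper = ordered[min(n - 1, int((1.0 - tail) * n))]
    return sum(ordered) / n, lower, upper


# Synthetic stand-ins for posterior draws (assumption: independent normals).
rng = random.Random(2013)
n_draws = 30_000
log_rr_crash = [rng.gauss(1.00, 0.09) for _ in range(n_draws)]
log_rr_surrogate = [rng.gauss(1.12, 0.135) for _ in range(n_draws)]

# Difference of log RRs, draw by draw (crash minus surrogate).
log_rr_diff = [c - s for c, s in zip(log_rr_crash, log_rr_surrogate)]

mean_diff, lower, upper = summarize(log_rr_diff)
contains_zero = lower <= 0.0 <= upper

print(f"mean difference: {mean_diff:.2f}")
print(f"95% interval: ({lower:.2f}, {upper:.2f})")
print("interval covers 0:", contains_zero)
```

Working draw by draw, rather than subtracting two separately summarized intervals, is what lets the interval for the difference reflect any dependence between the crash and surrogate parameters when real joint posterior samples are used.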


TRB’s second Strategic Highway Research Program (SHRP 2) Report S2-S01C-RW-1: A Multivariate Analysis of Crash and Naturalistic Driving Data in Relation to Highway Factors explores analysis methods capable of associating crash risk with quantitative metrics (crash surrogates) available from naturalistic driving data.

Errata: The foreword originally contained incorrect information about the project. The text has been corrected in the online version of the report. (August 2013)
