Suggested Citation:"Chapter 7: Advanced Crash Prediction Models." National Academies of Sciences, Engineering, and Medicine. 2021. Safety Performance of Part-Time Shoulder Use on Freeways, Volume 2: Conduct of Research Report. Washington, DC: The National Academies Press. doi: 10.17226/26393.


Chapter 7: Advanced Crash Prediction Models

This chapter describes the findings obtained during the development of random parameters (RP) and latent class (LC) crash prediction models (CPMs) for freeways with part-time shoulder use (PTSU). This chapter also includes a re-estimation of the fixed parameters (FP) CPM from Chapter 5 using an alternative modeling approach. The models are used to predict the average crash frequency associated with one direction of travel on a freeway, and are applicable to freeway segments, ramp entrance speed-change lane sites, and ramp exit speed-change lane sites. The definitions of these site types are provided in Chapter 3 of this report. Note that the term "parameter" is synonymous with the terms "coefficient" and "regression coefficient," as used in this document.

The RP and LC models described in this chapter are not recommended for inclusion in the HSM. They are most effective when used to evaluate sites within the dataset from which they were developed. Thus, they could be used effectively for network screening applications. However, they must be reduced to their fixed parameter equivalents when evaluating sites outside of the dataset from which they were developed. In this manner, a "reduced" RP or LC model could be used to evaluate design alternatives for a proposed site, but this use does not take full advantage of the capabilities of these models.

The chapter is organized into the following sections:
• Methodological Background: provides background information associated with the FP, RP, and LC models estimated.
• Highway Safety Database: summarizes the data used for CPM estimation, including descriptive statistics of the dependent and candidate independent variables.
• Model Estimation Results: shows the modeling results and offers commentary on the findings.
• Model Comparisons: provides comparisons of the full models and presents a vision for integrating more advanced statistical models into future editions of the HSM.
• Validation: estimates a subset or training model using data from four states, and then tests the model using data from Georgia.
• Empirical Fixed Parameters Model: presents a re-estimation of the FP model in Chapter 5 using an empirical modeling approach.

Methodological Background

This section describes the FP, RP, and LC modeling methodologies. For each methodology, fatal-and-injury (FI) and property-damage-only (PDO) crashes were modeled separately; the reasons for this were previously discussed in Chapter 3. Freeway segment, ramp entrance speed-change lane, and ramp exit speed-change lane site types were modeled together to maximize the sample size. Initial exploration of RP and LC models considered each of these site types separately, like the CPMs in Chapter 5 and Chapter 18 of the HSM Supplement. However, the small sample sizes of ramp entrance speed-change lane sites and ramp exit speed-change lane sites created challenges that were best overcome by modeling them together with freeway segments and identifying the specific site type with indicator variables.

Fixed Parameters Model

Regression analysis was used to estimate a cross-sectional model of expected FI and PDO crash frequencies with this project's dataset. The error structure was assumed to follow a negative binomial distribution. This distribution addresses the overdispersion commonly found in crash data, in which the variance of the reported crash frequency exceeds the mean (Miaou 1994; Shankar et al. 1995). The model takes the following log-linear functional form:

Equation 175
$$\ln \lambda_i = \beta X_i + \varepsilon_i$$

where λi is the expected number of annual crashes at site i; β is a vector of estimable regression parameters with J + 1 elements; Xi is a vector of J geometric design, traffic volume, and other site-specific data for site i (i.e., independent variables); and εi is the gamma-distributed error term. The probability function that describes the negative binomial distribution is:

Equation 176
$$\Pr(y_i) = \frac{\Gamma\left(y_i + \frac{1}{\alpha}\right)}{y_i!\,\Gamma\left(\frac{1}{\alpha}\right)} \left(\frac{\frac{1}{\alpha}}{\frac{1}{\alpha} + \lambda_i}\right)^{1/\alpha} \left(\frac{\lambda_i}{\frac{1}{\alpha} + \lambda_i}\right)^{y_i}$$

where Γ(·) is the gamma function, yi is the observed (reported crash count) outcome for site i, and α is the overdispersion parameter. The parameters βj are estimated using the maximum likelihood method, which identifies the parameters that maximize the product of the probabilities defined in Equation 176 for all sites in the dataset. The standard FP model assumes that the impacts of each parameter (i.e., the β values in Equation 175) are the same for each site (i.e., effects are fixed). The resulting FP model determined which variables were marginally significant in a log-linear model. These variables were subsequently carried forward when estimating the RP and LC models.

Random Parameters Model

In the RP approach, the same basic form of the regression model in Equation 175 is used. However, in this approach the parameters β are allowed to differ across sites i and are described using both a fixed and a random component (Anastasopoulos and Mannering 2009; Washington et al. 2020).
The two components are shown in the following equation:

Equation 177
$$\beta_{i,j} = \beta_j + \varphi_{i,j}$$

where βi,j is the jth parameter applied to site i, βj represents the mean value for the jth parameter, and φi,j is a randomly distributed term with some known distribution. Not all of the parameters β must be treated as random; those that are identified in the modeling process as not having significant variation across sites may be modeled as having fixed effects, similar to the traditional FP modeling approach. So, for example, one feature (e.g., shoulder width) could have the same impact across all sites, while another (e.g., lane width) has a different impact for each individual site. Recent examples of RP models in traffic safety research include the study of continuous lighting on freeway segments (van Schalkwyk et al. 2016), road surface conditions on rural highway safety (Chen et al. 2017), crash frequency on urban streets (Liu et al. 2018), and lane width along urban/suburban arterials (Rista et al. 2018).

The inclusion of the random terms φi,j in the β parameters makes the likelihood function of the RP model computationally intensive; thus, the model cannot be estimated using traditional maximum likelihood approaches. Instead, the parameters β (including both the fixed and random components) are estimated using simulation-based statistical methods.
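The simulation-based estimation described above can be sketched in a few lines. The block below is a minimal illustration, not the estimation code used for this report: it evaluates the negative binomial probability of Equation 176 and averages it over quasi-random normal draws of the parameters in Equation 177. The function names, the array layout, and the choice of an unscrambled Halton sequence from SciPy are assumptions made for the example.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import norm, qmc


def nb_log_pmf(y, lam, alpha):
    """Log of the negative binomial probability in Equation 176."""
    inv_a = 1.0 / alpha
    return (gammaln(y + inv_a) - gammaln(y + 1.0) - gammaln(inv_a)
            + inv_a * np.log(inv_a / (inv_a + lam))
            + y * np.log(lam / (inv_a + lam)))


def simulated_log_likelihood(y, X, beta_mean, beta_sd, alpha, n_draws=200):
    """Simulated log-likelihood with normally distributed random parameters.

    y: (n,) crash counts; X: (n, k) covariates, including a column of ones;
    beta_mean, beta_sd: (k,) means and standard deviations (sd = 0 keeps a
    parameter fixed); alpha: overdispersion parameter.
    """
    n, k = X.shape
    # Unscrambled Halton points; the first point (the origin) is skipped
    # because the normal quantile of 0 is minus infinity.
    engine = qmc.Halton(d=k, scramble=False)
    engine.fast_forward(1)
    u = engine.random(n * n_draws)
    z = norm.ppf(u).reshape(n, n_draws, k)
    beta = beta_mean + beta_sd * z                    # Equation 177
    lam = np.exp(np.einsum("ik,irk->ir", X, beta))    # Equation 175
    prob = np.exp(nb_log_pmf(y[:, None], lam, alpha))
    # Average the probability over draws, then sum the logs over sites.
    return np.sum(np.log(prob.mean(axis=1)))
```

An optimizer would maximize this function over the means, standard deviations, and overdispersion parameter. Setting every standard deviation to zero recovers the ordinary FP log-likelihood, which is a convenient check on an implementation.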

The dependent and candidate independent variables used in the RP and LC CPMs are summarized in a subsequent section. The model specification in the RP and LC models is the same as the FP model, aside from the testing of independent variables as random parameters.

The primary benefit of the RP approach over the traditional FP approach is that the former better accounts for unobserved heterogeneity in the dataset. This unobserved heterogeneity represents factors that are not possible to incorporate into the model, but which may vary across sites and influence safety performance. For example, unobserved heterogeneity might result from site characteristics not available for inclusion in the model (e.g., vertical curvature for the roadway segments is not measured or available) or features that cannot be observed (e.g., different levels of enforcement at nearby locations). The result is model parameters that are more precisely estimated than those that do not account for the unobserved heterogeneity, and that might therefore more accurately reflect the relationship between the independent variables and expected or predicted crash frequency.

Estimating an RP model requires more computational resources because the model parameters must be estimated using simulation-based approaches. FP models are estimated using a maximum likelihood approach that has well-defined and computationally efficient methods to establish convergence and generate parameter estimates. By contrast, RP models are estimated using a simulation-based maximum likelihood approach, where a mixing distribution is applied to parameter estimates to account for unobserved heterogeneity. This distribution of the random parameters incorporated in the model (specifically, the distribution of the random components, φi,j) must be pre-specified when the model is being estimated.
The most commonly specified distribution is the normal distribution, which can be empirically compared to other assumed distributions; however, it is difficult for an analyst to know if the chosen distribution accurately reflects the true distribution of the random parameters in practical applications (although including heterogeneity in the means and variances of the random parameters can potentially better approximate the unobserved heterogeneity) (Mannering et al. 2016). The RP model described in this chapter is estimated using the simulation-based likelihood approach, which implements 200 Halton draws (i.e., quasi-random samples) with a normal distribution for the random parameters, consistent with the model estimation process recommended by Anastasopoulos and Mannering (2009).

Latent Class Model

LC models (referred to also as finite mixture models) are applied when the data in a sample may be generated from separate groups, but the number of groups (i.e., classes) and group affiliation (i.e., which class a specific site is assigned to) is unknown. In this framework, a random variable is assumed to be selected from a population (or group) which is an additive mixture of C distinct subpopulations (or subgroups) in proportions π1, π2, …, πC, where the πj sum to 1 and πj ≥ 0 (j = 1, 2, …, C) (Cameron and Trivedi 1998). The probability density of a finite mixture model is described using the following equation:

Equation 178
$$f(y_i \mid \Theta) = \sum_{j=1}^{C-1} \pi_j f_j(y_i \mid \theta_j) + \pi_C f_C(y_i \mid \theta_C)$$

The mixing probabilities (πj) are estimated along with all of the other parameters, θ. The component distributions in a C-class finite mixture negative binomial regression model are as follows (Deb and Trivedi 1997):

Equation 179
$$f_j(y_i) = \frac{\Gamma(y_i + \psi_{j,i})}{\Gamma(\psi_{j,i})\,\Gamma(y_i + 1)} \left(\frac{\psi_{j,i}}{\lambda_{j,i} + \psi_{j,i}}\right)^{\psi_{j,i}} \left(\frac{\lambda_{j,i}}{\lambda_{j,i} + \psi_{j,i}}\right)^{y_i}$$

where j = 1, 2, …, C are the latent classes, λ_{j,i} = exp(x_i′β_j), and ψ_{j,i} = (1/α_j)λ_{j,i}^k. Estimation of LC models is based on the method of expectation maximization, an iterative method for finding maximum likelihood estimates. In traffic safety research, most empirical investigations of LC models focus on two classes. If the mixing probabilities (πj) are given, the posterior probability that observation yi belongs to population j is described by the following equation (Deb and Trivedi 1997):

Equation 180
$$\Pr(y_i \in \text{population } j) = \frac{\pi_j f_j(y_i \mid x_i, \theta_j)}{\sum_{k=1}^{C} \pi_k f_k(y_i \mid x_i, \theta_k)}$$

An important issue associated with LC models is confirming the presence of mixing within the sample of data, and then selecting the appropriate number of classes in the sample. Lindsay and Roeder (1992) proposed a graphical method to inspect the sample for the presence of a mixture. If a mixture is present, a likelihood ratio test can be used to compare the models, where the null hypothesis is that the number of classes is zero. If a mixture is not present, other modeling techniques (FP, RP, etc.) can be used.

Examples of LC model applications in traffic safety include Park and Lord (2009), who considered models of crash frequency using signalized intersection data from Toronto. The authors found that the data may contain two subpopulations. Park et al. (2014) found that a two-class LC model accounted for heterogeneity common in single-class FP models in a hotspot identification analysis for rural multi-lane highways in California and Texas. Park et al. (2016) considered the application of two-class LC models to develop crash modification factors for median width and shoulder widths. Yu et al. (2019) considered an LC modeling approach to model driver injury severity in single-vehicle crashes using data from the state of Washington.

For the CPMs estimated in this study, the dependent and candidate independent variables used in the LC model are the same as those used in the FP and RP models.
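The posterior class probabilities of Equation 180 can be computed directly once the class-specific parameters and mixing probabilities are available. The sketch below is illustrative only: the function names and array shapes are assumptions, and the exponent k defaults to 0, which sets ψ = 1/α for every site. The calculation is done on the log scale to avoid underflow for large counts.

```python
import numpy as np
from scipy.special import gammaln, logsumexp


def nb_component_log_pmf(y, lam, alpha, k=0):
    """Log of the component density in Equation 179, with psi = lam**k / alpha."""
    psi = lam ** k / alpha
    return (gammaln(y + psi) - gammaln(psi) - gammaln(y + 1.0)
            + psi * np.log(psi / (lam + psi))
            + y * np.log(lam / (lam + psi)))


def posterior_class_probabilities(y, x, betas, alphas, mix):
    """Equation 180: probability that each observation belongs to each class.

    y: (n,) counts; x: (n, p) covariates; betas: (C, p) class-specific
    parameters; alphas: (C,) overdispersion parameters; mix: (C,) mixing
    probabilities that sum to one. Returns an (n, C) array.
    """
    lam = np.exp(x @ betas.T)                                   # (n, C)
    log_w = np.log(mix) + nb_component_log_pmf(y[:, None], lam, alphas)
    return np.exp(log_w - logsumexp(log_w, axis=1, keepdims=True))
```

Each row of the result sums to one. In an expectation-maximization routine, these probabilities are the weights computed in the expectation step.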
The same general form of the model is also held consistent across all modeling frameworks. This provides the most direct comparison of the predictive capabilities of the three modeling frameworks.

Statistical Modeling Principles

The FP models described in Chapter 5 represent the research team's recommended CPMs for freeway safety evaluation. They were estimated with non-linear regression. Estimation of RP or LC models with such non-linearities is an open research challenge within the statistical modeling field at this time. Thus, the non-linearities in the Chapter 5 CPMs could not be incorporated into RP or LC models. For these reasons, an FP model that maintains the log-linear relationship in Equation 175 was re-estimated for direct application to RP and LC models. This FP model is an intermediate step in the creation of the RP and LC models. It is not intended to supersede the FP CPM in Chapter 5.

The statistical modeling process used to develop the log-linear FP model was guided by the following principles:
1. Independent variables were added to the model one-by-one. The sign of the variable was assessed to confirm that it was consistent with expectations and the existing body of literature. If the variable was consistent with these theoretical and practical expectations, it was retained in the model.
2. Any variables that were highly correlated were examined to determine if including both variables in the model affected the sign or magnitude of the other, correlated variable. If including both variables in the model affected the sign (see step #1 above), one of the correlated variables was removed from the model.
3. The addition of an independent variable had to improve the log-likelihood for the model (improve the statistical fit).

4. When the model was developed using steps #1 through #3, the statistical significance of the variables was examined. Independent variables that were of particular relevance for the present study (e.g., presence of a PTSU facility) were retained in the model, regardless of statistical significance. Other variables that were not at least marginally significant (p-value > 0.50) were removed until a final model resulted.

The same principles were repeated when the RP and LC CPMs were estimated. However, only independent variables included in the FP model were added in step #1. In an HSM context, independent variables remaining in the model at the end of this process are considered SPF adjustment factors (AFs).

The total number of FI and PDO crashes at each site was used as the dependent variable in the respective models. The site's traffic volume and geometric design element dimensions were considered as candidate independent variables for inclusion in the model. For each model that was estimated, the relationship between expected crash frequency and site length was tested to see if it was proportional. If the relationship was proportional, then the site length was treated as an offset variable in the model. Additionally, since the number of crashes observed at each location was for some predefined time period (e.g., 3 to 5 years), the amount of time that crash data were available at each location was also included as an offset variable to ensure that the model accurately predicted annual average crash frequencies.

Highway Safety Database

The database used to develop the CPMs is described in Chapter 4. In summary, the database consists of 728 study sites in Georgia, Hawaii, Minnesota, Ohio, and Virginia totaling 164.8 miles. About 25 percent of the mileage consists of facilities with PTSU operation during 1 or more hours of the day.
The following candidate independent variables were explored in models:
• Site length (miles)
• Directional AADT (vehicles per day)
• Number of lanes per direction
• Degree of horizontal curve
• Lane width (feet)
• Inside shoulder width (feet)
• Outside shoulder width (feet)
• Proportion of site with inside rumble strips
• Proportion of site with outside shoulder rumble strips
• Proportion of site with outside barrier
• Proportion of site with inside barrier
• Offset to inside barrier (feet)
• Offset to outside barrier (feet)
• Width of unobstructed median (feet)
• Proportion of time that PTSU is open
• Distance to nearest downstream exit ramp (miles)
• Distance to nearest upstream entrance ramp (miles)
• Downstream exit ramp AADT (vehicles per day)
• Upstream entrance ramp AADT (vehicles per day)

In addition to the candidate independent variables in the previous list, several factors were created for some of the cross-sectional variables in order to account for combinations of elements within the same sites, or to create functional forms consistent with the HSM Supplement. These factors are described in the following equations:

Equation 181
$$f_{os} = \min(W_{os}, 12) - 10$$

where fos is the factor for outside shoulder width; and Wos is the outside shoulder width in feet. The value of 10 is the baseline outside shoulder width.

Equation 182
$$f_{is} = \min(W_{is}, 12) - 6$$

where fis is the factor for inside shoulder width; and Wis is the inside shoulder width in feet. The value of 6 is the baseline inside shoulder width.

Equation 183
$$f_{lw} = \min(W_{l}, 13) - 12$$

where flw is the factor for lane width; and Wl is the average width of all full-time travel lanes in feet. The value of 12 is the baseline lane width.

Equation 184
$$f_{mw} = W_{um}\,(1 - P_{ib}) + P_{ib}\,W_{icb}$$

where fmw is the factor for median width; Wum is the non-shoulder part of the median width in feet; Pib is the proportion of the site length with a median barrier; and Wicb is the offset to the median barrier in feet for sections with median barrier. This factor represents the average width of the traversable median with no barrier and the median offset when barrier is present.

Equation 185
$$f_{rb} = \frac{P_{ob}}{\max(W_{ocb}, 0.75)}$$

where frb is the factor for roadside barrier; Pob is the proportion of the site length with an outside (roadside) barrier; and Wocb is the offset to the outside (roadside) barrier in feet.

Model Estimation

This section presents FP, RP, and LC CPMs for FI and PDO crash frequencies. FI CPMs estimate the sum of crashes with K, A, B, and C severities on the KABCO scale. The three FI CPMs are presented first, followed by the three PDO CPMs.

The modeling effort described in this section used the full dataset collected for all of the 728 directional sites. A later modeling effort, described in the Validation section of this chapter, used data from only four of the five states, re-estimated the CPMs, and then applied the CPMs to the "hold out" data from the fifth state, Georgia. This validation approach is a relatively common means of assessing FP models but presents challenges for RP and LC model validation. These challenges are discussed later in this chapter.
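The cross-sectional factors in Equations 181 through 185 are simple enough to express as small functions, which can help when preparing model inputs. The sketch below uses hypothetical function names. Equations 184 and 185 are only partly legible in the machine-read source, so the forms used for them here (a barrier-weighted average for the median width factor and a ratio for the roadside barrier factor) are assumptions rather than confirmed definitions.

```python
import math


def outside_shoulder_factor(w_os):
    """Equation 181: outside shoulder width capped at 12 ft, 10 ft baseline."""
    return min(w_os, 12.0) - 10.0


def inside_shoulder_factor(w_is):
    """Equation 182: inside shoulder width capped at 12 ft, 6 ft baseline."""
    return min(w_is, 12.0) - 6.0


def lane_width_factor(w_l):
    """Equation 183: average lane width capped at 13 ft, 12 ft baseline."""
    return min(w_l, 13.0) - 12.0


def median_width_factor(w_um, p_ib, w_icb):
    """Equation 184 (assumed form): unobstructed median width where no
    barrier is present, barrier offset where a barrier is present."""
    return w_um * (1.0 - p_ib) + p_ib * w_icb


def roadside_barrier_factor(p_ob, w_ocb):
    """Equation 185 (assumed form): barrier proportion divided by the
    barrier offset, with the offset floored at 0.75 ft."""
    return p_ob / max(w_ocb, 0.75)


def adjustment_factor(parameter, factor):
    """Adjustment factor implied by a log-linear model: exp(parameter * factor)."""
    return math.exp(parameter * factor)
```

For example, an 8-foot outside shoulder gives a factor of 8 − 10 = −2, and a negative parameter on that factor then produces an adjustment factor greater than 1.00.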
Fatal-and-Injury Crash Frequency Prediction Models

Fixed Parameters Model

The FP CPM for fatal and injury crashes is shown in Table 61.

Table 61. Fixed parameters model for FI crashes.

| Variable | Parameter | Std. Error | t-statistic | p-value |
|---|---|---|---|---|
| Intercept | −4.695 | 0.564 | −8.332 | <0.001 |
| Natural logarithm of directional annual average daily traffic (in 1,000s veh/day) | 1.491 | 0.123 | 12.112 | <0.001 |
| Lane width factor (flw) | −0.083 | 0.112 | −0.742 | 0.458 |
| Median width factor (fmw) | −0.008 | 0.004 | −1.999 | 0.046 |
| Inside shoulder width factor (fis) | −0.036 | 0.016 | −2.225 | 0.026 |
| Outside shoulder width factor (fos) | −0.031 | 0.017 | −1.820 | 0.069 |
| Proportion of site with inside shoulder rumble strips | −0.162 | 0.147 | −1.104 | 0.270 |
| Proportion of site with outside shoulder rumble strips | −0.182 | 0.151 | −1.201 | 0.230 |
| Indicator for ramp entrance speed-change lane (1 if present; 0 otherwise) | −0.148 | 0.099 | −1.492 | 0.136 |
| Indicator for ramp exit speed-change lane (1 if present; 0 otherwise) | 0.060 | 0.110 | 0.546 | 0.585 |
| Indicator for PTSU on left-side of site (1 if present; 0 otherwise) | 0.049 | 0.156 | 0.316 | 0.752 |
| Indicator for PTSU on right-side of site (1 if present; 0 otherwise) | 0.148 | 0.111 | 1.331 | 0.183 |
| Proportion of day that PTSU is open | 0.259 | 0.491 | 0.528 | 0.597 |
| Indicator for freeway facility in Hawaii (1 if site is located in Hawaii; 0 otherwise) | −0.271 | 0.231 | −1.175 | 0.240 |
| Indicator for freeway facility in Minnesota (1 if site is located in Minnesota; 0 otherwise) | 0.142 | 0.199 | 0.711 | 0.477 |
| Indicator for freeway facility in Ohio (1 if site is located in Ohio; 0 otherwise) | 0.983 | 0.172 | 5.720 | <0.001 |
| Indicator for freeway facility in Virginia (1 if site is located in Virginia; 0 otherwise) | 0.620 | 0.153 | 4.039 | <0.001 |
| Overdispersion parameter (α) | 0.594 | 0.048 | 12.406 | <0.001 |

Number of observations: 728
Log-likelihood at convergence: −1,791.9
Log-likelihood (intercept only): −2,419.8
McFadden Pseudo R2: 0.2595
veh/day = vehicles per day

The relative effects of each variable in Table 61 may be determined using the following equation:

Equation 186
$$100\,(e^{\beta} - 1)$$

where β is the value in the "Parameter" column of Table 61. For example, the relative effect of a site being in Hawaii is 100(e^(−0.271) − 1) = −23.7 percent. In other words, the CPM predicts 23.7 percent fewer crashes on a site in Hawaii than on a site in the "base condition" state for which an indicator variable was not used, which in this case is Georgia. Presented as an AF, the effect of a site being in Hawaii is 0.763 [= (100 − 23.7)/100].

Variables not included in the model were either removed because their parameter was highly insignificant (p-values > 0.5) or because they had a value that was inconsistent with existing safety literature. In some cases, variables with p-values > 0.5 were retained because either (1) their effect on crash frequency is established in the HSM or other literature or (2) they were of particular interest to this project (i.e., PTSU-related variables). Creation of this log-linear FP model was an intermediate step in the creation of RP and LC models, so the research team erred on the side of retaining variables to enable their exploration in RP and LC models.

Random Parameters Model

The RP model for FI crashes is shown in Table 62. The model was estimated using the same specification as the FP model. The variables with random parameters are lane width factor, inside and outside shoulder width factors, proportion of the site length with inside shoulder rumble strips, presence of ramp entrance and ramp exit speed-change lanes, and the Minnesota and Ohio state indicators. The natural logarithm of the directional annual average daily traffic, median width factor, proportion of the site length with outside shoulder rumble strips, presence of PTSU lanes on the left- and right-hand side of the site, proportion of the day that the PTSU lane is open, and the Hawaii and Virginia indicators all have fixed parameters.

Among the fixed parameters shown in Table 62, the proportion of the site length with outside shoulder rumble strips, the PTSU left-side indicator, and the proportion of the day that the PTSU lane is open were not statistically significant. The natural logarithm of the directional annual average daily traffic, the median width factor, and right-side PTSU presence were statistically significant. The fixed parameters in the RP model are similar to the same parameters in the FP model shown in Table 61, and so is their interpretation.

Table 62. Random parameters model for FI crashes.

| Variable | Parameter | Std. Error | t-statistic | p-value |
|---|---|---|---|---|
| Intercept | −4.842 | 0.472 | −10.265 | <0.001 |
| Standard deviation for intercept | 0.215 | 0.028 | 7.543 | <0.001 |
| Natural logarithm of directional annual average daily traffic (in 1,000s veh/day) | 1.493 | 0.102 | 14.649 | <0.001 |
| Lane width factor (flw) | −0.152 | 0.094 | −1.616 | 0.106 |
| Standard deviation for lane width factor | 0.362 | 0.075 | 4.826 | <0.001 |
| Median width factor (fmw) | −0.008 | 0.004 | −2.335 | 0.020 |
| Inside shoulder width factor (fis) | −0.038 | 0.013 | −2.859 | 0.004 |
| Standard deviation for inside shoulder width | 0.027 | 0.008 | 3.590 | <0.001 |
| Outside shoulder width factor (fos) | −0.028 | 0.012 | −2.248 | 0.025 |
| Standard deviation for outside shoulder width | 0.021 | 0.008 | 2.770 | 0.006 |
| Proportion of site with inside shoulder rumble strips | −0.323 | 0.102 | −3.174 | 0.002 |
| Standard deviation for inside shoulder rumble strips | 0.421 | 0.048 | 8.719 | <0.001 |
| Proportion of site with outside shoulder rumble strips | −0.011 | 0.117 | −0.096 | 0.923 |
| Indicator for ramp entrance speed-change lane (1 if present; 0 otherwise) | −0.249 | 0.094 | −2.635 | 0.008 |
| Standard deviation for ramp entrance speed-change lane | 0.849 | 0.153 | 5.540 | <0.001 |
| Indicator for ramp exit speed-change lane (1 if present; 0 otherwise) | 0.060 | 0.110 | 0.546 | 0.585 |
| Standard deviation for ramp exit speed-change lane | 0.409 | 0.164 | 2.493 | 0.013 |
| Indicator for PTSU on left-side of site (1 if present; 0 otherwise) | 0.024 | 0.117 | 0.205 | 0.837 |
| Indicator for PTSU on right-side of site (1 if present; 0 otherwise) | 0.203 | 0.087 | 2.343 | 0.019 |
| Proportion of day that PTSU is open | 0.042 | 0.402 | 0.105 | 0.916 |
| Indicator for freeway facility in Hawaii (1 if site is located in Hawaii; 0 otherwise) | −0.260 | 0.167 | −1.551 | 0.121 |
| Indicator for freeway facility in Minnesota (1 if site is located in Minnesota; 0 otherwise) | −0.102 | 0.154 | −0.661 | 0.509 |
| Standard deviation for Minnesota | 1.396 | 0.111 | 12.672 | <0.001 |
| Indicator for freeway facility in Ohio (1 if site is located in Ohio; 0 otherwise) | 0.990 | 0.140 | 7.069 | <0.001 |
| Standard deviation for Ohio | 0.558 | 0.096 | 5.801 | <0.001 |
| Indicator for freeway facility in Virginia (1 if site is located in Virginia; 0 otherwise) | 0.753 | 0.110 | 6.871 | <0.001 |
| Inverse overdispersion parameter (1/α) (a) | 3.836 | 0.390 | 9.826 | <0.001 |

Number of observations: 728
Log-likelihood at convergence: −1,773.1
Log-likelihood (intercept only): −11,634.8
McFadden Pseudo R2: 0.8476
(a) This is the inverse of the overdispersion parameter produced from an FP model. For example, in this RP model, the overdispersion parameter that would be used as a part of an EB analysis is 1/3.836 = 0.261.

The CPM in Table 62 can be represented in equation form as follows:

Equation 187
$$
\begin{aligned}
N_{p,fi} = {} & y \times e^{\mathbf{-4.842}} \times L \times \left(\frac{AADT}{1000}\right)^{1.493} \times e^{\mathbf{-0.152} f_{lw}} \times e^{-0.008 f_{mw}} \times e^{\mathbf{-0.038} f_{is}} \times e^{\mathbf{-0.028} f_{os}} \\
& \times e^{\mathbf{-0.323} P_{ir}} \times e^{-0.011 P_{or}} \times e^{\mathbf{-0.249} I_{en}} \times e^{\mathbf{0.060} I_{ex}} \times e^{0.024 I_{ptsul}} \times e^{0.203 I_{ptsur}} \times e^{0.042 P_{ptsuo}} \\
& \times e^{-0.260 I_{HI}} \times e^{\mathbf{-0.102} I_{MN}} \times e^{\mathbf{0.990} I_{OH}} \times e^{0.753 I_{VA}}
\end{aligned}
$$

where
Np,fi = predicted average crash frequency of a site for fatal and injury severity during the analysis period (crashes);
y = analysis period (years);
L = length of site (miles);
AADT = directional annual average daily traffic volume of site (veh/day);
Pir = proportion of site length with rumble strips present on the inside shoulder;
Por = proportion of site length with rumble strips present on the outside shoulder;
Ien = indicator variable for a ramp entrance speed-change lane site (= 1.0 if site is an entrance speed-change lane, 0.0 otherwise);
Iex = indicator variable for a ramp exit speed-change lane site (= 1.0 if site is an exit speed-change lane, 0.0 otherwise);
Iptsul = indicator variable for a site with left-side PTSU (= 1.0 if site has left-side PTSU, 0.0 otherwise);
Iptsur = indicator variable for a site with right-side PTSU (= 1.0 if site has right-side PTSU, 0.0 otherwise);
Pptsuo = proportion of day that PTSU is open;
IHI = indicator variable for a site located in Hawaii (= 1.0 if in Hawaii, 0.0 otherwise);
IMN = indicator variable for a site located in Minnesota (= 1.0 if in Minnesota, 0.0 otherwise);
IOH = indicator variable for a site located in Ohio (= 1.0 if in Ohio, 0.0 otherwise); and
IVA = indicator variable for a site located in Virginia (= 1.0 if in Virginia, 0.0 otherwise).

The second, third, and fourth terms in Equation 187 compute the predicted average fatal and injury crash frequency for a site with base conditions for 1 year (i.e., they are the safety performance function portion of the CPM). The fifth term and beyond are adjustment factors that change the average fatal and injury crash frequency predicted with the safety performance function to account for site-specific conditions.
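As an illustration of how Equation 187 is applied, the sketch below evaluates it using the mean parameter values transcribed from Table 62. The function name, the dictionary keys, and the example site are hypothetical, and the result is a mean-parameter prediction only; it does not account for site-to-site variation in the random parameters.

```python
import math

# Mean parameter values transcribed from Table 62 (FI crashes, RP model).
INTERCEPT = -4.842
LN_AADT = 1.493
MEAN_PARAMS = {
    "f_lw": -0.152, "f_mw": -0.008, "f_is": -0.038, "f_os": -0.028,
    "p_ir": -0.323, "p_or": -0.011, "i_en": -0.249, "i_ex": 0.060,
    "i_ptsul": 0.024, "i_ptsur": 0.203, "p_ptsuo": 0.042,
    "i_hi": -0.260, "i_mn": -0.102, "i_oh": 0.990, "i_va": 0.753,
}


def predict_fi_crashes(years, length_mi, aadt, **site):
    """Evaluate Equation 187 at the mean parameter values.

    Site variables that are not supplied default to zero, which corresponds
    to an adjustment factor of 1.00 (base conditions, Georgia).
    """
    unknown = set(site) - set(MEAN_PARAMS)
    if unknown:
        raise KeyError(f"unknown site variables: {sorted(unknown)}")
    log_n = (math.log(years) + INTERCEPT + math.log(length_mi)
             + LN_AADT * math.log(aadt / 1000.0))
    log_n += sum(MEAN_PARAMS[name] * value for name, value in site.items())
    return math.exp(log_n)


# Hypothetical 1-mile segment, 1 year, 50,000 veh/day in one direction.
base = predict_fi_crashes(1, 1.0, 50_000)
with_right_ptsu = predict_fi_crashes(1, 1.0, 50_000, i_ptsur=1.0)
print(round(base, 2), round(with_right_ptsu / base, 3))
```

The ratio printed at the end is the adjustment factor for right-side PTSU presence, exp(0.203), because every other term cancels.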
For purposes of writing Equation 187, the mean value of each random parameter was used. These values are shown in bold, but in reality they vary from site to site (i.e., they are random, not fixed). It should be noted that using the mean parameters of an RP model for out-of-sample predictions, without accounting for the variation in the random parameters, potentially introduces significant error in the predictions. Because the mean number of crashes is an exponential function of the random parameter, the variation of each random parameter enters into the conditional mean function. Thus, evaluation of the model at the simple means of the parameters would likely be systematically biased relative to the mean evaluated at the simple mean plus the random variation. This issue is further discussed in the Validation section.

The relationship between crash frequency and traffic demand for a freeway segment is illustrated in Figure 43. The analysis period is 1 year, and all adjustment factors are set at 1.00 (i.e., the segment has base conditions). The trend in Figure 43a corresponds to the model in Table 62, and the trends in Figure 43b correspond to the model in the HSM Supplement (AASHTO 2014). The two lines in Figure 43a represent the crash frequency when computed with the upper and lower bounds of the 95th percentile of the value of the intercept term.
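The size of the bias described above can be gauged with a short calculation. For a normally distributed parameter β with mean μ and standard deviation σ, the expected value of exp(βx) is exp(μx + σ²x²/2), so evaluating at the mean understates the expected multiplier by a factor of exp(σ²x²/2). The sketch below applies this standard lognormal result under the normality assumption stated for the random parameters; the function names are hypothetical, and the simulation is included only as a cross-check.

```python
import math
import random


def mean_parameter_ratio(sd, x=1.0):
    """E[exp(beta * x)] / exp(mean * x) for beta ~ Normal(mean, sd**2)."""
    return math.exp(0.5 * (sd * x) ** 2)


def simulated_ratio(mean, sd, x=1.0, n=100_000, seed=1):
    """Monte Carlo estimate of the same ratio, as a cross-check."""
    rng = random.Random(seed)
    avg = sum(math.exp(rng.gauss(mean, sd) * x) for _ in range(n)) / n
    return avg / math.exp(mean * x)


# Intercept in Table 62: mean -4.842, standard deviation 0.215.
print(mean_parameter_ratio(0.215), simulated_ratio(-4.842, 0.215))
# Minnesota indicator in Table 62: standard deviation 1.396.
print(mean_parameter_ratio(1.396))
```

Under these assumptions the ratio is close to 1 for parameters with small standard deviations, such as the intercept, and roughly 2.6 for the Minnesota indicator, which shows why mean-parameter predictions can differ materially from the population-average prediction.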

[Figure 43. Estimated freeway segment model for FI crashes, RP and HSM Supplement models. Panels: (a) proposed models; (b) HSM models.]

The following sections discuss each of the AFs in Equation 187 and the associated data in Table 62.

Lane Width. The lane width AF is modeled with a factor shown in Equation 183. The factor is constructed such that the base condition is an average lane width of 12 feet for full-time travel lanes. The lane width factor is normally distributed with a mean of −0.152 and a standard deviation of 0.362. Using the mean parameter value of −0.152, lane widths of less than 13 feet are associated with increased crash frequency, a trend that is consistent with the HSM Supplement and the FP model in Chapter 5. Crash frequency does not change further for lane widths greater than 13 feet. However, because the lane width AF has a random parameter, proper interpretation of the AF must consider the range of values for the parameter, not just the mean value. Thus, the AF for lane width will vary from site to site to account for unobserved heterogeneity. Given this mean of −0.152 and a standard deviation of 0.362 and using a normal distribution table, 33.7 percent of the PTSU sites have a parameter greater than zero, meaning that increasing the lane width in the study sites is associated with an increase in FI crash frequency. In contrast, increasing the lane width is associated with a decrease in the FI crash frequency in 66.3 percent of sites.

Median Width. Median width is modeled with a factor shown in Equation 184 that includes terms for the width of the median (but not including the width of the inside shoulders), the proportion of the site length with a median barrier, and the offset to the barrier (if present). This factor was found to have a fixed parameter. As median width increases, crash frequency decreases if a median barrier is not present for the entire site.
If a median barrier is present, crash frequency decreases as the offset to the median barrier increases. The median width factor in Equation 184 results in sites with median barrier present for only a portion of the site length having a combination of these relationships. Figure 44 shows the proposed median width AF using a thick, solid line as well as the equivalent AF from the HSM Supplement (AASHTO 2014) using a thin, dashed line. Computation of the AFs in Figure 44 assumes an inside shoulder width of 6 feet and, when present, a barrier is 2 feet wide and centered in the median. The trend of the proposed AF and the HSM Supplement AF are similar. The base conditions (conditions at which AF = 1.00) are different in the models, resulting in the proposed AFs having lower values than the HSM Supplement CMFs.

Figure 44. Estimated median width AF for FI crashes, RP model.

Inside Shoulder Width. Inside shoulder width is modeled with a factor shown in Equation 182. The factor is constructed such that the base condition is an inside shoulder width of 6 feet. The inside shoulder width factor is normally distributed with a mean of −0.038 and a standard deviation of 0.027. Using the mean parameter of −0.038, inside shoulder widths less than 6 feet are associated with increased crashes and inside shoulder widths greater than 6 feet are associated with decreased crashes. These trends are consistent with the HSM Supplement and the FP model in Chapter 5. There is no further decrease in crashes as the inside shoulder width becomes greater than 12 feet. Because the inside shoulder width AF has a random parameter, proper interpretation of the AF must consider the range of values for the parameter, not just the mean value. Thus, the AF for inside shoulder width will vary from site to site to account for unobserved heterogeneity. Given the distributional parameters, 8.0 percent of the PTSU sites have a parameter greater than zero, meaning that increasing the inside shoulder width in the study sites is associated with an increase in FI crash frequency. In contrast, increasing the inside shoulder width is associated with a decrease in the FI crash frequency in 92.0 percent of sites.

Outside Shoulder Width. Outside shoulder width is modeled with a factor shown in Equation 181. The factor is constructed such that the base condition is an outside shoulder width of 10 feet. The outside shoulder width factor is normally distributed with a mean of −0.028 and a standard deviation of 0.021. Using the mean parameter of −0.028, outside shoulder widths less than 10 feet are associated with increased crashes and outside shoulder widths greater than 10 feet are associated with decreased crashes. These trends are consistent with the HSM Supplement and the FP model in Chapter 5. There is no further decrease in crashes as the outside shoulder width becomes greater than 12 feet. Because the outside shoulder width AF has a random parameter, proper interpretation of the AF must consider the range of values for the parameter, not just the mean value. Thus, the AF for outside shoulder width will vary from site to site to account for unobserved heterogeneity. Given these distributional parameters, 9.1 percent of the PTSU sites have a parameter greater than zero, meaning that increasing the outside shoulder width in the study sites is associated with an increase in FI crash frequency. In contrast, increasing the outside shoulder width is associated with a decrease in the FI crash frequency in 90.9 percent of sites.
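The percentages of sites with a parameter greater than zero, quoted above for the lane width and shoulder width factors, can be reproduced from the normal cumulative distribution function rather than a printed table. The sketch below is one way to do this with the Python standard library; the function name `share_positive` is a hypothetical helper, and the means and standard deviations are the values reported in the text.

```python
import math


def share_positive(mean: float, sd: float) -> float:
    """Percent of sites whose parameter exceeds zero under Normal(mean, sd).

    P(beta > 0) = P(Z > -mean/sd) = Phi(mean/sd), where Phi is the
    standard normal cumulative distribution function.
    """
    z = mean / sd
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 100.0 * phi


# (mean, standard deviation) of the random parameters reported in the text.
random_parameters = {
    "lane width factor": (-0.152, 0.362),
    "inside shoulder width factor": (-0.038, 0.027),
    "outside shoulder width factor": (-0.028, 0.021),
}

for name, (mean, sd) in random_parameters.items():
    pct = share_positive(mean, sd)
    print(f"{name}: {pct:.1f} percent positive, {100.0 - pct:.1f} percent negative")
```

The same helper applies to any other normally distributed parameter in the chapter, provided the reported mean and standard deviation are supplied.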

Inside Shoulder with Rumble Strips. The variable representing the proportion of the inside shoulder with rumble strips is normally distributed with a mean of −0.323 and a standard deviation of 0.421. The base condition of the AF is that zero percent of the inside shoulder has rumble strips. Using the mean parameter of −0.323, the AF is 0.724 when 100 percent of the inside shoulder has rumble strips. This trend is consistent with the HSM Supplement and the FP model in Chapter 5, although the AF value of 0.724 is smaller than the AFs from these other documents. Because this AF has a random parameter, proper interpretation of the AF must consider the range of values for the parameter, not just the mean value. Thus, the AF for the proportion of inside shoulders with rumble strips will vary from site to site to account for unobserved heterogeneity. Given these distributional parameters, 22.1 percent of the PTSU sites have a parameter greater than zero, meaning that increasing the proportion of the inside shoulder with rumble strips is associated with an increase in FI crash frequency. In contrast, increasing the proportion of the inside shoulder with rumble strips is associated with a decrease in the FI crash frequency in 77.9 percent of sites.

Outside Shoulder with Rumble Strips. The variable for the proportion of the outside shoulder with rumble strips was found to have a fixed parameter. The base case of this AF is zero percent of the outside shoulder having rumble strips. When 100 percent of the outside shoulder has rumble strips, the AF is 0.989. This trend is consistent with the HSM Supplement and the FP model in Chapter 5, although the AF value of 0.989 is larger than the AFs from these other documents (i.e., outside rumble strip presence results in a greater change to FI crash frequency in those models than in this model).
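The rumble strip AF values quoted above are consistent with a log-linear model form, in which an AF is the exponential of a parameter multiplied by its variable. The worked check below assumes that form. The symbols $P_{irs}$ and $b_{ors}$ are introduced here only for illustration, the 50 percent case is a hypothetical example, and the outside shoulder parameter is back-calculated from the reported AF rather than taken from Table 62.

$$
AF_{irs} = e^{-0.323\,P_{irs}}, \qquad
AF_{irs}\big|_{P_{irs}=1.0} = e^{-0.323} \approx 0.724, \qquad
AF_{irs}\big|_{P_{irs}=0.5} = e^{-0.1615} \approx 0.851
$$

$$
AF_{ors}\big|_{P_{ors}=1.0} = e^{b_{ors}} = 0.989
\quad\Rightarrow\quad
b_{ors} = \ln(0.989) \approx -0.011
$$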
Figure 45 shows the proposed outside rumble strip AF using a thick, solid line as well as the equivalent AF from the HSM Supplement (AASHTO 2014) using a thin, dashed line. This variable is not statistically significant but was retained because the trend (presence of rumble strips decreases crashes) is well established in the literature.

Figure 45. Estimated outside shoulder rumble strip AF for FI crashes, RP model.

Ramp Entrance Speed-Change Lane Site Type. The ramp entrance speed-change lane indicator is normally distributed with a mean of −0.249 and a standard deviation of 0.849. Given these distributional parameters, 38.5 percent of the ramp entrance speed-change lane sites are associated with an increase in FI crash frequency, when compared to freeway segments. In contrast, 61.5 percent of the ramp entrance speed-change lane sites are associated with a decrease in FI crash frequency.

Ramp Exit Speed-Change Lane Site Type. The ramp exit speed-change lane indicator is normally distributed with a mean of 0.060 and a standard deviation of 0.409. Given these distributional parameters, 55.8 percent of the ramp exit speed-change lane sites are associated with an increase in FI crash frequency, when compared to freeway segments. In contrast, 44.2 percent of the ramp exit speed-change lane sites are associated with a decrease in FI crash frequency.

PTSU-Related AFs. The following three AFs are addressed in this section: left-side PTSU presence AF, right-side PTSU presence AF, and proportion of day that PTSU is open AF. One independent variable is associated with each AF. The parameter associated with each variable was found to be fixed. The variable indicating the presence of PTSU on the left shoulder and the variable representing the proportion of the day that PTSU is open are not associated with a statistically significant parameter. In contrast, the variable indicating the presence of PTSU on the right shoulder is associated with a statistically significant parameter. Each of the three variables has a positive parameter, which indicates an increase in crash frequency associated with PTSU.

For any site with PTSU, the effect of the PTSU presence is captured by two AFs, so both AFs must be considered in combination. Specifically, the effect of left-side PTSU is the product of the left-side PTSU AF and the proportion of day that PTSU is open AF. Similarly, the effect of right-side PTSU is the product of the right-side PTSU AF and the proportion of day that PTSU is open AF. Table 63 shows the application of these AF pairs for several scenarios.

Table 63. PTSU adjustment factors in random parameters model for FI crashes.
| PTSU Location | Proportion of Day that PTSU is Open | Adjustment Factor |
|---|---|---|
| Left-Side | 0.0 | 1.02 |
| Left-Side | 0.1 | 1.03 |
| Left-Side | 0.2 | 1.03 |
| Left-Side | 0.3 | 1.04 |
| Left-Side | 0.4 | 1.04 |
| Right-Side | 0.0 | 1.23 |
| Right-Side | 0.1 | 1.23 |
| Right-Side | 0.2 | 1.24 |
| Right-Side | 0.3 | 1.24 |
| Right-Side | 0.4 | 1.25 |

Note: The "Adjustment Factor Calculation" column of the table (the product of the side-specific PTSU AF and the proportion-of-day AF for each row) is not legible in this text and is not reproduced.

The AF values in Table 63 are smaller than the inferred change in FI crash frequency described in Table 31, which suggests an AF value of about 1.5 for typical PTSU operations. The AF values in Table 63 are also smaller than the AFs presented in Table 38 of Chapter 5, which range from about 0.9 to 2.0. The literature review on PTSU safety (summarized in Chapter 2) is inconclusive on this subject.

State Indicator Variables. The RP model includes state indicators for four of the five states in the database. The indicators for Hawaii and Minnesota are not significant, and the indicators for Ohio and Virginia are significant. The parameters for Hawaii and Virginia are fixed, and the parameters for Ohio and Minnesota are random. An indicator variable was not used for Georgia, effectively making presence in Georgia the base condition. Indicators, whether significant or not, were retained because crash reporting practices are known to vary from state to state and accounting for this heterogeneity is generally beneficial for observing the effects of other variables of interest.

Latent Class Model

An LC CPM was also estimated using the same data as used to estimate the RP CPM. The independent variables used in the log-linear FP CPM and the RP CPM were included in the LC CPM. The results are shown in Table 64. The marginal mean of the model estimated in Class 1 is 6.20 FI crashes per year, while the marginal mean of Class 2 is 7.27 FI crashes per year. The proportion of the sample in Class 1 is 47.9 percent, while 52.1 percent of the observations are included in Class 2.

Table 64. Latent class model for FI crashes.

| Variable | Class 1 Parameter | Class 1 t-statistic | Class 2 Parameter | Class 2 t-statistic |
|---|---|---|---|---|
| Intercept | −5.491 | −6.78* | −3.035 | −2.81* |
| Natural logarithm of directional annual average daily traffic (in 1,000s veh/day) | 1.259 | 6.78* | 1.288 | 5.63* |
| Lane width factor (flw) | −0.141 | −0.85 | 0.104 | 0.53 |
| Median width factor (fmw) | −0.017 | −2.63* | −0.001 | −0.13 |
| Inside shoulder width factor (fis) | −0.024 | −0.98 | −0.038 | −1.58 |
| Outside shoulder width factor (fos) | −0.077 | −2.94* | 0.031 | 1.23 |
| Proportion of site with inside shoulder rumble strips | −0.185 | −0.74 | 0.061 | 0.33 |
| Proportion of site with outside shoulder rumble strips | 0.286 | 0.89 | −0.557 | −2.80* |
| Indicator for ramp entrance speed-change lane (1 if present; 0 otherwise) | −0.057 | −0.31 | −0.314 | −1.85 |
| Indicator for ramp exit speed-change lane (1 if present; 0 otherwise) | 0.019 | 0.10 | 0.041 | 0.26 |
| Indicator for PTSU on left-side of site (1 if present; 0 otherwise) | −0.028 | −0.11 | 0.173 | 0.84 |
| Indicator for PTSU on right-side of site (1 if present; 0 otherwise) | 0.174 | 1.11 | 0.342 | 1.97* |
| Proportion of day that PTSU is open | 0.882 | 1.05 | −0.271 | −0.36 |
| Indicator for freeway facility in Hawaii (1 if site is located in Hawaii; 0 otherwise) | 1.419 | 2.90* | −1.255 | −2.71* |
| Indicator for freeway facility in Minnesota (1 if site is located in Minnesota; 0 otherwise) | 2.360 | 5.58* | −1.992 | −5.56* |
| Indicator for freeway facility in Ohio (1 if site is located in Ohio; 0 otherwise) | 1.790 | 4.70* | 0.401 | 1.33 |
| Indicator for freeway facility in Virginia (1 if site is located in Virginia; 0 otherwise) | 2.034 | 5.89* | −0.080 | −0.36 |
| Marginal probabilities | 0.479 | 0.073 | 0.521 | 0.073 |
| Overdispersion parameter (α) | 0.288 | n.c. | 0.266 | n.c. |

*Statistically significant at 95 percent confidence level.
Number of observations: 728
Log-likelihood at convergence: −1,756.3
n.c.: value not computed by software used to estimate LC model.

The distribution of reported crash frequency across sites in the two latent classes is compared in Figure 46. In this figure, the distribution of each class (i.e., the number of sites assigned to each class) is represented by a different color (blue or green). The blue color represents Class 1 while the (lighter) green color represents Class 2. When the two bars overlap, they appear as darker green. As shown, the two groups have a similar distribution of crash frequencies.

Figure 46. Fatal and injury class distributions. (Horizontal axis: predicted fatal and injury crash frequency. Vertical axis: density, i.e., fraction of sites with corresponding crash frequency.)

The logit probability of a site being assigned to Class 2 rather than Class 1 is 0.084 (standard error is 0.291). Using this value, the proportion of the sample assigned to Class 2 was computed as 52.1 percent {= 100 × exp(0.084)/[1 + exp(0.084)]}. This result is not statistically significant (p-value > 0.05), which suggests that the two classes do not differ.

The results in Table 64 indicate that the natural logarithm of the directional annual average daily traffic is statistically significant in both classes. Few other independent variables are statistically significant in either class, with the exception of the state indicators. This suggests that the differences in crash reporting affect the class assignment in the model. Furthermore, several parameter estimates have different signs between the two classes. This characteristic suggests a notably different relationship exists between the associated independent variables and expected crash frequency in each class. For these reasons, particularly the lack of unique crash frequency distributions across the two classes, the LC model is not considered to provide additional insights into PTSU safety or freeway safety compared to the FP model in Chapter 5 or the RP model in this chapter.

The CPM in Table 64 for Class 1 can be represented in equation form as follows:

Equation 188

$$
\begin{aligned}
N_{FI,1} = {} & y \times e^{-5.491} \times L \times \left(\frac{AADT}{1000}\right)^{1.259} \\
& \times e^{-0.141 f_{lw}} \times e^{-0.017 f_{mw}} \times e^{-0.024 f_{is}} \times e^{-0.077 f_{os}} \\
& \times e^{-0.185 P_{irs}} \times e^{0.286 P_{ors}} \times e^{-0.057 I_{en}} \times e^{0.019 I_{ex}} \\
& \times e^{-0.028 I_{ptsu,l}} \times e^{0.174 I_{ptsu,r}} \times e^{0.882 P_{open}} \\
& \times e^{1.419 I_{HI}} \times e^{2.360 I_{MN}} \times e^{1.790 I_{OH}} \times e^{2.034 I_{VA}}
\end{aligned}
$$

The symbols in the exponents are shorthand for the variables in Table 64, in the same order as the table rows; y and L are taken here to be the number of years in the analysis period and the site length.

The CPM in Table 64 for Class 2 can be represented in equation form as follows:

Equation 189

$$
\begin{aligned}
N_{FI,2} = {} & y \times e^{-3.035} \times L \times \left(\frac{AADT}{1000}\right)^{1.288} \\
& \times e^{0.104 f_{lw}} \times e^{-0.001 f_{mw}} \times e^{-0.038 f_{is}} \times e^{0.031 f_{os}} \\
& \times e^{0.061 P_{irs}} \times e^{-0.557 P_{ors}} \times e^{-0.314 I_{en}} \times e^{0.041 I_{ex}} \\
& \times e^{0.173 I_{ptsu,l}} \times e^{0.342 I_{ptsu,r}} \times e^{-0.271 P_{open}} \\
& \times e^{-1.255 I_{HI}} \times e^{-1.992 I_{MN}} \times e^{0.401 I_{OH}} \times e^{-0.080 I_{VA}}
\end{aligned}
$$

The symbols in the exponents are shorthand for the variables in Table 64, in the same order as the table rows; y and L are taken here to be the number of years in the analysis period and the site length.

The second, third, and fourth terms in Equation 188 and Equation 189 compute the predicted average FI crash frequency for a site with base conditions for 1 year (i.e., they are the safety performance function portion of the CPM). The fifth term and beyond are adjustment factors that change the average FI crash frequency predicted with the safety performance function to account for site-specific conditions.

The relationship between crash frequency and traffic demand for a freeway segment is illustrated in Figure 47. The analysis period is 1 year, and all adjustment factors are set at 1.00 (i.e., the segment has base conditions). The trend in Figure 47a corresponds to the model in Table 64, and the trends in Figure 47b correspond to the model in the HSM Supplement. The two lines in Figure 47a represent the crash frequency when computed with each of the two LC models.

Figure 47. Estimated freeway segment model for FI crashes, LC and HSM Supplement models. (a. Proposed models. b. HSM models.)

The following sections discuss each of the AFs in Equation 188 and Equation 189 and the associated data in Table 64.

Lane Width. The lane width AF is modeled with a factor shown in Equation 183. The factor is constructed such that the base condition is an average lane width of 12 feet for full-time travel lanes, and the safety effect of average lane widths greater than 13 feet is constant. The lane width AF for each class is shown in Figure 48 using thick, solid trend lines. The equivalent AF from the HSM Supplement (AASHTO 2014) is shown using a thin, dashed line.
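The structure of the two latent class CPMs (a safety performance function multiplied by exponential adjustment factors) can be sketched in a few lines of code. The example below transcribes the parameters from Table 64; the dictionary keys, the function name `predict_fi`, and the example inputs (a directional AADT of 50,000 veh/day and a site length of 1.0, assumed to be in miles) are hypothetical and chosen only for illustration. With every factor, proportion, and indicator left at zero, the site is at base conditions and is implicitly located in Georgia, the state without an indicator.

```python
import math

# Parameters transcribed from Table 64 as (Class 1, Class 2).
LC_PARAMETERS = {
    "intercept": (-5.491, -3.035),
    "ln_aadt": (1.259, 1.288),
    "f_lw": (-0.141, 0.104),
    "f_mw": (-0.017, -0.001),
    "f_is": (-0.024, -0.038),
    "f_os": (-0.077, 0.031),
    "p_inside_rumble": (-0.185, 0.061),
    "p_outside_rumble": (0.286, -0.557),
    "i_ramp_entrance": (-0.057, -0.314),
    "i_ramp_exit": (0.019, 0.041),
    "i_ptsu_left": (-0.028, 0.173),
    "i_ptsu_right": (0.174, 0.342),
    "p_ptsu_open": (0.882, -0.271),
    "i_hawaii": (1.419, -1.255),
    "i_minnesota": (2.360, -1.992),
    "i_ohio": (1.790, 0.401),
    "i_virginia": (2.034, -0.080),
}


def predict_fi(latent_class, aadt, length, years=1.0, **site):
    """Predicted FI crash frequency for one latent class (1 or 2).

    Keyword arguments in `site` must use the keys of LC_PARAMETERS;
    any variable that is omitted is treated as zero (AF = 1.00).
    """
    k = latent_class - 1
    b = {name: pair[k] for name, pair in LC_PARAMETERS.items()}
    # Safety performance function: exp(intercept) * L * (AADT/1000)^b
    spf = math.exp(b["intercept"]) * length * (aadt / 1000.0) ** b["ln_aadt"]
    # Each adjustment factor is exp(parameter * variable value).
    af = 1.0
    for name, value in site.items():
        af *= math.exp(b[name] * value)
    return years * spf * af


base_class1 = predict_fi(1, aadt=50_000, length=1.0)
base_class2 = predict_fi(2, aadt=50_000, length=1.0)

# Class 2 site with right-side PTSU open for 20 percent of the day.
ptsu_class2 = predict_fi(2, aadt=50_000, length=1.0,
                         i_ptsu_right=1, p_ptsu_open=0.2)

print(f"Class 1, base conditions: {base_class1:.2f} FI crashes/yr")
print(f"Class 2, base conditions: {base_class2:.2f} FI crashes/yr")
print(f"Class 2 combined PTSU AF: {ptsu_class2 / base_class2:.3f}")
```

The last line illustrates that the combined PTSU effect is the product of the side-specific AF and the proportion-of-day AF, which is the same as exponentiating the sum of the two terms.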

Figure 48. Estimated lane width AF for FI crashes.

Median Width. Median width is modeled with a factor shown in Equation 184 that includes terms for the width of the median (but not including the width of the inside shoulders), the proportion of the site length with a median barrier, and the offset to the barrier (if present). The median width AF for each class is shown in Figure 49 using thick, solid trend lines. The equivalent AF from the HSM Supplement (AASHTO 2014) is shown using a thin, dashed line.

Figure 49. Estimated median width AF for FI crashes. (a. No barrier. b. Continuous barrier centered in median.)

Inside Shoulder Width. The inside shoulder width AF is modeled with a factor shown in Equation 182. The factor is constructed such that the base condition is an average inside shoulder width of 6 feet. The inside shoulder width AF for each class is shown in Figure 50 using thick, solid trend lines. The equivalent AF from the HSM Supplement (AASHTO 2014) is shown using a thin, dashed line.

Figure 50. Estimated inside shoulder width AF for FI crashes.

Outside Shoulder Width. The outside shoulder width AF is modeled with a factor shown in Equation 181. The factor is constructed such that the base condition is an average outside shoulder width of 10 feet. The outside shoulder width AF for each class is shown in Figure 51 using thick, solid trend lines. The equivalent AF from the HSM Supplement (AASHTO 2014) is shown using a thin, dashed line.

Figure 51. Estimated outside shoulder width AF for FI crashes.

Inside Shoulder Rumble Strips. The inside shoulder rumble strip AF is a function of the percent of the inside shoulder on which rumble strips are present. The base condition of the AF is that zero percent of the inside shoulder has rumble strips. The inside shoulder rumble strip AF for each class is shown in Figure 52 using thick, solid trend lines. The equivalent AF from the HSM Supplement (AASHTO 2014) is shown using a thin, dashed line.

Figure 52. Estimated inside shoulder rumble strip AF for FI crashes.

Outside Shoulder Rumble Strips. The outside shoulder rumble strip AF is a function of the percent of the outside shoulder on which rumble strips are present. The base condition of the AF is that zero percent of the outside shoulder has rumble strips. The outside shoulder rumble strip AF for each class is shown in Figure 53 using thick, solid trend lines. The equivalent AF from the HSM Supplement (AASHTO 2014) is shown using a thin, dashed line.

Figure 53. Estimated outside shoulder rumble strip AF for FI crashes.

Property-Damage-Only Crash Frequency Prediction Models

Fixed Parameters Model

The variables initially explored for the FI FP model were also explored for the development of a PDO FP model. The process described in the Statistical Modeling Principles section of this chapter was used to guide model development. The parameters for a set of variables were estimated for the PDO model. These parameters are shown in Table 65.

Table 65. Fixed parameters model for PDO crashes.

| Variable | Parameter | Std. Error | t-statistic | p-value |
|---|---|---|---|---|
| Intercept | −2.067 | 0.480 | −4.309 | <0.001 |
| Natural logarithm of directional annual average daily traffic (in 1,000s veh/day) | 1.176 | 0.109 | 10.815 | <0.001 |
| Median width factor (fmw) | −0.011 | 0.004 | −2.653 | 0.008 |
| Inside shoulder width factor (fis) | −0.013 | 0.016 | −0.774 | 0.439 |
| Outside shoulder width factor (fos) | −0.049 | 0.018 | −2.685 | 0.007 |
| Proportion of site with inside shoulder rumble strips | −0.378 | 0.117 | −3.225 | 0.001 |
| Indicator for ramp entrance speed-change lane (1 if present; 0 otherwise) | −0.037 | 0.088 | −0.428 | 0.669 |
| Indicator for ramp exit speed-change lane (1 if present; 0 otherwise) | 0.184 | 0.106 | 1.746 | 0.081 |
| Indicator for PTSU on left-side of site (1 if present; 0 otherwise) | 0.320 | 0.139 | 2.303 | 0.021 |
| Indicator for PTSU on right-side of site (1 if present; 0 otherwise) | 0.093 | 0.101 | 0.921 | 0.357 |
| Proportion of day that PTSU is open | 0.364 | 0.509 | 0.715 | 0.474 |
| Proportion of segment length with turnout (a) | −0.454 | 0.371 | −1.220 | 0.223 |
| Indicator for freeway facility in Hawaii (1 if site is located in Hawaii; 0 otherwise) | −1.969 | 0.221 | −8.912 | <0.001 |
| Indicator for freeway facility in Minnesota (1 if site is located in Minnesota; 0 otherwise) | −0.132 | 0.183 | −0.723 | 0.470 |
| Indicator for freeway facility in Ohio (1 if site is located in Ohio; 0 otherwise) | 0.569 | 0.162 | 3.525 | <0.001 |
| Indicator for freeway facility in Virginia (1 if site is located in Virginia; 0 otherwise) | 0.092 | 0.159 | 0.581 | 0.561 |
| Overdispersion parameter (α) | 0.768 | 0.048 | 15.844 | <0.001 |

Number of observations: 728
Log-likelihood at convergence: −2,411.5
Log-likelihood (intercept only): −4,843.4
McFadden Pseudo R2: 0.5021
(a) Turnouts provide emergency refuge spaces for disabled vehicles beyond the shoulder and are sometimes found on PTSU facilities.

Variables not included in the model were removed either because their parameter was highly insignificant (p-values > 0.5) or because they had a value that was inconsistent with existing safety literature. In some cases, variables with p-values > 0.5 were retained because either (1) their effect on crash frequency is established in the HSM or other literature or (2) they were of particular interest to this project (i.e., PTSU-related variables). Creation of this log-linear FP model was an intermediate step in the creation of RP and LC models, so the research team erred on the side of retaining variables to enable their exploration in RP and LC models.

Random Parameters Model

The RP model for PDO crashes is shown in Table 66. The model was estimated using the same specification as the FP model.

Table 66. Random parameters model for PDO crashes.

| Variable | Parameter | Std. Error | t-statistic | p-value |
|---|---|---|---|---|
| Intercept | −2.135 | 0.376 | −5.674 | <0.001 |
| Natural logarithm of directional annual average daily traffic (in 1,000s veh/day) | 1.081 | 0.82 | 13.118 | <0.001 |
| Standard deviation for traffic volume | 0.168 | 0.06 | 27.038 | <0.001 |
| Median width factor (fmw) | −0.008 | 0.003 | −2.746 | 0.006 |
| Standard deviation for median width factor | 0.005 | 0.002 | 2.314 | 0.021 |
| Inside shoulder width factor (fis) | −0.010 | 0.011 | −0.917 | 0.359 |
| Outside shoulder width factor (fos) | −0.0379 | 0.011 | −3.332 | <0.001 |
| Proportion of site with inside shoulder rumble strips | −0.279 | 0.075 | −3.732 | <0.001 |
| Indicator for ramp entrance speed-change lane (1 if present; 0 otherwise) | −0.158 | 0.073 | −2.173 | 0.030 |
| Indicator for ramp exit speed-change lane (1 if present; 0 otherwise) | 0.124 | 0.082 | 1.512 | 0.131 |
| Indicator for PTSU on left-side of site (1 if present; 0 otherwise) | 0.357 | 0.098 | 3.659 | <0.001 |
| Indicator for PTSU on right-side of site (1 if present; 0 otherwise) | 0.148 | 0.070 | 2.129 | 0.033 |
| Standard deviation for right hand PTSU | 0.505 | 0.063 | 8.022 | <0.001 |
| Proportion of day that PTSU is open | 0.364 | 0.509 | 0.715 | 0.474 |
| Proportion of segment length with turnout | −0.495 | 0.221 | −2.242 | 0.025 |
| Indicator for freeway facility in Hawaii (1 if site is located in Hawaii; 0 otherwise) | −1.720 | 0.160 | −10.762 | <0.001 |
| Indicator for freeway facility in Minnesota (1 if site is located in Minnesota; 0 otherwise) | −0.119 | 0.129 | −0.924 | 0.355 |
| Indicator for freeway facility in Ohio (1 if site is located in Ohio; 0 otherwise) | 0.689 | 0.129 | 5.612 | <0.001 |
| Indicator for freeway facility in Virginia (1 if site is located in Virginia; 0 otherwise) | 0.389 | 0.102 | 3.807 | 0.001 |
| Inverse overdispersion parameter (1/α) | 3.470 | 0.273 | 12.717 | <0.001 |

Number of observations: 728
Log-likelihood at convergence: −2,391.171
Log-likelihood (intercept only): −37,025.43
McFadden Pseudo R2: 0.9354

The variables with random parameters are traffic volume, median width factor, and indicator for presence of PTSU on the right-hand side. Variables with fixed parameters include inside and outside shoulder width factors, proportion of site with inside shoulder rumble strips, indicators for ramp entrance speed-change lanes and ramp exit speed-change lanes, indicator for presence of PTSU on the left-hand side, proportion of segment with a turnout, and the indicators for individual states.

Among the fixed parameters shown in Table 66, the outside shoulder width factor; proportion of site with inside shoulder rumble strips; indicator for ramp entrance speed-change lane; indicator for PTSU facility on the left-hand side; proportion of segment with a turnout; and indicators for sites in Hawaii, Ohio, and Virginia are statistically significant. The fixed parameters in the RP model are similar to the same parameters in the FP model shown in Table 65; therefore, interpretation of these variables is consistent with the interpretation of these same variables in the FP model.

The traffic volume parameter is normally distributed with a mean of 1.081 and a standard deviation of 0.168. This suggests that all sites experience a positive relationship between traffic volume and PDO crash frequency.

The median width factor is normally distributed with a mean of −0.008 and a standard deviation of 0.005. Given these distributional parameters, 5.6 percent of the sites have a parameter greater than zero, meaning that increasing the median width factor in the study sites is associated with an increase in PDO crash frequency. In contrast, increasing the median width factor is associated with a decrease in the PDO crash frequency in 94.4 percent of sites.

The right-hand PTSU lane indicator variable is normally distributed with a mean of 0.148 and a standard deviation of 0.505. Given these distributional parameters, 61.5 percent of the sites have a parameter greater than zero, meaning that the presence of a PTSU facility on the right-hand side is associated with an increase in PDO crash frequency. In contrast, the presence of a PTSU facility on the right-hand side is associated with a decrease in the PDO crash frequency in 38.5 percent of sites.
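The McFadden pseudo R2 values reported in Table 65 and Table 66 can be reproduced from the two log-likelihoods listed beneath each table, using the standard definition of one minus the ratio of the model log-likelihood to the intercept-only log-likelihood. The sketch below is a minimal check; the function name is a hypothetical helper.

```python
def mcfadden_pseudo_r2(ll_model: float, ll_intercept_only: float) -> float:
    """McFadden pseudo R2 = 1 - LL(model) / LL(intercept only)."""
    return 1.0 - ll_model / ll_intercept_only


# Log-likelihoods reported beneath Table 65 (FP) and Table 66 (RP).
fp_pdo = mcfadden_pseudo_r2(-2411.5, -4843.4)
rp_pdo = mcfadden_pseudo_r2(-2391.171, -37025.43)

print(f"FP model, PDO crashes: {fp_pdo:.4f}")
print(f"RP model, PDO crashes: {rp_pdo:.4f}")
```

Note that the two tables report different intercept-only log-likelihoods (−4,843.4 versus −37,025.43), so the two ratios are computed against different baselines even though the log-likelihoods at convergence are close.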
Latent Class Model

An LC CPM was attempted for the PDO data. However, the model did not converge when two or more classes were considered. This is likely due to the existence of only a single distribution (class) within the set of observed sites for PDO crash frequency.

Model Comparisons

This section compares the predictive power of the FP and RP models estimated for expected FI crash frequency and PDO crash frequency. The LC models were not considered due to their unstable nature in this dataset.

For the FI crash frequency, the FP model has a pseudo R2 value of 0.2595, whereas the RP model has a pseudo R2 value of 0.8476. This suggests that the RP model has a much better statistical fit to the observed data than the FP model. The pseudo R2 value for the RP model is the sum of the pseudo R2 value associated with the mean parameter values and the pseudo R2 value associated with the standard deviations of the random parameters. If the RP model were used for out-of-sample analysis by applying the mean values of the random parameters, the pseudo R2 value associated with the resulting fit of the model would only be the portion associated with the mean parameter values, which is less than 0.8476 and potentially less than 0.2595.

For PDO crash frequency, the FP model has a pseudo R2 value of 0.5021, whereas the RP model has a pseudo R2 value of 0.9354. This also suggests that the RP model has a much better statistical fit to the observed data than the FP model for PDO crashes. The pseudo R2 values do not provide a measure of the predictive power of the models for out-of-sample analysis, such as an HSM application.

Figure 54 and Figure 55 show scatterplots of predicted versus observed crash frequency for FI and PDO crashes, respectively. In each figure, the predictions are developed using the FP and RP models described above. Each point in the scatterplot represents the summation of 10 sites. The groups were formed by sorting the sites by predicted value and grouping them in increments of 10 from smallest to largest prediction. Since there are 728 sites, the last group contains only eight sites.
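The grouping used for the scatterplots (sort by predicted value, then sum predicted and observed crashes in consecutive groups of 10 sites) can be sketched as follows. The predicted and observed values below are synthetic numbers generated only to exercise the grouping logic; they are not the project data, and the function name `group_sums` is a hypothetical helper.

```python
import random


def group_sums(predicted, observed, size=10):
    """Sort sites by predicted value and sum each consecutive group.

    Returns a list of (sum_predicted, sum_observed, n_sites) tuples,
    ordered from the smallest to the largest predictions.
    """
    pairs = sorted(zip(predicted, observed), key=lambda pair: pair[0])
    groups = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        groups.append((
            sum(p for p, _ in chunk),
            sum(o for _, o in chunk),
            len(chunk),
        ))
    return groups


# Synthetic stand-in for 728 sites (NOT the project data).
random.seed(0)
predicted = [random.uniform(0.5, 20.0) for _ in range(728)]
observed = [max(0, round(random.gauss(p, p ** 0.5))) for p in predicted]

groups = group_sums(predicted, observed)
print(f"Number of plotted points: {len(groups)}")
print(f"Sites in the last group:  {groups[-1][2]}")
```

With 728 sites and groups of 10, the procedure yields 72 full groups and a final group of eight sites, matching the description above.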

Figure 54. Scatterplot of predicted versus observed FI crash frequency.

Figure 55. Scatterplot of predicted versus observed PDO crash frequency.

Both Figure 54 and Figure 55 show that predictions generally fall along the solid line on the diagonal axis that represents a perfect model prediction. This suggests that the two models generally fit the data used to estimate the model well. Furthermore, the points produced by the RP model predictions (solid dots) are consistently closer to the diagonal than the points produced by the FP model predictions (unfilled dots). This suggests that the RP model generally produces predicted values that are closer to the observed values than are the values from the FP model. This trend is expected when the RP model is applied to the set of observations used to estimate it.

The results of the model comparisons indicate that RP models, when applied to the data used to estimate them, more accurately predict a site's average crash frequency than the FP model. However, when considering models for use in the HSM, there are other important considerations. Notably, practitioners use the CPMs in Part C of the HSM to analyze sites that are not in the original model estimation database. The predictive accuracy of the RP model should be assessed before it is used to evaluate sites that are not in the original estimation database. This type of assessment is described in the next section.

Validation

A common application of the CPMs in Part C of the HSM is to use them to analyze existing or planned facilities that were not included in the dataset used by researchers when they initially developed the CPMs. This section describes the findings from a validation effort that was undertaken to replicate this type of application. Specifically, the validation effort determined how well a model created with a "training" dataset can predict crashes on a "test" dataset (i.e., a dataset that was not part of the model development dataset).

Validation involved re-estimating the FP and RP models with some sites removed and then using the re-estimated models to analyze the removed sites (i.e., the "test" dataset). The data from Georgia (73 observations) were withheld from the project database to serve as the test dataset. The remaining data (665 observations) were used to re-estimate FI crash frequency models using FP and RP models. The resulting models were then applied to the Georgia data to determine the predictive power of the FP and RP models.

The modeling principles used for the validation process are the same as those used to develop models described previously in this chapter. These principles are outlined in the section titled Statistical Modeling Principles of this chapter. Because this is a validation effort, indicator variables associated with each state were excluded from the model.

Table 67 shows the FP model using the "training" dataset. In this model, the traffic volume, roadside barrier factor, PTSU presence indicators, and the proportion of the day that the PTSU is open are positively correlated with the expected FI crash frequency. The median width factor, outside shoulder width factor, and speed-change lane indicators are negatively correlated with the expected FI crash frequency. The lane width, inside shoulder width, rumble strip presence, horizontal curvature, and interchange ramp volume and spacing variables were tested in the model; they are not included in the specification shown in Table 67 because the sign of the parameter was not consistent with existing research literature, or the parameter was not statistically significant based on the modeling principles described earlier in this chapter.

When comparing the model shown in Table 67 to the full FP model shown in Table 61, several cross-sectional variables are no longer included in the model. This loss of variables is due to the removal of the Georgia data. These variables include the lane width factor, inside shoulder width factor, and the proportion of the site with inside and outside rumble strips. The sign and magnitude of the parameters of the variables that are included in both models are similar, with a few exceptions. The sign of the ramp exit speed-change lane indicator is positive in the full model but is negative in the sample without the Georgia data; however, the parameter is not statistically significant in either model. In addition, the magnitude of the parameter for the proportion of the day that the PTSU facility is open is larger in the training dataset model shown in Table 67 than in the model estimated using the full dataset in Table 61.

Table 67. Fixed parameters model for FI crashes (no Georgia data).

| Variable | Parameter | Std. Error | t-statistic | p-value |
|---|---|---|---|---|
| Intercept | -3.679 | 0.397 | -9.271 | <0.001 |
| Natural logarithm of directional annual average daily traffic (in 1,000s veh/day) | 1.305 | 0.101 | 12.976 | <0.001 |
| Median width factor (f_mw) | -0.011 | 0.004 | -2.777 | 0.006 |
| Outside shoulder width factor (f_os) | -0.019 | 0.017 | -1.113 | 0.266 |
| Roadside barrier factor (f_rb) | 0.158 | 0.079 | 2.011 | 0.044 |
| Indicator for ramp entrance speed-change lane (1 if present; 0 otherwise) | -0.246 | 0.107 | -2.299 | 0.022 |
| Indicator for ramp exit speed-change lane (1 if present; 0 otherwise) | -0.127 | 0.118 | -1.074 | 0.283 |
| Indicator for PTSU lane on left-side of site (1 if present; 0 otherwise) | 0.152 | 0.114 | 1.331 | 0.183 |
| Indicator for PTSU lane on right-side of site (1 if present; 0 otherwise) | 0.144 | 0.090 | 1.603 | 0.109 |
| Proportion of day that PTSU lane is open | 0.832 | 0.562 | 1.480 | 0.139 |
| Overdispersion parameter (α) | 0.593 | 0.049 | 12.127 | <0.001 |

Number of observations: 665
Log-likelihood at convergence: -1,631.15
Log-likelihood (intercept only): -2,188.21
McFadden Pseudo R^2: 0.2546

In addition to the FP model, an RP model was also re-estimated using the database without the Georgia data. This model is shown in Table 68. The variables included in the model are the same as those shown in the FP model. The natural logarithm of the directional traffic volume, median width factor, ramp entrance speed-change lane presence, and left- and right-side PTSU presence are random parameters. Each of the random parameters was normally distributed in the model. Given the distributional parameters for the directional traffic volume (mean = 1.301 and standard deviation = 0.076), all of the sites experience increased FI crash frequency as the traffic volume increases.
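The McFadden pseudo R^2 reported with Table 67 follows from the two log-likelihood values listed with it. The short check below uses the standard definition (one minus the ratio of the log-likelihood at convergence to the intercept-only log-likelihood); the function name is illustrative only.

```python
def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden pseudo R^2 = 1 - LL(model) / LL(intercept only)."""
    return 1.0 - ll_model / ll_null


# Log-likelihood values as reported with Table 67.
fp_r2 = mcfadden_pseudo_r2(ll_model=-1631.15, ll_null=-2188.21)
print(f"FP model pseudo R^2: {fp_r2:.4f}")
```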
Given the distributional parameters for the median width factor (mean = -0.012 and standard deviation = 0.008), 6.7 percent of the sites are associated with a parameter greater than zero, meaning that increasing the median width at these sites is associated with an increase in FI crash frequency. In contrast, increasing the median width is associated with a decrease in FI crash frequency at 93.3 percent of sites. The presence of a ramp entrance speed-change lane, given the mean of -0.297 and standard deviation of 0.634, is associated with a higher FI crash frequency on 32 percent of the ramp entrance speed-change lane sites in the sample and with a lower FI crash frequency on the remaining 68 percent. The left-hand PTSU indicator is normally distributed with a mean of 0.116 and a standard deviation of 0.306. Given these distributional parameters, 64.8 percent of the sites have a parameter greater than zero, meaning that the presence of a PTSU facility on the left-hand side is associated with an increase in FI crash frequency. In contrast, the presence of a PTSU facility on the left-hand side is associated with a decrease in FI crash frequency at 35.2 percent of sites.

Table 68. Random parameters model for FI crashes (no Georgia data).

| Variable | Parameter | Std. Error | t-statistic | p-value |
|---|---|---|---|---|
| Intercept | -3.691 | 0.372 | -9.913 | <0.001 |
| Natural logarithm of directional annual average daily traffic (in 1,000s veh/day) | 1.301 | 0.095 | 13.753 | <0.001 |
| Standard deviation of traffic volume | 0.076 | 0.008 | 9.261 | <0.001 |
| Median width factor (f_mw) | -0.012 | 0.004 | -3.105 | 0.002 |
| Standard deviation of median width | 0.008 | 0.003 | 2.652 | 0.008 |
| Outside shoulder width factor (f_os) | -0.022 | 0.015 | -1.471 | 0.141 |
| Roadside barrier factor (f_rb) | 0.137 | 0.074 | 1.836 | 0.066 |
| Indicator for ramp entrance speed-change lane (1 if present; 0 otherwise) | -0.297 | 0.108 | -2.764 | 0.006 |
| Standard deviation of ramp ent. speed-change lane | 0.634 | 0.173 | 3.677 | <0.001 |
| Indicator for ramp exit speed-change lane (1 if present; 0 otherwise) | -0.140 | 0.120 | -1.166 | 0.244 |
| Indicator for PTSU lane on left-side of site (1 if present; 0 otherwise) | 0.116 | 0.104 | 1.114 | 0.265 |
| Standard deviation of left-side PTSU | 0.306 | 0.136 | 2.257 | 0.024 |
| Indicator for PTSU lane on right-side of site (1 if present; 0 otherwise) | 0.082 | 0.084 | 0.983 | 0.326 |
| Standard deviation of right-side PTSU | 0.549 | 0.083 | 6.574 | <0.001 |
| Proportion of day that PTSU lane is open | 1.069 | 0.468 | 2.282 | 0.023 |
| Inverse overdispersion parameter (1/α) | 2.382 | 0.222 | 10.750 | <0.001 |

Number of observations: 665
Log-likelihood at convergence: -1,627.5
Log-likelihood (intercept only): -10,372.8
McFadden Pseudo R^2: 0.8431

The right-hand PTSU indicator is normally distributed with a mean of 0.082 and a standard deviation of 0.549. Given these distributional parameters, 55.9 percent of the sites have a parameter greater than zero, meaning that the presence of a PTSU facility on the right-hand side is associated with an increase in FI crash frequency. In contrast, the presence of a PTSU facility on the right-hand side is associated with a decrease in FI crash frequency at 44.1 percent of sites.
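The site shares quoted for the random parameters (6.7, 32, 64.8, and 55.9 percent) follow from the normal distribution: for a parameter with mean m and standard deviation s, the share of sites with a positive parameter is the standard normal cumulative distribution evaluated at m/s. The sketch below reproduces those shares from the Table 68 means and standard deviations; the function and dictionary names are illustrative.

```python
import math


def share_positive(mean, std_dev):
    """P(beta > 0) for a normally distributed parameter beta ~ N(mean, std_dev)."""
    z = mean / std_dev
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# (mean, standard deviation) pairs taken from Table 68.
random_parameters = {
    "ln(directional AADT)": (1.301, 0.076),
    "median width factor": (-0.012, 0.008),
    "ramp entrance speed-change lane": (-0.297, 0.634),
    "left-side PTSU": (0.116, 0.306),
    "right-side PTSU": (0.082, 0.549),
}

shares = {name: share_positive(m, s) for name, (m, s) in random_parameters.items()}
for name, share in shares.items():
    print(f"{name}: {100 * share:.1f} percent of sites with a positive parameter")
```

For the traffic volume parameter the mean is about 17 standard deviations above zero, which is why the text states that all sites experience increased FI crash frequency as traffic volume increases.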
When comparing the models in Table 67 and Table 68, the parameters in the FP model are very similar to the mean parameters in the RP model. This finding is consistent with Tang et al. (2019), who found that applying the mean parameters from an RP model to an out-of-sample dataset of two-lane rural highways in Pennsylvania produced nearly equivalent predicted crash frequencies when compared to an FP model. A similar process is repeated here to "test" the FP and RP models on the hold-out sample from Georgia. However, the comparison in this section differs from the prior work of Tang et al. (2019) because the hold-out sample in this study consists of data from a state that was not used to estimate the model parameters, which is likely a more common application in the HSM context.

It should be noted that using the mean parameters of an RP model for out-of-sample predictions, without accounting for the variation in the random parameters, potentially introduces error (i.e., bias) in the predictions. Because the mean number of crashes is an exponential function of the random parameters, the variation of the random parameters enters into the conditional mean function. Thus, predictions evaluated at the parameter means alone would likely be systematically biased relative to predictions that also account for the random variation around those means. Accounting for the random variation in parameters would require a simulation across the parameter distributions to analyze out-of-sample sites. While such a simulation has been done in the context of RP logit models (Alogaili and Mannering 2020; Islam et al. 2020), it has not yet been conducted for negative binomial RP models in the safety field.

With the aforementioned potential bias in mind, the validation process included the following steps:

1. Use the FP model shown in Table 67 to predict the expected number of FI crashes using the hold-out sample from Georgia. The FP model was first calibrated using a standard HSM calibration factor.
2. Use the mean parameters from the RP model shown in Table 68 to predict the expected number of FI crashes using the hold-out sample from Georgia. The RP model was first calibrated using a standard HSM calibration factor.
3. Compute the root-mean-square error (RMSE) and mean bias error (MBE) using the predictions from steps 1 and 2. These two metrics compare the predicted average FI crash frequency from each model to the reported number of FI crashes in the Georgia sample.

The RMSE and MBE are computed as follows:

Equation 190
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{P,i} - y_{R,i}\right)^{2}}$$

Equation 191
$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{P,i} - y_{R,i}\right)$$

where
y_{P,i} = predicted number of crashes on site i;
y_{R,i} = reported number of crashes on site i, i = 1, ..., n; and
n = number of observations.

The RMSE and MBE metrics indicate how close the predicted values are to the reported number of FI crashes in the hold-out sample from Georgia. Larger values of RMSE indicate that the model predictions are further from the reported number of crashes. MBE values further from 0.0 (either positive or negative) indicate how much the model is over- or under-predicting the reported crash frequency in the Georgia hold-out sample.

The models estimated with the training dataset were calibrated to the Georgia data using the calibration procedure in the HSM. The calibration factor was computed as the ratio of the observed crashes from Georgia to the number of crashes predicted by the four-state model. This calibration factor was applied to the FP and RP models in order to minimize the MBE.
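The calibration and error metrics used in the validation steps can be sketched as below. The sketch assumes the error is taken as predicted minus reported (so a positive MBE means over-prediction) and uses the ratio-of-totals calibration factor described above; the five-site data are hypothetical and are not the Georgia sample.

```python
import math


def rmse(predicted, reported):
    """Root-mean-square error between predicted and reported crash counts."""
    n = len(reported)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reported)) / n)


def mbe(predicted, reported):
    """Mean bias error, taken here as predicted minus reported."""
    n = len(reported)
    return sum(p - r for p, r in zip(predicted, reported)) / n


def calibration_factor(predicted, reported):
    """Ratio of total reported crashes to total predicted crashes."""
    return sum(reported) / sum(predicted)


# Hypothetical hold-out sites (not project data).
predicted = [4.0, 10.0, 2.5, 7.5, 6.0]  # uncalibrated model predictions
reported = [2, 6, 1, 5, 1]              # reported FI crashes

c = calibration_factor(predicted, reported)
calibrated = [c * p for p in predicted]

print("calibration factor:", c)
print("MBE before calibration:", mbe(predicted, reported))
print("MBE after calibration:", mbe(calibrated, reported))
print("RMSE after calibration:", rmse(calibrated, reported))
```

Because the calibration factor forces the total predicted crashes to equal the total reported crashes, the MBE of the calibrated predictions is zero up to floating-point rounding. This is consistent with the near-zero MBE values on the order of 10^-16 reported for the calibrated Georgia predictions.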
The results of the validation effort are shown in Table 69. The first three rows show the measures of prediction accuracy for the model applied to the four states that were used to estimate the model. The fourth and fifth rows provide measures for this model applied to sites from Georgia. Note that for Georgia, only two types of predictions are provided: the FP model predictions and RP model predictions using mean parameters. A third option available to the four states is to estimate the predictions using a simulation-based approach that combines the estimated distribution of the random parameters. This is possible for the four states since the model estimation saves the individual parameter estimates for each observation in the model development database. However, making similar predictions for an out-of-sample dataset (Georgia in this case) is not possible without the development of a simulation-based software tool that mimics the model estimation process and applies it to the out-of-sample data. This type of software tool does not exist at this time, but the Model Comparison section of this chapter describes a vision for such software that could be developed as part of a future research effort.
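To illustrate why predictions at the mean parameters can differ from predictions that account for the parameter distribution, the sketch below simulates a single normally distributed parameter on a 0/1 indicator, using the right-side PTSU mean and standard deviation from Table 68. It is not the software tool envisioned in this chapter: it draws from the unconditional distribution of one parameter only, ignores the other model terms, and does not reproduce the site-specific estimates saved during model estimation. The closed-form comparison uses the standard lognormal mean, exp(m + s^2/2).

```python
import math
import random


def multiplier_at_mean(mean, x=1.0):
    """Crash-frequency multiplier evaluated at the mean parameter only."""
    return math.exp(mean * x)


def simulated_multiplier(mean, std_dev, x=1.0, draws=200_000, seed=1):
    """Average of exp(beta * x) over random draws of beta ~ N(mean, std_dev)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += math.exp(rng.gauss(mean, std_dev) * x)
    return total / draws


def closed_form_multiplier(mean, std_dev, x=1.0):
    """Exact expectation of exp(beta * x) for normal beta (lognormal mean)."""
    return math.exp(mean * x + 0.5 * (std_dev * x) ** 2)


mean, std_dev = 0.082, 0.549  # right-side PTSU indicator, Table 68
at_mean = multiplier_at_mean(mean)
simulated = simulated_multiplier(mean, std_dev)
exact = closed_form_multiplier(mean, std_dev)

print(f"multiplier at mean parameter: {at_mean:.3f}")
print(f"simulated expected multiplier: {simulated:.3f}")
print(f"closed-form expected multiplier: {exact:.3f}")
```

In this single-parameter illustration the mean-parameter multiplier is smaller than the expected multiplier by a factor of exp(s^2/2), which is the kind of systematic difference that a simulation across the parameter distributions would capture.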

Table 69. Model validation comparisons.

| Model Type | Predictions for | Calibration Factor | RMSE | MBE |
|---|---|---|---|---|
| Fixed parameters | HI, VA, MN, OH | n.a. | 6.579 | 0.226 |
| Random parameters (mean) | HI, VA, MN, OH | n.a. | 6.554 | -0.235 |
| Random parameters (predicted) | HI, VA, MN, OH | n.a. | 4.923 | -0.154 |
| Fixed parameters (calibrated) | GA | 0.542 | 9.446 | -3.56 x 10^-16 |
| Random parameters (calibrated mean) | GA | 0.572 | 9.616 | 1.16 x 10^-15 |

n.a. = not applicable

As expected, when comparing the four-state (first and second rows) and Georgia (fourth and fifth rows) predictions in Table 69, RMSE and MBE for the FP and the mean parameter values from the RP models are nearly identical. The FP model appears to over-predict the FI crash frequency in Georgia, while the RP model under-predicts if the mean parameters are used for prediction. When the individual parameter estimates are considered to predict FI crash frequency based on the distributional properties of the random parameters (row 3 for the four-state predictions), the RMSE and MBE both trend closer to zero, as expected.

Empirical Fixed Parameters Model

There are two basic approaches used for predictive model development using cross-sectional data (Hauer 2015; Lehmann 1990). One approach has the objective of developing a model that explains the influence of each model variable on the predicted crash frequency. Elvik (2011) is one of the more recent authors to describe the steps involved in this "explanatory" approach to model development. This approach can use model parameters to infer an SPF AF provided that the predicted AF value is logical in sign and magnitude, consistent with previous research findings, and associated with a reasonably small confidence interval (albeit not necessarily within the 95 percent confidence interval). This approach is essential to the development of predictive models with a relatively large number of AFs, such as the CPMs included in HSM Part C.
It is particularly helpful when a treatment, countermeasure, or design change is not suitable for formal CMF development with a before-after study. CPMs with a large number of AFs are desirable because they enable analysts to evaluate the individual elements of a design alternative. This approach is more complicated and time-consuming to apply than the second model development approach.

The second model development approach has the objective of developing a model with the smallest possible confidence interval for the predicted crash frequency. Hauer (2015) is one of the more recent authors to describe the steps involved in this "empirical" approach to model development. Models developed using this approach are typically considered "parsimonious" and almost certainly have biased parameters due to omitted variable bias. As a result, using the model to infer how a change in one variable's value will change the predicted crash frequency will likely lead to incorrect conclusions (Mannering and Bhat 2014). This approach, however, is essential to the development of predictive models for which AFs are not needed, such as the models used for network screening or for CMF development in an empirical-Bayes-based before-after study.

For this project, the explanatory approach was used for the development of the predictive models documented in Chapters 5, 7, and 8. In most cases, the AFs documented in these chapters are consistent with AFs in Chapter 18 of the HSM Supplement (AASHTO 2014) and the safety literature. To explore the differences in the models produced by the two approaches, the FP models documented in Chapter 5 were re-estimated using the empirical approach. The re-estimated parameters are provided in Table 70 and Table 71 for the FI and PDO crash prediction models, respectively. Table 70 and Table 71 can be compared to Table 34 and Table 39, respectively, in Chapter 5.
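The omitted variable bias mentioned above for parsimonious models can be illustrated with a small simulation. The sketch uses ordinary least squares on synthetic data as a simplified stand-in (the CPMs in this report are negative binomial count models, and none of the numbers below come from the project data). When a variable that is correlated with a retained variable is dropped, the retained variable's estimated slope absorbs part of the dropped variable's effect.

```python
import random


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(7)
n = 20_000

# Synthetic data: x2 is correlated with x1, and both affect y.
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [0.5 * a + rng.gauss(0.0, 1.0) for a in x1]
y = [1.0 + 2.0 * a + 3.0 * b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]

# Regressing y on x1 alone omits x2. The expected slope is
# 2.0 + 3.0 * 0.5 = 3.5 rather than the true x1 effect of 2.0.
short_slope = ols_slope(x1, y)
print(f"true x1 effect: 2.0, estimated with x2 omitted: {short_slope:.2f}")
```

The short model can still predict y reasonably well for sites resembling the sample, which is why such models remain useful for network screening, but its slope should not be read as the effect of changing x1 alone.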

Table 70. Predictive model re-estimation statistics for FI crashes based on empirical approach.

| Variable | Description | Parameter | Std. Error | t-statistic |
|---|---|---|---|---|
| b_hc,ast,fi | Horizontal curvature | -4.794 | 1.6709 | -2.87 |
| b_rs,ast,fi | Shoulder rumble strip presence | -0.5623 | 0.2009 | -2.80 |
| b_0,fs,fi | Freeway segment | -4.816 | 0.3433 | -14.03 |
| b_1,ast,fi | Directional AADT volume | 1.523 | 0.08421 | 18.09 |
| b_0,en,fi | Ramp entrance speed-change lane | -4.557 | 0.3825 | -11.91 |
| b_2,en,fi | Entrance ramp AADT volume | -0.05723 | 0.02465 | -2.32 |
| b_0,ex,fi | Ramp exit speed-change lane | -5.667 | 0.5121 | -11.07 |
| b_3,ex,fi | Number lanes adjacent to ramp exit speed-change lane | 0.8115 | 0.3404 | 2.38 |
| b_tout,fs,fi | Turnout presence | -0.8090 | 0.2844 | -2.84 |
| b_ptsuOpen,ast,fi | PTSU lane presence and operation | 1.412 | 0.1735 | 8.14 |
| b_nearOpen,ast,fi | Transition zone presence up- or downstream of PTSU | 1.330 | 0.5439 | 2.45 |
| b_ohio,ast,fi | Location in Ohio | 0.7153 | 0.1180 | 6.06 |
| b_V264,ast,fi | Location on I-264 in Virginia | 0.5452 | 0.1203 | 4.53 |
| b_G400,ast,fi | Location on GA 400 in Georgia | -0.8502 | 0.1503 | -5.66 |
| b_VA66,ast,fi | Location on I-66 in Virginia | 0.2775 | 0.09402 | 2.95 |
| K | Inverse dispersion parameter for all site types | 9.797 | 0.807 | 12.1 |

Table 71. Predictive model re-estimation statistics for PDO crashes based on empirical approach.

| Variable | Description | Parameter | Std. Error | t-statistic |
|---|---|---|---|---|
| b_hc,ast,pdo | Horizontal curvature | -4.951 | 1.9579 | -2.53 |
| b_len,en,pdo | Ramp entrance speed-change lane length | 0.1021 | 0.03623 | 2.82 |
| b_len,ex,pdo | Ramp exit speed-change lane length | 0.04507 | 0.02128 | 2.12 |
| b_0,fs,pdo | Freeway segment | -3.357 | 0.3116 | -10.77 |
| b_1,ast,pdo | Directional AADT volume | 1.397 | 0.07849 | 17.80 |
| b_0,en,pdo | Ramp entrance speed-change lane | -3.463 | 0.3260 | -10.62 |
| b_0,ex,pdo | Ramp exit speed-change lane | -3.657 | 0.4066 | -8.99 |
| b_3,ex,pdo | Number lanes adjacent to ramp exit speed-change lane | 0.5749 | 0.2763 | 2.08 |
| b_tout,fs,pdo | Turnout presence | -1.104 | 0.1231 | -8.97 |
| b_ptsuOpen,ast,pdo | PTSU lane presence and operation | 1.780 | 0.1391 | 12.80 |
| b_nearOpen,ast,pdo | Transition zone presence up- or downstream of PTSU | 1.757 | 0.4305 | 4.08 |
| b_ohio,ast,pdo | Location in Ohio | 0.5824 | 0.07022 | 8.29 |
| b_HI01,ast,pdo | Location on I-H1 in Hawaii | -1.636 | 0.1401 | -11.67 |
| b_G400,ast,pdo | Location on GA 400 in Georgia | -1.035 | 0.1168 | -8.86 |
| b_GA85,ast,pdo | Location on I-85 in Georgia | 0.5423 | 0.2026 | 2.68 |
| K | Inverse dispersion parameter for all site types | 9.456 | 0.618 | 15.3 |

Table 72 summarizes the differences in the two model building approaches in terms of the adjustment factors retained in the associated predictive model. The models presented in Chapter 5 are based on the explanatory approach. The models presented in Table 70 and Table 71 are based on the empirical approach.
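The t-statistics listed in Table 70 are the ratio of each parameter estimate to its standard error, which can be checked directly from the table. The sketch below does this for four rows and also computes an approximate 95 percent interval as the estimate plus or minus 1.96 standard errors; that interval is a large-sample normal approximation assumed here for illustration and is not reported in the table.

```python
def t_statistic(estimate, std_error):
    """Ratio of a parameter estimate to its standard error."""
    return estimate / std_error


def wald_interval(estimate, std_error, z=1.96):
    """Approximate 95 percent interval (large-sample normal assumption)."""
    return estimate - z * std_error, estimate + z * std_error


# (parameter, standard error, reported t-statistic) from Table 70.
rows = {
    "Horizontal curvature": (-4.794, 1.6709, -2.87),
    "Directional AADT volume": (1.523, 0.08421, 18.09),
    "PTSU lane presence and operation": (1.412, 0.1735, 8.14),
    "Location in Ohio": (0.7153, 0.1180, 6.06),
}

for name, (estimate, std_error, reported_t) in rows.items():
    t = t_statistic(estimate, std_error)
    low, high = wald_interval(estimate, std_error)
    print(f"{name}: t = {t:.2f} (reported {reported_t}), "
          f"approx. 95% interval = ({low:.3f}, {high:.3f})")
```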

Table 72. AFs included in models based on two modeling approaches.

| Adjustment Factor (AF) | Explanatory Approach: FI Model | Explanatory Approach: PDO Model | Empirical Approach: FI Model (a) | Empirical Approach: PDO Model (a) |
|---|---|---|---|---|
| Horizontal curve AF | Yes | Yes | Yes+ | Yes+ |
| Lane width AF | Yes | Yes | No | No |
| Inside shoulder width | Yes | Yes | No | No |
| Inside shoulder rumble strip | Yes | No | Yes- | No |
| Median width | Yes | Yes | No | No |
| Median barrier | Yes | Yes | No | No |
| Lane change | Yes | No | No | No |
| Outside shoulder width | Yes | Yes | No | No |
| Outside shoulder rumble strip | Yes | No | Yes- | No |
| Outside clearance | Yes | Yes | No | No |
| Outside barrier | Yes | Yes | No | No |
| Ramp entrance | Yes | Yes | No | Yes |
| Ramp exit | Yes | Yes | No | Yes |
| Turnout presence | Yes | Yes | Yes- | Yes- |
| PTSU operation | Yes | Yes | Modified to exclude W_ptsu effect | Modified to exclude W_ptsu effect |

Note: Shaded cells indicate an AF that is eliminated from the explanatory model by application of the empirical approach.
(a) "+" indicates the parameter value is less negative or more positive (i.e., shifted right on the number line). "-" indicates the value is more negative or less positive.

The FI model based on the explanatory approach has 15 AFs. In contrast, the FI model based on the empirical approach retains only four AFs. In addition, the PTSU operation AF in the empirical model form was algebraically modified to exclude the PTSU lane width variable (because its parameter was not statistically significant). A similar loss of AFs occurs for the PDO model developed using the empirical approach. The shaded cells in the last two columns of Table 72 indicate which AFs were eliminated when the model developed using the explanatory approach was re-estimated using the empirical approach.
