National Academies Press: OpenBook

Guidelines for the Development and Application of Crash Modification Factors (2022)

Chapter: Appendix C - Guidelines for Developing Crash Modification Functions

Suggested Citation:"Appendix C - Guidelines for Developing Crash Modification Functions." National Academies of Sciences, Engineering, and Medicine. 2022. Guidelines for the Development and Application of Crash Modification Factors. Washington, DC: The National Academies Press. doi: 10.17226/26408.


APPENDIX C
Guidelines for Developing Crash Modification Functions

Introduction
Chapter 1 Guidelines for Developing CMFunctions from Cross-Sectional Regression Models
  1.1 Bias Due to Aggregation, Averaging, or Incompleteness in Data
  1.2 Functional Form for Effects of Independent Variables
  1.3 Model Structure: Application of Hierarchical Modeling
  1.4 Tools for Assessing Model Fit and Choosing Among or Amalgamating Information from Competing Models
  1.5 Including Estimates from Previous Studies in the Estimation Methodology through Full Bayes Methods
  1.6 Addressing Multicollinearity Among Explanatory Variables
  1.7 Addressing Endogeneity
  1.8 Modeling Interactions, Especially for Estimating Effects of Combination Treatments
  1.9 Estimating Precision of CMFs from CMFunctions Derived
  1.10 Corroboration of Results
  1.11 Database Requirements
Chapter 2 Guidelines for Developing CMFunctions from Models That Relate CMF Point Estimates to Application Circumstance
  2.1 Guidelines for Conducting Systematic Reviews
  2.2 Guidelines on Which Application Circumstances and Key Influential Factors to Collect Information on, Grouped by Treatment and Location Types
  2.3 Guidelines for Conducting Meta-Regression
  2.4 Fixed Versus Random Effects Models
  2.5 Selection of Estimation Method
  2.6 Guidelines for Creating Subgroups
  2.7 Estimating Precision of CMFs from CMFunctions Derived
  2.8 Improving Site and Study Level Estimates of CMFs
Chapter 3 Example Applications
  Case Study 1 Simultaneous Application of Shoulder Rumble Strips and Centerline Rumble Strips on Rural Two-Lane Roads
  Case Study 2 Conversion of Conventional Intersections to Roundabouts
  Case Study 3 Safety Effects of Flattening a Horizontal Curve
  Case Study 4 Safety Effects of Left- and Right-Turn Lanes on Major Roads at Three-Legged Stop-Controlled Intersections
References

Introduction

The primary objective of this appendix is to provide guidelines for researchers to estimate future crash modification factors (CMFs) and crash modification functions (CMFunctions) that identify key influential site characteristics. The focus is on CMFunctions from observational studies. This appendix discusses the key issues in CMFunction development from observational data, and, where possible, tools for overcoming and mitigating these issues. This appendix also provides case studies that demonstrate these issues and the potential for overcoming them. Complementary guidelines are provided in Appendixes F and G, which make the case for researchers and highway safety agencies to engage where practical in randomized trials, or at least an approach that is close to achieving this desideratum, to overcome some of the key issues with observational studies that are illuminated in this appendix and in the related case studies.

The guidelines provided herein for CMFunction development from observational data are intentionally not detailed enough to be seen as prescriptive, since that level of information is more effectively provided in sources dedicated to that purpose. Nevertheless, some level of prescriptive guidelines may be obtained by reading this document in conjunction with the four case studies presented here.

Chapter 1
Guidelines for Developing CMFunctions from Cross-Sectional Regression Models

CMFs derived from cross-sectional panel data are based on a single period under the assumption that the ratio of average crash frequencies for sites with and without a feature is an estimate of the CMF for implementing that feature. Cross-sectional designs are particularly useful for estimating CMFs where there are insufficient instances for a preferred before-after design where a feature is implemented. For example, in designing a road, a CMF may be desired for assessing the implications of selecting a six-foot shoulder over a four-foot shoulder. But, although there are many road segments with a shoulder width of four feet and many with a shoulder width of six feet, there are likely very few projects in a jurisdiction where the shoulder is actually widened from four feet to six feet. So, a cross-sectional study may be considered in which a CMF is inferred from differences in crash experience of roads with four- and six-foot shoulders while controlling for other differences between the two sets of roads. Known factors, such as traffic volume or geometric characteristics, can be controlled for in principle by estimating a multiple variable regression model and/or matching sites based on these variables. Alternatively, a CMFunction that relates the CMF to one or more of these factors can be explored.

The basic issue with the cross-sectional design is that an unknown portion of the observed difference in crash experience can be due to factors that cannot be controlled for (e.g., if data are non-existent) or are unknown. For this reason, caution needs to be exercised in making inferences about CMFs based on cross-sectional designs. There are other issues that require attention in undertaking a cross-sectional regression study to derive a CMF or CMFunction. These issues, and how they may be resolved, are addressed in the following subsections.

1.1 Bias Due to Aggregation, Averaging, or Incompleteness in Data

The successful development of a CMFunction from cross-sectional data is dependent on the quality of the data being used and how the data are sampled. One such data quality issue is bias that can be introduced due to three sources: data aggregation, averaging and incompleteness. Use of such biased data can lead to biased estimates of model coefficients.

Bias Due to Aggregation or Averaging

Regarding aggregation or averaging bias, Lord and Mannering (2010) provide a good discussion of various ways in which this bias may affect CMFunction development such that the model may miss the relationship between crashes and the explanatory variable. Such aggregation or averaging over time and space is inevitable in modeling crash occurrence since crashes are relatively rare events. This is commonly done in the meta-regression approach that is illustrated in Chapter 3, Case Study 3, and can produce reasonable results when the bias is not substantial. However, significant bias will occur if the factors that explain crash risk, for example traffic flow,

change significantly over time. Lord and Mannering (2010) cite the effects of precipitation on crash risk as another good example. The intensity of precipitation may have a large effect on crash risk if measured hourly or minute-by-minute. However, precipitation data may only be available monthly, and thus this true relationship may be masked by the aggregation of data over a longer period. This unobserved heterogeneity may introduce errors in the model estimation.

The alternative of not aggregating or averaging data when they are available in small sampling units may result in data that show a preponderance of zero observed crashes, creating a low sample mean which, along with small sample sizes, can cause estimation problems. As Lord and Mannering (2010) point out, the desirable large sample properties of some parameter-estimation techniques (for example, maximum likelihood estimation) are not realized. With low sample means, the distribution of crash counts will be skewed excessively towards zero, which can result in incorrectly estimated parameters and erroneous inferences. Additional references given by Lord and Mannering for further discussion of this problem include Maycock and Hall (1984), Piegorsch (1990), Fridstrøm et al. (1995), Maher and Summersgill (1996), Wood (2002) and Lord and Bonneson (2005). Lord (2006) showed that the dispersion parameter of the negative binomial model, commonly used in crash data modeling, can be incorrectly estimated when data characterized by a small sample size and low sample mean values are used. The incorrect estimation of the dispersion parameter also negatively affects the inferences associated with the parameters of the model.
Elvik (2011b) discusses the low sample mean problem and states that there is no good solution to it, adding: “If a sample is very small and/or has a very low mean number of accidents, it is just not possible to fit an accident model to it. In some cases it is possible to test whether there is a low mean value problem by selecting a subsample with an even lower mean value than the full sample and estimate model coefficients for the subsample. This approach was applied by Christensen and Elvik (2007), who analyzed a sample with a mean number of accidents of 0.051. A subsample with an even lower mean of 0.028 accidents was analyzed, and the value of the over-dispersion parameter was found to be very stable across different model specifications, suggesting that there was no low mean value problem.”

A preponderance of zeros can arise, for example, from the definition of a roadway segment site as a sampling unit. Typically, it is desirable to create homogeneous segments that hold constant values for all geometric and traffic variables available. In practice, however, this can result in very short segments, many of which may have zero observed crashes. A related concern is the accuracy with which crash locations are recorded. For short segments, crashes may be recorded in a different segment than where they occurred, because there is a higher chance “that a feature of the road in one segment triggered a crash officially located on another segment” (Koorey 2009). Hauer and Bamfo (1997) recommend that “road sections shorter than 0.1 mi should either be reassembled into longer road sections or removed from the database used for modeling.”

A possible solution is to create longer segments that are not homogeneous with respect to factors that may affect crash risk. This approach, taken by Bonneson et al.
(2012), used longer non-homogeneous segments but included the proportion of segment lengths for which roadway characteristics are present. For example, some of the safety performance functions (SPFs) included the proportion of segment lengths with a barrier present in the median, the proportion of segment lengths with rumble strips on the outside shoulder, and the proportion of segment lengths with rumble strips on the inside shoulder. This approach makes it more likely that all crashes attributable to severe features (or combinations of such features, for example short, sharp curves) fall within the appropriate analysis unit (segment). The disadvantage is that the safety effects of those features may be underestimated because of the inclusion of adjacent features within the same analysis unit.

An additional potential bias due to aggregation may result from the selection of the dependent variable to be modeled. Crash data are classified in many ways, for example by location type,

crash type, severity etc. The factors affecting crash risk for different crash types may differ and the combining of different crash types may mask this relationship. For example, the presence of a shoulder rumble strip may decrease the risk of a single-vehicle run-off-road crash. However, if instead of single-vehicle run-off-road crashes the analyst were to model all crashes, this relationship may not be seen and a variable indicating presence of rumble strips may not even be statistically significant. On the other hand, when exclusive crash types and/or severities are modeled separately, a potentially serious statistical problem results (Lord and Mannering 2010). As Lord and Mannering point out, there is a correlation among injury severities and collision types. For example, an increase in the number of crashes that are classified as incapacitating injury will also be associated with some change in the number of crashes that are classified by other injury types, which sets up a correlation among the various injury-outcome crash-frequency models. This necessitates a more complex model structure to account for the cross-model correlation.

Bias Due to Incompleteness in Data

Incompleteness in data may also bias model estimation. For example, property-damage-only crashes are less likely to be reported than crashes that result in a definite injury and this under-reporting can lead to bias, particularly if the relationship between such crashes and explanatory variables is different than it is for more severe crashes and the under-reporting is not considered (Kumara and Chin 2005, Ma 2009).

Incomplete data may also include the lack of important explanatory variables. By not including important explanatory variables the model estimate may suffer from omitted variable bias.
Omitted variable bias denotes bias that occurs because a variable not included in a model is statistically associated both with a variable that is included and with the dependent variable in the model (Lord and Mannering 2010). The omission of this variable may then influence the value of the coefficient estimated for the included variable, so that it also captures part of the effect of the omitted variable. A simple example of omitted variable bias is provided by Jonsson (2005). In this research, predictive models for pedestrian and bicycle crashes were estimated both with and without pedestrian and bicycle volumes respectively. In the models without pedestrian or bicycle volumes the coefficients for motor vehicle volumes were estimated to be much higher than in the models that included pedestrian or bicycle volumes. The impact of omitted variable bias in this case is an overestimation of the effect of vehicle volumes on pedestrian or bicycle crashes. Hauer (2015) discusses omitted variable bias and notes that when the values of model coefficients become stable and cease to change significantly when other variables are added to the model, confidence in the estimated values can grow.

1.2 Functional Form for Effects of Independent Variables

The functional form sets the relationship between the explanatory and dependent variable. An incorrect functional form results in biased and inconsistent parameter estimates (Washington et al. 2011), so the selection of a functional form is critical to developing a reliable CMFunction. Most current multivariable regression models for crashes employ a negative binomial error term or another error distribution from the Poisson family and a generalized linear form such as:

$$\text{crashes} = \alpha \times (\text{length}) \times (\text{AADT})^{\beta_1} \times \exp(\beta_2 X_2 + \dots + \beta_n X_n) \qquad \text{(Equation C1)}$$

where
AADT is traffic exposure
X is a vector of explanatory variables
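The generalized linear form above implies that, all else being equal, changing one explanatory variable from one value to another multiplies predicted crashes by exp(β × change), which is the CMF such a model implies. The following is a minimal sketch of that arithmetic for the four-foot versus six-foot shoulder example; the function names and all coefficient values are hypothetical and chosen only for illustration, not taken from any fitted model.

```python
import math

def predicted_crashes(alpha, length, aadt, beta1, betas, xs):
    """Generalized linear form: alpha * length * AADT**beta1 * exp(sum(beta_k * x_k))."""
    linear = sum(b * x for b, x in zip(betas, xs))
    return alpha * length * aadt ** beta1 * math.exp(linear)

def implied_cmf(beta, x_before, x_after):
    """CMF implied by changing one explanatory variable, all else held equal."""
    return math.exp(beta * (x_after - x_before))

# Hypothetical coefficients (illustration only, not from the source)
alpha, beta1 = 0.0005, 0.8
beta_shoulder = -0.05  # assumed effect per foot of shoulder width

# Predicted crashes for a 1-mile segment at 5,000 AADT with 4 ft and 6 ft shoulders
base = predicted_crashes(alpha, 1.0, 5000, beta1, [beta_shoulder], [4.0])
wide = predicted_crashes(alpha, 1.0, 5000, beta1, [beta_shoulder], [6.0])

# The ratio of the two predictions equals exp(beta * (6 - 4))
cmf = implied_cmf(beta_shoulder, 4.0, 6.0)
print(round(cmf, 3))
```

Because the effect enters through an exponential, the implied CMF is the same at every AADT and segment length, which is one reason the choice of functional form matters.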

In this model form, the effect on safety of non-AADT explanatory variables is treated as an exponential function. While the shape of the exponential curve is somewhat flexible it does not permit relationships that have turning points (peaks or valleys). In some cases, this may lead to misleading conclusions about the effects of safety treatments (Hauer 2004). A good example of this is discussed by Elvik (2011b). Figure C1 shows the relationship between the radius of horizontal curve and injury “accident” rate on national roads in Norway (Sakshaug 1998). Two functions have been fitted to the data. One of them is the best fitting monotonic function. The other function is a more complex function. It has variable curvature and a minimum point at a curve radius of about 500 m. This function suggests that the effect of increasing curve radius declines rapidly once the radius of a curve is more than about 150 m. Increasing the radius of a curve beyond about 500 m may be associated with an increase in accident rate.

The model form above is logical in that zero or negative crash predictions are not possible. However, further guidelines on interaction terms or terms that are perhaps better represented in other than a multiplicative manner are lacking.

There are few guidelines with respect to the choice between additive and multiplicative models and with respect to the inclusion of interaction terms. The choice of an exponential form is logical in view of the characteristics of the Poisson distribution. Some preliminary guidelines are discussed by Hauer (2004), who argues that accident prediction models should contain both a multiplicative and an additive portion such as:

$$\text{crashes} = (\text{scale parameter}) \times \left[ (\text{length}) \times (\text{multiplicative portion}) + (\text{additive portion}) \right] \qquad \text{(Equation C2)}$$

The multiplicative part is intended to represent the effects of traffic volume and continuous hazards.
The additive part is intended to represent the effects of point hazards, such as driveways.

Some tools do exist to aid in the investigation of the appropriate functional form. Hauer and Bamfo (1997) introduced the integrate-differentiate (ID) and cumulative residual plot (CURE plot) methods. The following sections summarize these two methods.

Figure C1. Relationship between horizontal curve radius (meters) and injury accident rate (Elvik 2011b, based on Sakshaug 1998).

Integrate-Differentiate Method Overview

In the ID method, the integral function is a cumulative function, F(x). The primary assumption of the ID method is that if the empirical integral function, FE(x), is close to a candidate integral function, F1(x), then the linear transformation of FE(x) should be close to that of F1(x). One can list all possible integral functions and choose the one closest to FE(x). In their dataset, there is no visible relationship between crash frequency and the explanatory variable, so Hauer and Bamfo draw a bin graph and sum up the bin areas, resulting in the empirical integral function. The width of each bin is the difference between the nearest higher AADT and nearest lower AADT, divided by two, for each group. Then, all possible functions (e.g., power function, polynomial function, and Hoerl’s function) are listed and their linear transform graphs are compared with that of the empirical integral function, FE(x). Take Figure C2, for example. If the possible model, F1(x), is $\frac{\alpha}{\beta + 1} x^{\beta + 1}$, we should see a straight line with $\log\left(\frac{\alpha}{\beta + 1}\right)$ as the intercept and $\beta + 1$ as the slope when we plot log[FE(x)] against log(x). In Figure C2, this holds when log(AADT) is larger than 6.5 (the dashed circle). The use of ID plots is illustrated in Chapter 3, Case Study 3. For a better understanding of this method, see Hauer and Bamfo (1997).

Figure C2. (a) FE(X), (b) F1(x), and (c) linear form for FE(X).

CURE Plot Overview

The ID method is both convenient and straightforward, but it does not work well when F(x) cannot easily be transformed into a linear form. In this situation, an alternative is to apply the common crash prediction model and then focus on examining its residual term. The tool for examining residuals is called the CURE plot. A CURE plot is a graph of the cumulative residuals (observed minus predicted crashes) against a variable of interest sorted in ascending

order. Hauer (2015, p. 150) also mentions that “the overall fit of the SPF is best judged by the CURE plot for fitted values.”

A good CURE plot should not have vertical drops because these are indicative of inordinately large residuals (possible outliers). It should not have long increasing or decreasing runs because these correspond to regions of consistent over- and under-estimation. It should meander around the horizontal axis in a manner consistent with a “symmetric random walk.” Even in the absence of any bias, symmetric random walks have “runs”; i.e., stretches in which several consecutive residuals tend to be positive (an up-run) or negative (a down-run). For a CURE plot that is consistent with a symmetric random walk one can find the limits beyond which the plot should only rarely go.

The steps to constructing a CURE plot include:
• Step 1. Sort sites in ascending order of the variable of interest, such that N is the number of sites, n is an integer between 1 and N and S(n) is the cumulative sum of residuals from 1 to n.
• Step 2. For each site calculate the residual, res, as observed minus predicted crashes.
• Step 3. For each site calculate the cumulative residuals, S(n).
• Step 4. For each site calculate the squared residual, res².
• Step 5. For each site calculate the cumulative squared residuals, σ²(n).
• Step 6. Take the cumulative squared residuals at the last site, i.e., the sum of squared residuals over all sites, σ²(N).
• Step 7. For each site estimate the variance of the random walk as:

$$\sigma^{*2}(n) = \sigma^2(n) \left[ 1 - \frac{\sigma^2(n)}{\sigma^2(N)} \right] \qquad \text{(Equation C3)}$$

• Step 8. For each site calculate the 95% confidence limits as:

$$\text{Lower Limit} = -1.96 \sqrt{\sigma^{*2}(n)}$$

$$\text{Upper Limit} = +1.96 \sqrt{\sigma^{*2}(n)}$$

• Step 9. Plot the cumulative residuals S(n) and the 95% confidence limits on the y-axis with the explanatory variable of interest on the x-axis.

An example CURE plot is shown in Figure C3 for the variable major-road AADT.
In this example, the model performs well, venturing outside the boundary limits only in a short range of lower AADT.

Figure C3. Example of CURE plot. (Cumulative residuals plotted against major-road AADT, with limits at plus and minus 1.96 standard deviations; a region of poor fit is marked.)

While the CURE plot method works well for continuous variables, it is not applicable to variables with few categories; e.g., a database with speed limits of 45, 55, and 65 mph. For such

variables, a table can be produced showing the prediction bias for each level of the variable as in the example in Table C1. In this example there are three categories of speed limit. As shown by the values of observed divided by predicted, the crash prediction model is overpredicting crashes at lower speed limits and underpredicting crashes at higher speed limits. There is, however, no statistical test to indicate if these biases are statistically significant.

Hauer (2015) further discusses CURE plots and provides some guidelines on how to determine whether a CURE plot is acceptable and how to estimate the bias in fit. The guidelines provided by Hauer are largely subjective. The most objective consideration is achieved by considering the 95% (2σ) confidence limits for the CURE plot. As Hauer notes, “inasmuch as the CURE plot is a sum of many independent random variables, it is approximately normally distributed. For a normal distribution, about 95% of the probability mass is within two standard deviations from the mean. Thus, the CURE plot for an ‘everywhere unbiased’ SPF should only rarely go beyond the 2σ limits.”

Considerations in choosing the functional form in conjunction with CURE plots are also discussed in Hauer (2015). That discussion illustrates the use of “bump” functions to improve fit when the CURE plots indicate sudden vertical drops and illustrates the shapes of various basic mathematical functions. The importance of interaction effects between independent variables is also discussed. Hauer emphasizes that the uncertainty in a model is due not only to the uncertainty of the parameter estimates but also to the uncertainty about the correct model form, which is not usually considered.
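The CURE plot construction steps described in this section can be sketched in a few lines. The following is a minimal illustration using NumPy, not the procedure as implemented by Hauer and Bamfo; the function name `cure_plot_data` and the five-site dataset are hypothetical, and the sketch assumes at least one nonzero residual so that the total of squared residuals is not zero.

```python
import numpy as np

def cure_plot_data(x, observed, predicted):
    """Return sorted x, cumulative residuals, and lower/upper 95% limits."""
    x = np.asarray(x, dtype=float)
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    order = np.argsort(x, kind="stable")            # Step 1: sort by variable of interest
    residuals = observed[order] - predicted[order]  # Step 2: observed minus predicted
    cum_res = np.cumsum(residuals)                  # Step 3: S(n)
    cum_sq = np.cumsum(residuals ** 2)              # Steps 4-5: cumulative squared residuals
    total_sq = cum_sq[-1]                           # Step 6: value at the last site
    # Step 7: variance of the random walk, clipped at zero against round-off
    var_walk = np.maximum(cum_sq * (1.0 - cum_sq / total_sq), 0.0)
    limit = 1.96 * np.sqrt(var_walk)                # Step 8: 95% limits
    return x[order], cum_res, -limit, limit

# Hypothetical data for five sites (illustration only)
aadt = [1200, 800, 3000, 500, 2000]
observed = [2, 1, 5, 0, 3]
predicted = [1.5, 1.2, 4.6, 0.6, 3.1]

x_sorted, cum_res, lower, upper = cure_plot_data(aadt, observed, predicted)
for row in zip(x_sorted, cum_res, lower, upper):
    print(row)  # Step 9 would plot these columns against x_sorted
```

The limits start near zero, widen in the middle of the range, and return to zero at the last site, so the plotted cumulative residuals are judged against a band that is widest where a random walk would be expected to wander most.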
Table C1. Example of categorical variable model fit results.

Speed limit           45 mph   55 mph   65 mph
Observed/Predicted    0.85     1.05     1.15

1.3 Model Structure: Application of Hierarchical Modeling

In a hierarchical modeling framework, the parameter estimate for an explanatory variable may vary by group, for example for sites from different states, or vary in a more complex way by being a function of other explanatory variables. As described in Lord (2010), hierarchical models can handle temporal, spatial and other correlations among groups of observations. Hierarchical models are used for analyzing data that are characterized by correlated responses within hierarchical clusters. Not considering the potential hierarchical structure of the data (the potential of a complex correlation structure) may lead to poorly estimated coefficients and associated standard errors, particularly when they are modeled using traditional count-data modeling approaches (Skinner et al. 1989, Goldstein 1995). On the other hand, depending upon the study objectives, these models may not be warranted, even if correlations considered are not large (de Leeuw and Kreft 1995) and the modeling output may be difficult to interpret, especially by non-statisticians (Pietz 2003).

There have been several applications of hierarchical models to crash data (Jones and Jørgensen 2003, Kim et al. 2007, Chen and Persaud 2014). Of these, Chen and Persaud’s seems most relevant in that they structured hierarchical models as a combination of base SPF and sub-level CMFunctions, where the former is an AADT-only base SPF while the sub-level models are functions that define various coefficients in the first-level model. The structure is as follows:

First Level:

$$N_{mv} = A \times (AADT_{maj})^{B} \times (AADT_{min})^{C} \qquad \text{(Equation C4)}$$

Sub-level CMFunctions:

$$A = \alpha_0 \times \exp(\alpha_1 \times LT) \times \exp(\alpha_2 \times RT) \qquad \text{(Equation C5)}$$

$$B = \beta_0 \times \exp(\beta_1 \times \text{local factor}) \qquad \text{(Equation C6)}$$

$$C = \gamma_0 \times \exp(\gamma_1 \times \text{local factor}) \qquad \text{(Equation C7)}$$

where
A, B, C are parameter coefficients estimated for first-level AADT-only modeling
α0, α1, α2, β0, β1, γ0, γ1 are covariate coefficients estimated in CMFunctions
Nmv = number of multi-vehicle crashes
AADTmaj = AADT on the major road
AADTmin = AADT on the minor road
LT = number of approaches with left-turn lanes
RT = number of approaches with right-turn lanes
“local factor” is class for Toronto, area type for Edmonton

This maintains homogeneity of the first-level modeling while addressing local specifics at a lower level. By this means, the multi-level nature of safety data is addressed. Because data heterogeneity is prevalent in the road safety context, it follows that hierarchical modeling is an appropriate choice for acquiring CMFunctions without sacrificing model conformity. A hierarchical structure was applied in Chapter 3, Case Study 4 to estimate CMFunctions for presence of turn lanes at three-legged stop-controlled intersections.

1.4 Tools for Assessing Model Fit and Choosing Among or Amalgamating Information from Competing Models

It is necessary to assess the fit of models from which information in CMFunctions is derived. There are many suggested ways of measuring the goodness of fit (GOF) of a model. This section documents some more common methods.

Mean Prediction Bias (MPB)

The MPB is the sum of predicted accident frequencies minus observed accident frequencies in the validation data set, divided by the number of validation data points. This statistic provides a measure of the magnitude and direction of the average model bias as compared to validation data. The smaller the average prediction bias, the better the model is at predicting observed data.
The MPB can be positive or negative, and is given by:

$$\text{MPB} = \frac{\sum_{i=1}^{n} \left( \hat{Y}_i - Y_i \right)}{n} \qquad \text{(Equation C8)}$$

where
n = validation data sample size
$\hat{Y}_i$ = the fitted value
$Y_i$ = the observation

A positive MPB suggests that on average the model overpredicts the observed validation data. Conversely, a negative value suggests systematic underprediction. The magnitude of MPB provides the magnitude of the average bias.
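As a minimal sketch of the MPB calculation described above (predicted minus observed, averaged over the validation sites), the following uses a hypothetical five-site validation set; the function name and data are illustrative assumptions only.

```python
def mean_prediction_bias(predicted, observed):
    """Sum of (predicted - observed) divided by the number of validation points."""
    n = len(observed)
    return sum(p - o for p, o in zip(predicted, observed)) / n

# Hypothetical validation data (illustration only)
observed = [2, 1, 5, 0, 3]
predicted = [1.5, 1.2, 4.0, 0.6, 3.1]

mpb = mean_prediction_bias(predicted, observed)
direction = "overprediction" if mpb > 0 else "underprediction"
print(round(mpb, 3), direction)
```

Here the negative value indicates that, on average, the hypothetical model predicts fewer crashes than were observed.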

Mean Absolute Deviation (MAD)

The MAD is the sum of the absolute value of predicted validation observations minus observed validation observations, divided by the number of validation observations. It differs from mean prediction bias in that positive and negative prediction errors will not cancel each other out. Unlike MPB, MAD cannot be negative.

$$\text{MAD} = \frac{\sum_{i=1}^{n} \left| \hat{Y}_i - Y_i \right|}{n} \qquad \text{(Equation C9)}$$

where
n = validation data sample size

The MAD gives a measure of the average magnitude of variability of prediction. Smaller values are preferred to larger values.

Mean Squared Prediction Error (MSPE) and Mean Squared Error (MSE)

The MSPE is the sum of squared differences between observed and predicted crash frequencies, divided by sample size. MSPE is typically used to assess error associated with a validation or external data set. The MSE is the sum of squared differences between observed and predicted crash frequencies, divided by the sample size minus the number of model parameters. MSE is typically a measure of model error associated with the calibration or estimation data, and so p degrees of freedom are lost as a result of producing $\hat{Y}$, the predicted response.

$$\text{MSE} = \frac{\sum_{i=1}^{n_1} \left( Y_i - \hat{Y}_i \right)^2}{n_1 - p} \qquad \text{(Equation C10)}$$

$$\text{MSPE} = \frac{\sum_{i=1}^{n_2} \left( Y_i - \hat{Y}_i \right)^2}{n_2} \qquad \text{(Equation C11)}$$

where
n1 = estimation data sample size
n2 = validation data sample size

A comparison of MSPE and MSE reveals potential overfitting or underfitting of the models to the estimation data. An MSPE that is higher than MSE may indicate that the models may have been overfit to the estimation data, and that some of the observed relationships may have been spurious instead of real. This finding could also indicate that important variables were omitted from the model, or the model was misspecified. Finally, data inconsistencies could cause a relatively high value of MSPE.
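The MAD, MSE, and MSPE definitions above differ only in what is summed and what the sum is divided by. A minimal sketch follows, with hypothetical estimation and validation data and an assumed parameter count of p = 2; none of these values come from the source.

```python
def mean_absolute_deviation(predicted, observed):
    """Average absolute prediction error over the validation set."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

def mean_squared_error(predicted, observed, n_params):
    """Sum of squared errors on estimation data, divided by (n1 - p)."""
    sse = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    return sse / (len(observed) - n_params)

def mean_squared_prediction_error(predicted, observed):
    """Sum of squared errors on validation data, divided by n2."""
    sse = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    return sse / len(observed)

# Hypothetical estimation data (n1 = 5) for a model with p = 2 parameters
est_obs = [2, 1, 5, 0, 3]
est_pred = [1.5, 1.2, 4.0, 0.6, 3.1]

# Hypothetical validation data (n2 = 4)
val_obs = [1, 4, 2, 0]
val_pred = [1.4, 3.0, 2.5, 0.3]

mad = mean_absolute_deviation(val_pred, val_obs)
mse = mean_squared_error(est_pred, est_obs, n_params=2)
mspe = mean_squared_prediction_error(val_pred, val_obs)
print(round(mad, 3), round(mse, 3), round(mspe, 3))
```

In practice the two mean squared quantities would be compared side by side: an MSPE well above the MSE is the pattern the text associates with possible overfitting.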
Values of MSPE and MSE that are similar in magnitude indicate that validation data fit the model like the estimation data and that deterministic and stochastic components are stable across the comparison being made. Typically, this is the desired result.

Overdispersion Parameter, K

The overdispersion parameter, K, in the negative binomial distribution is reported from the variance equation expressed as:

$$\text{Var}\{m\} = E\{m\} + K \times E\{m\}^2 \qquad \text{(Equation C12)}$$

where
Var{m} = the estimated variance of the mean accident rate
E{m} = the estimated mean accident rate
K = the estimated overdispersion parameter

Variance overdispersion in a Poisson process can lead to a negative binomial dispersion of errors, particularly when the Poisson means are themselves approximately gamma distributed or possess gamma heterogeneity. The negative binomial distribution has been shown to adequately describe errors in motor vehicle crash models in many instances. Because the Poisson rate is overdispersed, the estimated variance term is larger than it would be under a Poisson process. As overdispersion gets larger, so does the estimated variance, and consequently all the standard errors of estimates become inflated. As a result, all else being equal, a model with smaller overdispersion (i.e., a smaller value of K) is preferred to a model with larger overdispersion.

Pearson’s Product Moment Correlation Coefficients Between Observed and Predicted Crash Frequencies

Pearson’s product moment correlation coefficient, usually denoted by r, is a measure of the linear association between the two variables Y1 and Y2 that have been measured on interval or ratio scales. A different correlation coefficient is needed when one or more variable is ordinal. Pearson’s product moment correlation coefficient is given as:

$$r = \frac{\sum \left( Y_{i1} - \bar{Y}_1 \right) \left( Y_{i2} - \bar{Y}_2 \right)}{\left[ \sum \left( Y_{i1} - \bar{Y}_1 \right)^2 \sum \left( Y_{i2} - \bar{Y}_2 \right)^2 \right]^{1/2}} \qquad \text{(Equation C13)}$$

where
$\bar{Y}$ = the mean of the $Y_i$ observations

A model that predicts observed data perfectly will produce a straight-line plot between observed (Y1) and predicted values (Y2) and will result in a correlation coefficient of exactly 1. Conversely, a linear correlation coefficient of 0 suggests a complete lack of a linear association between observed and predicted variables. The expectation during model validation is a high correlation coefficient.
A low coefficient suggests that the model is not performing well and that variables influential in the calibration data are not as influential in the validation data. Random sampling error, which is expected, will not reduce the correlation coefficient significantly.

Fridstrøm et al. (1995) introduce a modified R² value. This GOF measurement subtracts the normal amount of random variation that would be expected even with a perfectly specified model. As a result, what is measured is how much of the systematic variation is explained by the model. If the value were to be greater than 1.0, this would indicate that the model is overfit to the data and some of the random variation to be expected is incorrectly being explained as the systematic variation.

$$R^2 = \frac{\sum_i \left( y_i - \bar{y} \right)^2 - \sum_i \hat{\mu}_i^2}{\sum_i \left( y_i - \bar{y} \right)^2 - \sum_i \hat{y}_i} \qquad \text{(Equation C14)}$$

where:
$y_i$ = the observed counts
$\hat{y}_i$ = the fitted values from the model
$\bar{y}$ = the sample average
$\hat{\mu}_i = y_i - \hat{y}_i$

The paper presents several alternate goodness-of-fit measures to be considered.
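The two correlation-type measures in this section can be sketched as follows. The modified R² here assumes the form in which the total variation about the sample mean, less the sum of squared residuals, is divided by the total variation less the sum of fitted values (the random variation expected under a Poisson process); that reading of the equation, the function names, and the data are assumptions made for illustration.

```python
import numpy as np

def pearson_r(y1, y2):
    """Pearson product moment correlation between two series."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    d1 = y1 - y1.mean()
    d2 = y2 - y2.mean()
    return float((d1 * d2).sum() / np.sqrt((d1 ** 2).sum() * (d2 ** 2).sum()))

def modified_r2(observed, fitted):
    """Share of systematic variation explained, assuming Poisson random variation."""
    observed = np.asarray(observed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    total = ((observed - observed.mean()) ** 2).sum()  # total variation
    residual = ((observed - fitted) ** 2).sum()        # sum of squared residuals
    random_part = fitted.sum()                         # expected random variation
    return float((total - residual) / (total - random_part))

# Hypothetical observed counts and fitted values (illustration only)
observed = [0, 1, 2, 4, 9, 12]
fitted = [0.8, 1.5, 2.5, 3.5, 7.0, 10.0]

r = pearson_r(observed, fitted)
r2_mod = modified_r2(observed, fitted)
print(round(r, 4), round(r2_mod, 4))
```

The denominator can be small or negative when the counts show little variation beyond what a Poisson process would produce, so the measure is most informative for datasets with substantial systematic variation.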

CURE Plots

CURE plots may also be used as a GOF measure to see if the predictions are biased for ranges of the independent variables; e.g., overpredicting crashes at high AADTs. CURE plots are discussed in Section 1.2, including guidelines from Hauer (2015) in interpreting CURE plots.

Akaike’s Information Criterion (AIC)

The AIC penalizes for the addition of parameters and thus selects a model that fits well but has a minimum number of parameters. AIC is not typically used alone as a GOF measure but can be used to compare the relative fit of alternate models. The lower value of AIC is preferred.

$$\text{AIC} = -2 \times (\text{log-likelihood}) + 2K \qquad \text{(Equation C15)}$$

where K is the number of estimated parameters included in the model (i.e., number of variables + the intercept). The log-likelihood of the model, given the data, is readily available in statistical output and reflects the overall fit of the model (smaller values indicate worse fit).

Schwarz Bayesian Information Criterion (BIC)

The BIC is complementary to AIC in that it also penalizes for the addition of parameters, and thus selects a model that fits well but has a minimum number of parameters. The BIC is not typically used alone as a GOF measure but can be used to compare the relative fit of alternate models. The lower value of BIC is preferred.

$$\text{BIC} = -2 \times (\text{log-likelihood}) + K \times \log(\text{numobs}) \qquad \text{(Equation C16)}$$

where K is the number of estimated parameters included in the model (i.e., number of variables + the intercept). The log-likelihood of the model, given the data, is readily available in statistical output and reflects the overall fit of the model (smaller values indicate worse fit).

Stepwise Regression

Hauer (2004) addresses the issue of the sequence in which variables could be added in a model. He suggests that traffic volume be introduced first since it is the dominant factor in terms of influence on crashes.
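When candidate models are built up by adding variables in sequence, the AIC and BIC defined earlier in this section offer one way to compare them. The sketch below assumes the natural logarithm in the BIC penalty and uses hypothetical log-likelihoods for a smaller model A and a larger model B fitted to the same 400 observations; all names and values are illustrative only.

```python
import math

def aic(log_likelihood, k):
    """AIC = -2 * log-likelihood + 2 * K, with K estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k

def bic(log_likelihood, k, num_obs):
    """BIC = -2 * log-likelihood + K * log(number of observations)."""
    return -2.0 * log_likelihood + k * math.log(num_obs)

# Hypothetical fits to the same data (illustration only)
num_obs = 400
ll_a, k_a = -520.4, 3  # e.g., intercept, AADT, one additional variable
ll_b, k_b = -518.9, 5  # model A plus two further variables

aic_a, aic_b = aic(ll_a, k_a), aic(ll_b, k_b)
bic_a, bic_b = bic(ll_a, k_a, num_obs), bic(ll_b, k_b, num_obs)

preferred_by_aic = "A" if aic_a < aic_b else "B"
preferred_by_bic = "A" if bic_a < bic_b else "B"
print(preferred_by_aic, preferred_by_bic)
```

In this hypothetical case the small gain in log-likelihood from the two extra variables does not offset either penalty, so both criteria favor the smaller model.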
This can be followed by including variables in the order in which they contribute to the increase in log-likelihood per parameter. A common approach for identifying significant variables from those available is a stepwise regression approach. The stepwise approach could be based on either a forward selection or a backward elimination procedure. Forward selection involves starting with no variables in the model, testing the addition of each variable using a chosen model comparison criterion, adding the variable (if any) that improves the model the most, and repeating this process until no additional variable significantly improves the model at a predetermined significance level (e.g., 80% or 90%). Backward elimination involves starting with all candidate variables, testing the deletion of each variable using a chosen model comparison criterion, eliminating the variable (if any) whose removal does not significantly degrade the model, and repeating this process until only variables that significantly contribute to the model remain. When deciding which variables to include in a model, consideration should also be given to the implied relationship with crashes and the magnitude of this relationship, using engineering knowledge to avoid illogical results.

Another possible GOF measure was proposed by Liu and Cela (2008). Their approach involves the comparison of the empirical distribution of the observed counts to the negative binomial distribution with the mean estimated from the data. The probabilities from the two distributions are plotted. The extent of the overlap between the predicted and observed probabilities provides insight into the GOF of the model. The goal is to have nearly complete overlap between the predicted and the observed probabilities.

An assessment of how well a model fits the data should consider not just overall GOF measures but also how the model predictions fit the data over ranges of each explanatory variable. For this assessment the tools in Section 1.2 can be applied.

1.5 Including Estimates from Previous Studies in the Estimation Methodology Through Full Bayes Methods
As stated in the How-To Guidebook for States Developing Jurisdiction-Specific SPFs (Srinivasan and Bauer 2013), Bayesian models integrate Bayes’ theorem with classical statistical models. Bayesian models allow the use of prior information about the parameters (i.e., regression coefficients) in addition to the data to obtain the “posterior estimate” of the parameter values. Bayesian models have become more common because of the accessibility of Markov chain Monte Carlo (MCMC) methods. MCMC methods more easily allow the estimation of complex functional forms that cannot be linearized and so cannot be estimated using traditional maximum likelihood methods that utilize generalized linear modeling. Bayesian methods are also more effective in modeling spatial correlation.
Examples of the use of Bayesian estimation methods can be found in Lan and Srinivasan (2013), Guo et al. (2010), Ma et al. (2008), and El-Basyouny and Sayed (2009). The use of prior information is illustrated in Chapter 3, Case Study 4, which seeks to develop CMFs for turning lanes at three-legged stop-controlled intersections.

1.6 Addressing Multicollinearity Among Explanatory Variables
Washington et al. (2011) define multicollinearity as arising when independent variables are correlated with each other or with omitted variables that are related to the dependent variable. This subsection is focused on the first case as the latter one was briefly discussed in Chapter 1.1 under bias due to incompleteness in data. The consequences include high standard errors of the estimated parameters for the variables included in the model. Estimated parameters are also unstable in that the direction of effect and/or magnitude can fluctuate to a large degree dependent on what other variables are included in the model. The interpretation of a parameter is difficult since in adjusting the value of the independent variable by one unit to assess its effect on the dependent variable, the value of the correlated independent variable will also change. As a result, counterintuitive parameter estimates are typical, and so not very useful for deriving CMFs, and adding or dropping another explanatory variable causes large fluctuations in the magnitude and direction of effect of the other independent variables (Washington et al. 2011).

Ways to address multicollinearity include (Washington et al. 2011):
• Use pair-wise correlation between variables to diagnose multicollinearity
• Ridge regression to avoid inefficient parameter estimates (still produces biased estimates)

• Removal of one of the variables if its effect is associative rather than causative, or if only one variable is typically available
• Do nothing, with limitations documented
• During study design ensure that all levels of correlated variables are collected and samples are stratified

1.7 Addressing Endogeneity
Elvik (2011b) provides a succinct description of the concern of endogeneity in multivariable regression models. “The introduction of a safety treatment is often influenced by accident history. Sites that are treated tend to have worse safety records than sites that are not treated. Even if treatment improves safety, the treated sites may continue to have a higher rate of accidents than untreated sites. A regression model may then find that treatment is associated with an increased number of accidents, when in fact the opposite is true.” Obviously, this becomes a critical issue if the SPF is used to estimate the CMF associated with a particular treatment.

Kim and Washington (2006) show an example as part of a study that examines the safety effectiveness of left-turn lanes. Since left-turn lanes are likely to be implemented at intersections with large numbers of left-turn related crashes, a prediction model that includes the presence of left-turn lanes as an independent variable is likely to suffer from endogeneity bias. Kim and Washington (2006) first estimated a model which predicted the number of angle crashes as a function of AADT and the number of driveways on the major road within 250 ft of the center of the intersection, with indicator variables to represent the presence/absence of a left-turn lane on the major road, and the presence/absence of lighting on the major road. This model seemed to indicate that angle crashes would increase due to the presence of a left-turn lane. Next, to account for the possible bias due to endogeneity, the authors simultaneously estimated two models.
In one model, the dependent variable was crash frequency, and in the second model, the dependent variable was a binary variable indicating the presence/absence of a left-turn lane; the two models were estimated simultaneously using limited information maximum likelihood (LIML). The models estimated using LIML indicated that the number of angle crashes would decrease with the presence of a left-turn lane, suggesting that the original single equation approach did not adequately account for endogeneity bias.

Another example of endogeneity is in modeling the frequency of ice-related accidents to estimate the effectiveness of ice-warning signs in reducing this frequency (Carson and Mannering 2001). In the case of the ice-warning sign indicator, ignoring the endogeneity may lead to the erroneous conclusion that ice-warning signs increase the frequency of ice-related crashes because the signs are going to be associated with locations of high ice-crash frequencies (because these are the locations where the signs are most likely placed).

Elvik (2011b) describes another example. Speed limits are well suited to illustrate this problem. The endogeneity of speed limits as a safety treatment is discussed at length by Taylor et al. (2002). Speed limits tend to be lower in urban areas where traffic is more complex and associated with a higher accident rate than in rural areas. Motorways (freeways), which tend to have the lowest accident rate of any type of road, have the highest speed limits. Actual mean driving speed is closely related to the posted speed limit. Thus, if the simple bivariate relationship between speed and accident rate is estimated, it will often be negative.

Accounting for endogenous variables in traditional least-squares regression models is relatively straightforward (Washington et al. 2011).
For count-data models, the modeling processes typically applied (more on this below) do not lend themselves to traditional endogenous-variable correction techniques (such as instrumental variables). Therefore, accounting for endogenous variables adds considerable complexity to the count-data modeling process (Kim and Washington 2006).

In yet another example, Austin and Carson (2002) investigated the safety impacts of warning treatments at highway-rail grade crossings. As the authors point out, these devices are placed in response to high accident frequencies, resulting in an inappropriate cause-effect relationship between the dependent variable and presence of a warning device. To control for endogeneity, the method of instrumental variables was used. In this approach, logistic regression was used, with each warning device modeled as a discrete binary variable (present = 1; not present = 0), whereby:

\[ P(Y_i) = \frac{e^{\lambda_i}}{1 + e^{\lambda_i}} \qquad \text{(Equation C17)} \]

where P(Y_i) is the probability that a given warning device is present at site i, and

\[ \lambda_i = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k + \varepsilon \qquad \text{(Equation C18)} \]

where
the βs are coefficients to be estimated
the Xs are explanatory variables
ε is the random error

These probability models were then used to estimate, for each site, the probability of each warning device type being present. These continuous variables of probability (between 0 and 1) were then used in conventional negative binomial regression models for crashes in lieu of variables indicating the actual presence or absence of the warning device. The estimated coefficients for each “probability of device present” variable indicate whether each type of device is associated with more or fewer crashes. However, since the variables are the probability of a device being present and not the actual presence of a device, the model coefficients cannot be used to determine a CMF.

Another way to reduce the impact of endogeneity is through careful comparison site selection. Propensity score matching, discussed in Section 1.11, can be applied in this regard.

1.8 Modeling Interactions, Especially for Estimating Effects of Combination Treatments
Interaction occurs when the effect of an explanatory variable on crash frequency depends on the values of another explanatory variable.
The following illustration from Srinivasan and Bauer (2013) first considers an SPF that includes segment length, AADT, lane width, and shoulder width, without interaction terms:

\[ Y = L \times \exp[a + b \times \ln(AADT) + c \times LW + d \times SW] \qquad \text{(Equation C19)} \]

where LW is the lane width, SW is the shoulder width, and a, b, c, and d are regression parameters to be estimated as part of the modeling process. Equation C19 can also be written as follows:

\[ Y = L \times e^{a} \times AADT^{b} \times e^{c\,LW} \times e^{d\,SW} \qquad \text{(Equation C20)} \]

If this SPF is used to determine the safety effect of changing from a lane width of LW_1 to a lane width of LW_2, then the CMF for changing a LW_1-ft lane to a LW_2-ft lane can be calculated as:

\[ CMF = \frac{L \times \exp[a + b \times \ln(AADT) + c \times LW_2 + d \times SW]}{L \times \exp[a + b \times \ln(AADT) + c \times LW_1 + d \times SW]} \qquad \text{(Equation C21)} \]

Equation C21 simplifies to:

\[ CMF = \frac{\exp(c \times LW_2)}{\exp(c \times LW_1)} = \exp[c \times (LW_2 - LW_1)] \qquad \text{(Equation C22)} \]

The CMF in this case is a function of just the parameter estimate c, LW_2, and LW_1. Alternatively, the following SPF that includes an interaction term between lane and shoulder width can be considered:

\[ Y = L \times \exp[a + b \times \ln(AADT) + c \times LW + d \times SW + e \times LW \times SW] \qquad \text{(Equation C23)} \]

where a, b, c, d, and e are regression parameters to be estimated as part of the modeling process. Based on Equation C23, the CMF of changing from LW_1-ft to LW_2-ft lanes will be the following:

\[ CMF = \frac{L \times \exp[a + b \times \ln(AADT) + c \times LW_2 + d \times SW + e \times LW_2 \times SW]}{L \times \exp[a + b \times \ln(AADT) + c \times LW_1 + d \times SW + e \times LW_1 \times SW]} \qquad \text{(Equation C24)} \]

Equation C24 simplifies to:

\[ CMF = \frac{\exp(c \times LW_2 + e \times LW_2 \times SW)}{\exp(c \times LW_1 + e \times LW_1 \times SW)} = \exp[(LW_2 - LW_1) \times (c + e \times SW)] \qquad \text{(Equation C25)} \]

The CMF is not only a function of the parameter estimates c and e and LW_2 and LW_1, but also shoulder width (SW) because of the interaction between lane width and shoulder width shown in the SPF in Equation C23. In this case, the CMF is a crash modification function, rather than a single crash modification factor.

Interaction effects are not commonly found in SPFs, probably because there is no easy way to identify which interactions are important and how they should be included in a model, unless there is some theoretical reason for including certain interactions. In addition, interaction terms (e.g., LW × SW in the above example) can be correlated with their individual terms (LW and SW). Thus, the regression coefficients are likely to have a large standard error (as noted previously in Section 1.6). This illustrates that interaction terms are hard to quantify because of the collinearity they introduce but does not imply that interactions cannot be modeled.
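The lane-width derivation above can be checked numerically. The sketch below is illustrative only: the coefficient values are hypothetical (they are not estimates from Srinivasan and Bauer or any other study), and the function names are invented for this example. It evaluates the closed-form CMF of Equation C25 for several shoulder widths and confirms that it equals the ratio of SPF predictions in Equation C24.

```python
import math


def spf(length_mi, aadt, lw, sw, a, b, c, d, e=0.0):
    """Predicted crashes per Equation C23 (reduces to Equation C19 when e = 0)."""
    return length_mi * math.exp(
        a + b * math.log(aadt) + c * lw + d * sw + e * lw * sw
    )


def cmf_lane_width(lw1, lw2, c, e=0.0, sw=0.0):
    """CMF for changing lane width from lw1 to lw2.

    Equation C25; reduces to Equation C22 when e = 0.
    """
    return math.exp((lw2 - lw1) * (c + e * sw))


# Hypothetical coefficients, chosen only to illustrate the arithmetic.
A, B, C, D, E = -7.0, 0.9, -0.08, -0.10, 0.006

# CMFunction: the same widening (10 ft to 12 ft) at different shoulder widths.
cmf_by_sw = {sw: cmf_lane_width(10, 12, C, E, sw) for sw in (0, 4, 8)}

# The ratio of SPF predictions (Equation C24) should equal the closed form.
ratio = spf(1.0, 5000, 12, 4, A, B, C, D, E) / spf(1.0, 5000, 10, 4, A, B, C, D, E)
```

With these made-up coefficients the CMF for the same lane widening moves closer to 1.0 as the shoulder gets wider, which is the sense in which the result is a function rather than a single factor. Segment length, AADT, and the coefficients a, b, and d cancel out of the ratio, as the derivation shows.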
For example, the following SPFs in Equation C26 and Equation C27 were estimated for fatal and injury and PDO crashes using data from rural two-lane roads in Washington, and they clearly indicate the modeled interaction between curve radius and length of horizontal curves (Bauer and Harwood 2014):

\[ N_{FI} = \exp\left[ b_0 + b_1 \ln(AADT) + b_2 G + b_3 \ln\left(2 \times \frac{5730}{R}\right) \times I_{HC} + b_4 \left(\frac{1}{R}\right)\left(\frac{1}{L_C}\right) \times I_{HC} \right] \qquad \text{(Equation C26)} \]

\[ N_{PDO} = \exp\left[ b_0 + b_1 \ln(AADT) + b_2 G + b_3 \ln\left(2 \times \frac{5730}{R}\right) \times I_{HC} + b_4 \left(\frac{1}{R}\right)\left(\frac{1}{L_C}\right) \times I_{HC} \right] \qquad \text{(Equation C27)} \]

where
N_FI = fatal-and-injury crashes/mi/yr
N_PDO = PDO crashes/mi/yr
AADT = veh/day
G = absolute value of percent grade; 0% for level tangents, ≥ 1% otherwise
R = curve radius (ft); missing for tangents
I_HC = horizontal curve indicator: 1 for horizontal curves; 0 otherwise
L_C = horizontal curve length (mi); not applicable for tangents
ln = natural logarithm function
b_0, . . . , b_4 = regression coefficients

Modeling interaction terms, however, can lead to overfitting of SPFs to the data when too many parameters are included in the regression model. Even though they are found to be statistically significant, which is especially the case with large sample sizes, the inclusion of some parameters may not be of practical importance and might even lead to counterintuitive CMF inferences.

Chapter 3, Case Study 4, illustrates how parameter estimates for presence of turn lanes at intersections can interact with the AADT variables. In this application the model structure was hierarchical, but this is not a requirement.

The How-to Guidebook for States Developing Jurisdiction-Specific SPFs (Srinivasan and Bauer 2013) has the following to say:

One way to deal with overfitting is using cross-validation. When cross-validating, the data set is randomly divided into two parts, where one part is used for estimating the model parameters and the other part is used for validation. If the model validates poorly using the validation data, this is a sign that the model is overfit and it can be re-estimated with fewer variables. Examples of validation can be found in studies led by Simon Washington (Washington et al. 2001, Washington et al. 2005). Another approach is to use relative GOF measures such as the AIC and BIC in selecting models; these measures penalize models with more estimated parameters than needed and help reduce the possibility of overfitting.
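To make the use of AIC and BIC in model selection concrete, the sketch below applies Equations C15 and C16 to three candidate models. The log-likelihood values, parameter counts, sample size, and model labels are all hypothetical and stand in for output that would normally be read from statistical software; K is counted as in the text (variables plus the intercept).

```python
import math


def aic(log_likelihood, k):
    """Equation C15: AIC = -2(log-likelihood) + 2K."""
    return -2.0 * log_likelihood + 2.0 * k


def bic(log_likelihood, k, num_obs):
    """Equation C16: BIC = -2(log-likelihood) + K * log(numobs)."""
    return -2.0 * log_likelihood + k * math.log(num_obs)


# Hypothetical candidates: label -> (log-likelihood, number of parameters K).
candidates = {
    "AADT only": (-512.4, 2),
    "AADT + LW + SW": (-505.1, 4),
    "AADT + LW + SW + LW x SW": (-504.9, 5),
}
NUM_OBS = 300  # hypothetical number of sites

scores = {
    name: (aic(ll, k), bic(ll, k, NUM_OBS)) for name, (ll, k) in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

In this made-up comparison the interaction term raises the log-likelihood only slightly, so both criteria prefer the model without it. Because log(numobs) exceeds 2 once there are more than about seven observations, BIC penalizes each added parameter more heavily than AIC in any realistic data set.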
1.9 Estimating Precision of CMFs from CMFunctions Derived
The estimation of the standard error of a CMF estimate derived from a cross-sectional model may be accomplished based on Bahar et al. (2007). The essence is as follows:
1. Estimate the CMF from the regression model using a value of the coefficient(s) of the variable of interest equal to the point estimate plus one standard deviation.
2. Estimate the CMF from the regression model using a value of the coefficient(s) of the variable of interest equal to the point estimate minus one standard deviation.
3. Estimate the standard error of the CMF as one half of the absolute value of the difference between Estimates 1 and 2.

1.10 Corroboration of Results
Given the difficulties with deriving reliable CMFunctions from cross-sectional regression models, it is important to validate the results in some way. One way to increase confidence in the CMFunctions developed is through multiple studies. If the relationship between the explanatory and dependent variables is found to be consistent between multiple studies, then confidence is increased. In comparing multiple studies there are several important considerations:
1. The dependent variable should be the same whether this is crash counts, crash rates, or other. To compare studies a transformation of the results may be necessary (Elvik 2013).

2. The definition of target crashes should also be the same, or as close as possible. Crashes may be defined in many ways, including intersection-related versus non-intersection-related, and by crash type (e.g., angle, rear-end).
3. Explanatory variables should also be measured similarly and be similar, and ideally, all the same confounding factors should be accounted for in each study.
4. The method for controlling for confounding factors should be consistent.
5. Severity definitions and reporting practices between jurisdictions should be considered.
6. Consideration should be given to the time of study as vehicles, drivers and safety measures and regulations have changed over time.
7. Consider both published and unpublished research to avoid publication bias. Publication bias refers to the tendency for surprising results to not be published, thus biasing the knowledge base. Methods of detecting and dealing with publication bias are discussed in Section 2.1.

Another consideration in interpreting the CMFunction results is intuitive reasoning and an assessment of causality. It is important to assess if the results meet expectations based on engineering knowledge. In this comparison the direction of effect and the magnitude of effect should be considered. In addition to published research, roadway design manuals can be consulted for guidelines and warrants which may provide an indication of the expected relationship with safety.

Elvik (2011b) describes several criteria for assessing the causality of a relationship between crashes and an explanatory variable. This includes nine criteria to assess whether the coefficients estimated in a regression model represent causal relationships or are non-causal statistical associations. As Elvik says, a cause is a necessary condition of a change: without the cause, the change would not have occurred.
The main implication of this definition is that statements about causal relationships should lend support to counterfactual statements; i.e., statements about what would otherwise have happened (if the cause had not been present). Thus, to infer a causal relationship, it is always necessary to establish the counterfactual; i.e., describe what happened when the cause was absent. In observational studies, the task of establishing the counterfactual (saying what would otherwise have occurred) is tantamount to controlling for confounding factors. A confounding factor can be defined as any variable, other than the cause of principal interest in a study, that can either generate effects that may be mixed up with the effects of the causal variable; distort the effects attributed to the causal variable, for example modify their direction or strength; or hide the effects of the causal variable.

The nine criteria are described below, verbatim from Elvik (2011b):
1. There should be a statistical association between a road safety treatment and variables measuring its effects (e.g., number of accidents, accident rate, number of injured road users).
2. A strong statistical association between treatment and effect is more likely to be causal than a weak statistical association.
3. The statistical association between treatment and effect should be internally consistent; i.e., identical within the bounds of randomness in all subsets of data or across all similar studies.
4. The direction of causality between treatment and effect should be clear; i.e., it should be clear that the treatment is (one of) the cause(s) of the effect and not the other way around.
5. The association between treatment and effect should exist when potentially confounding factors are controlled for.
6. A mechanism generating the association between treatment and effect should be identified and described behaviorally or statistically.
7. It should be possible to account for the association between treatment and effect in theoretical terms.

8. There should be a dose–response pattern in the relationship between treatment and effect (provided the treatment comes in different doses). A dose–response pattern need not be linear and may be influenced by moderating variables.
9. The effect should only be found within the target group of the treatment (provided a clearly defined target group can be identified).

The first criterion states that there should be an effect (i.e., changes in variables measuring road safety) associated with the introduction of the safety treatment. If there is no change in the expected number of accidents (or other dependent variables), it is in principle still possible that the treatment has had an effect. In such a case, however, the effect of the safety treatment has been offset by the effects of other factors. This is one of the reasons why the strength of the statistical association between treatment and effect is often proposed as a criterion of causality (Criterion 2). If the effects of all other factors are stronger than the effects of the treatment, it may be unable to produce the changes that are the hallmark of causality.

Consistency has for a long time been regarded as one of the distinguishing characteristics of a causal relationship (Criterion 3). This does not mean that variation in estimates of effect as such invalidates causality. The point is that such variation should be explicable in terms of either a dose–response pattern, the specificity of an effect to a target group, or a known causal mechanism, or combinations of these.

Causal relationships can go in both directions, in the sense that A may be a cause of B and B in turn a cause of A. Homeostatic processes, like the regulation of body temperature, can be thought of as bi-directional causation.
As far as effects of road safety measures are concerned, however, we are searching for causal relationships that go in one direction only: from treatment to safety (Criterion 4). To determine causal direction means to ascertain that the treatment was the cause of changes in safety, and not the other way around. While it has been proposed (Wilde 1982) that a homeostatic-like process can influence the number of road accidents, this process does not involve bi-directional causality, but only refers to behavioral adaptations among road users induced by the introduction of a road safety measure, but not necessarily influencing subsequent use of the same measure.

Controlling for confounding factors is important in establishing causality (Criterion 5). Confounding refers to any alternative explanation for the observed changes in safety. Thus, to rule out confounding, it is necessary to remove (control for) the effects of any potentially confounding factor. The effects attributed to the safety treatment can only be regarded as causal if they can be identified statistically when confounding factors have been controlled for. No observational study can control perfectly for confounding, but in some cases the most important potentially confounding factors are known. A study can then be regarded as acceptable if it controls for these factors.

Few, if any, road safety treatments affect safety directly. Such treatments influence safety by modifying one or more risk factors that are associated with accidents. Thus, the mechanism through which safety treatments generate their effects is the modification of risk factors that affect accident rate or injury severity (Criterion 6). It is important to note, however, that many safety treatments influence not just the risk factors they are intended to influence, but also other risk factors. This influence generally takes the form of behavioral adaptation on the part of road users.
A complete description of the causal mechanism of a road safety treatment should therefore identify and measure changes in both target risk factors and risk factors that represent behavioral adaptation to the treatment.

One of the problems of road safety evaluation studies is that few findings can be ruled out on theoretical grounds. Yet, as argued above, one may at least treat some findings as less plausible than others by reference to theory, in particular laws of physics and perception (Criterion 7).

A dose–response pattern between cause and effect is relevant for road safety evaluation studies in two cases (Criterion 8). One case is when the safety treatment itself comes in different doses, such as different amounts of training, different levels of police enforcement, different frequencies of technical inspections, etc. The other case is when the size of the effect of a safety treatment on its target risk factor (or factors) varies within the sample studied. One would then, all else equal, expect the largest effects on accidents or injuries to be associated with the largest changes in the target risk factors.

Some safety treatments have clearly defined target groups (Criterion 9). When this is the case, one would expect to find an effect within the target group only, but not outside it. For a case illustration, see a study by Elvik (2000).

The application of the criteria of causality to multivariate, statistical models requires an assessment of both the structure of a model and of each of the coefficients estimated by a model. As noted above, it is particularly important to assess the presence of potentially confounding factors, as any potentially confounding factor not controlled for seriously weakens the case for a causal interpretation by representing an unchallenged rival explanation of study findings (i.e., a confounder lends support to statements of the form: the effect may not have been caused by A, but by B, since the model did not statistically eliminate (control for) the effect of B). A discussion of generic confounding factors in multivariate accident models follows.

When comparing the results from cross-sectional models to other studies, it is particularly important to compare to the results of before-after studies where possible. In a before-after context the safety of a site is compared before and after the treatment, with all other aspects of the unit staying the same or at least experiencing only changes (e.g., in traffic volumes) that one can account for. In cross-sectional data, one cannot be confident that all differences between sites that affect crash risk are being accounted for.
As Hauer (2010) cautions, “there is the omnipresent suspicion that units have trait ‘A’ but not ‘B’ for a good reason and that these reasons are not fully known and difficult to account for in a regression model.” Because a properly conducted before-after study should provide more reliable results, it is vital to verify results from cross-sectional studies using at least limited before-after data.

1.11 Database Requirements

Sample Size Considerations
The determination of required sample sizes for observational cross-sectional studies is difficult. For multiple variable regression models, the number of locations required will depend on several factors including:
• Average crash frequencies
• The number of variables desired in the model
• The level of statistical significance desired in the model
• The amount of variation in each variable of interest between locations
• The range in the size of the CMFunction to be inferred

Determining if the sample size is adequate can only be done once the model output is available. If the effects of interest are not statistically significant, then more data are required. For this reason, the determination of required sample size is an iterative process, although through experience and familiarity with specific databases, an educated guess may be possible.

Matching on Secondary Variables
Where possible, matching sites with and without the feature of interest, so that they are similar in other variables that are not of interest but that may affect safety, will limit variability between sites due to such variables. This so-called “propinquity” approach has been proposed by Bonneson and Pratt (2008) and suggested in Appendix F as a step-down alternative to fully randomized trials. This becomes more challenging in CMFunction development since there is a need to know and separate variables that impact the CMF, which still need to be included, from those that may still affect safety but do not impact the CMF. The approach needs to be tested, developed, and refined, and if found promising, widely used to exploit available cross-sectional data.

One method for selecting appropriate comparison sites is the propensity score approach, which is also suggested in Appendix F, where the application of this technique receives substantial coverage. The propensity score is a measure of the likelihood of selection for treatment and is estimated using a logistic regression model based on a set of covariates that impact the treatment/no treatment decision. If such matching were 100% accurate, then this would replicate an experimental study design. Treatment and comparison groups should be selected so that sites with a similar propensity score are matched. The propensity score method has been used in road safety studies, but only sparingly. See, for example, Aul and Davis (2006), Park and Saccomanno (2007), and Sasidharan and Donnell (2013). As Appendix F suggests, “the approach should perhaps be used more extensively so that an opinion can be formed about its usefulness in road safety research.”

Ensuring Variability in the Variables of Interest
To estimate the effect on safety of changes in a variable, it is necessary to have data that exhibit a range of values for the CMF variable of interest and, for CMFunction development, for the variables that impact the CMF.
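The matching step of the propensity score approach described under Matching on Secondary Variables can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Appendix F or of the cited studies: the propensity scores are taken as already estimated (for example, by a logistic regression), the site identifiers and scores are hypothetical, and greedy one-to-one nearest-neighbor matching within a caliper is only one of several possible matching schemes.

```python
def match_by_propensity(treated, untreated, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on propensity score.

    treated, untreated: dicts mapping site id -> propensity score (0 to 1).
    caliper: largest score difference accepted for a match (an assumption).
    Returns a list of (treated_id, comparison_id) pairs.
    """
    available = dict(untreated)  # comparison sites not yet used
    pairs = []
    # Process treated sites in order of increasing score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining comparison site by score.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each comparison site is used at most once
    return pairs


# Hypothetical propensity scores, for illustration only.
treated_sites = {"T1": 0.62, "T2": 0.35, "T3": 0.90}
untreated_sites = {"C1": 0.60, "C2": 0.33, "C3": 0.40, "C4": 0.10}
pairs = match_by_propensity(treated_sites, untreated_sites)
```

In this made-up example the treated site with the highest score finds no comparison site within the caliper and is left unmatched, which illustrates a practical cost of the method: treated sites unlike any untreated site drop out of the analysis.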

CHAPTER 2

Guidelines for Developing CMFunctions from Models That Relate CMF Point Estimates to Application Circumstance

2.1 Guidelines for Conducting Systematic Reviews

The guidelines provided for conducting systematic reviews are relevant to the case where CMFunctions are being developed using CMF estimates from multiple studies. Elvik (2005) provides an introductory guide to systematic reviews and meta-analysis. The guide is intended to inform readers what meta-analysis is and to provide guidelines. The following text is taken verbatim from this guide.

A systematic review is a comprehensive and critical review with the goal of identifying the relevant studies and extracting information from each such that the knowledge from each can be combined. The three key elements of a systematic review identified are:

1. A systematic and extensive search for relevant studies is performed with the objective of including all studies that have been made, even unpublished studies.
2. Data are extracted from each study according to a standardized procedure, using a data extraction form. To ensure the accuracy of data extraction, two researchers independently extract data from the same studies.
3. Clear study inclusion criteria are formulated. An attempt is made to assess study quality and present the findings of the best studies. Procedures for study retrieval, data extraction and meta-analysis are reported in detail to ensure reproducibility of the review.

The preparation for a meta-analysis consists of five activities:

1. Defining the topic of the meta-analysis as precisely as possible
2. Conducting a systematic search for relevant studies
3. Defining study inclusion criteria
4. Determining which data to extract from each study
5. Converting estimates of effect to a common scale

The study inclusion criteria include:

1. The study should provide one or more numerical estimates of the effect of interest.
2. The statistical precision (standard error) of each estimate of effect should be stated or be possible to derive.
3.

In addition to these criteria, various other restrictions on study inclusion are sometimes made, for example language restrictions, restrictions on study age or the exclusion of “bad” studies. The latter criterion is controversial; whenever studies rated as bad have been excluded from a meta-analysis, it is important to state clearly and explicitly how study quality was assessed and list all studies that were excluded, stating for each study the main reason or reasons for its exclusion.
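Two of the points above, converting estimates to a common scale and deriving a standard error when it is not stated, can be illustrated with a short sketch. It assumes the study reports a percent crash reduction with a 95% confidence interval that was constructed symmetrically on the logarithmic scale, which is common for ratio estimates but is an assumption here. The function names and the numbers are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def percent_reduction_to_cmf(percent_reduction):
    """A 25% crash reduction corresponds to a CMF of 0.75."""
    return 1.0 - percent_reduction / 100.0

def log_cmf_and_se(cmf, ci_lower, ci_upper):
    """Return ln(CMF) and its standard error derived from a 95% CI.

    Assumes the interval is symmetric on the log scale, so its width
    there equals 2 * Z_95 * SE.
    """
    log_cmf = math.log(cmf)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * Z_95)
    return log_cmf, se

# Hypothetical study: 25% reduction, 95% CI for the CMF of (0.60, 0.9375).
cmf = percent_reduction_to_cmf(25.0)
log_cmf, se = log_cmf_and_se(cmf, 0.60, 0.9375)
print(cmf, log_cmf, se)
```

If a study reports an interval built on the original (not logarithmic) scale, the corresponding calculation would use the interval width directly rather than the difference of logarithms.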

Elvik (2005) also recommends that a sensitivity analysis be done to see whether the outcome depends on the choices made in conducting the systematic review. The choices whose effects should be examined in a sensitivity analysis include:

• Estimate of effect selected
• Including, and respectively excluding, studies providing outlying estimates of effect
• Adjusting estimates of effect for the possible presence of publication bias
• Approach to the assessment of study quality, including the effect on the results of analysis of leaving out bad studies
• Statistical weights, which is tantamount to the choice between a fixed effects and a random effects model of analysis

Fixed versus random effects models are discussed in Section 2.4 of this appendix.

Guidelines have been developed for the presentation of meta-analyses, in particular a checklist developed by the Cochrane collaboration (Egger et al. 2001). When presenting a meta-analysis, it is generally recommended to include:

• A list of all studies included in the meta-analysis and a brief presentation of their main findings; for example, in the form of a forest plot. This is a plot showing the best estimate and the 95% confidence interval surrounding it for each study.
• A list of studies that were judged to be relevant, but were not included in the meta-analysis, stating explicitly for each study why it was not included.
• A concise description of how the literature search was performed.
• A list of all variables coded for each study, as well as frequency distributions for these variables.
• A detailed explanation of how study quality has been assessed, if this was done.
• A funnel plot of estimates of effect and an analysis of the funnel plot with respect to modality, skewness, the presence of outliers and the presence of publication bias.
• A presentation of the analysis of publication bias made and the possibility of adjusting for publication bias if it was detected.
• A presentation of the findings of meta-analysis, for all versions of it that were performed (i.e., both fixed-effects analysis and random-effects analysis in case both models were applied).
• A presentation of the main findings of the sensitivity analysis performed.

These items are mandatory. In addition, any meta-analysis invites a discussion of the current state of knowledge and an

In an unpublished manuscript, Elvik provides further guidelines focusing on CMFunctions and not meta-analyses in general.

Classifying, Coding, and Selecting Studies

Once relevant studies have been identified, they should be classified according to study design and how well they control for potentially confounding factors. This is an essential preparatory step for analysis because studies employing different designs do not control for the same potentially confounding factors and tend to produce different estimates of effect.

A meta-regression of studies that have evaluated the safety effects of converting junctions to roundabouts (Elvik 2003) illustrates the differences in estimates of effect obtained by different study designs. In this study, for simple before-after studies, not controlling for any confounding factors, the coefficients of the meta-regression predicted a 46% reduction of injury accidents when three-leg junctions are converted to roundabouts. In before-after studies controlling for regression-to-the-mean and changes in traffic volume, the corresponding estimate was a 29% reduction of injury accidents. In cross-sectional studies, comparing accident rates in roundabouts to those in other types of junctions, the estimated accident reduction was 21%.

Table C2 shows Elvik’s (unpublished) proposed classification of study designs. Four main types of study design are listed, and for each design, studies are classified as high, medium, or low quality depending on how well they control for potentially confounding factors. Elvik (unpublished) recommends avoiding mixing studies using different designs, or not controlling for the same confounding factors, when developing a crash modification function.

If there are enough studies to discard those of medium or low quality, doing so is recommended. If most studies are of medium or low quality, one should not try to develop an accident modification function, as there is a non-negligible risk that it will be biased and misleading.

When the studies forming the basis for developing an accident modification function have been selected, they should be coded. One should, as a minimum, code the following information for each study (each estimate of effect):

• Country where the study was performed
• Year of publication, or year(s) when data were collected
• Estimate of effect (all estimates should be comparable; see below)
• Accident or injury severity
• Characteristics of the measure that may influence its effect
• Characteristics of the context that may influence the effect of the measure

Main category of study design | Versions of study design by level of control for confounding factors | Rating for study quality (within main group)
Randomized controlled trials (experiments) | Randomized controlled trial demonstrating pre-trial equivalence of groups and controlling for treatment implementation, attrition bias and unintended effects | High
Randomized controlled trials (experiments) | Randomized controlled trial demonstrating or controlling for some but not all the factors listed above | Medium
Randomized controlled trials (experiments) | Randomized controlled trials with evidence of systematic differences between treatment group and control group | Low
Before-and-after studies (observational) | Before-and-after studies controlling for regression-to-the-mean, long-term trends and changes in traffic volume not induced by the measure | High
Before-and-after studies (observational) | Before-and-after studies controlling for some, but not all the factors listed above | Medium
Before-and-after studies (observational) | Simple before-and-after studies not controlling for any confounding factors | Low
Case-control studies | Case-control studies controlling for self-selection of cases and/or controls and important known risk factors by means of multivariate analysis | High
Case-control studies | Case-control studies controlling partly for self-selection bias and for some but not all known important potentially confounding factors | Medium
Case-control studies | Simple case-control studies not controlling for potentially confounding factors or simple case-series | Low
Cross-sectional studies: multivariate models | Multivariate models not known to be influenced by any of the following potential sources of error: small samples or low mean values; bias due to aggregation or averaging; outlying data points; inclusion of endogenous variables; co-linearity among independent variables; omitted variable bias; wrong functional form; inappropriate model form; inappropriate dependent variable | High
Cross-sectional studies: multivariate models | Multivariate models not known to be influenced by most of the potential sources of error listed above | Medium
Cross-sectional studies: multivariate models | Multivariate models known to be influenced by one or more of the potential sources of error listed above | Low

Table C2. Proposed classification of study designs.
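The minimum coding list and the selection advice above can be sketched as a small data structure. This is only an illustration: the field names, the design labels, and in particular the threshold `min_high` (how many high-quality studies count as "enough") are assumptions that the source does not specify.

```python
from dataclasses import dataclass, field

@dataclass
class CodedEstimate:
    """One coded estimate of effect, following the minimum list above."""
    country: str
    year: int
    estimate: float   # estimate of effect on a common scale (e.g., CMF)
    severity: str     # accident or injury severity
    design: str       # main category of study design (Table C2)
    quality: str      # "high", "medium", or "low" within that design
    measure_traits: dict = field(default_factory=dict)
    context_traits: dict = field(default_factory=dict)

def select_estimates(records, design, min_high=3):
    """Keep one study design and apply the quality advice.

    min_high is a hypothetical cut-off for "enough" high-quality studies.
    Returns an empty list when most studies are of medium or low quality.
    """
    same_design = [r for r in records if r.design == design]
    high = [r for r in same_design if r.quality == "high"]
    if len(high) >= min_high:
        return high          # enough good studies: discard the rest
    if len(high) * 2 < len(same_design):
        return []            # mostly medium/low quality: do not proceed
    return same_design

# Hypothetical coded records.
records = [
    CodedEstimate("NO", 2003, 0.71, "injury", "before-after", "high"),
    CodedEstimate("SE", 2005, 0.75, "injury", "before-after", "high"),
    CodedEstimate("US", 2009, 0.80, "injury", "before-after", "high"),
    CodedEstimate("UK", 1998, 0.54, "injury", "before-after", "low"),
    CodedEstimate("DK", 2001, 0.79, "injury", "cross-sectional", "medium"),
]
selected = select_estimates(records, "before-after")
print(len(selected))
```

Filtering on a single design before anything else reflects the recommendation not to mix study designs in one function.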

As far as developing accident modification functions is concerned, country of origin and year of publication are potentially confounding variables. If estimates of effect are associated with these variables, the accident modification function may apply only to certain countries or a certain period. This is not desirable, as it is an objective of research to produce knowledge of general validity. On the other hand, it may be the case that the effect of a road safety measure changes over time. It is, however, unlikely that it is the mere passage of time that causes the effect of a road safety measure to change. It is more likely that effects change because, for example, the measure improves in quality (better road lighting; better seat belts) or because the context of its use changes (if stop signs are put up in all junctions, drivers may respect them less than if they are used more restrictively). One should then try to include variables describing changes in standard or context and relate the effect of the measure to these variables rather than to time per se.

Many estimators of effect are used in road safety evaluation studies, although the odds ratio is perhaps the most common. If the studies use different estimators of effect, these need to be converted to a common metric. Examples of how to convert different estimators of effect to a common metric can be found in the textbook on meta-analysis by Borenstein et al. (2009).

The effects of many road safety measures vary with respect to accident or injury severity. If the studies contain estimates of effect referring to different levels of accident or injury severity, this needs to be coded.

The key variables of interest in developing accident modification functions are characteristics of the measure that may influence its effects and characteristics of the context of use of the measure that may influence its effects. Context of use refers, for example, to the type of traffic environment in which the measure is used.

2.2 Guidelines on Which Application Circumstances and Key Influential Factors to Collect Information on, Grouped by Treatment and Location Types

The following information should be identified for the studies involved:

• Origin;
• Publication year;
• Characteristics of the road safety measure, such as indicators of its quality or standard or extent of use; and
• Characteristics of the context for use of the road safety measure, such as the type of traffic environment it is used in.

For further guidelines on this point, Appendix A, Step 4, provides an indication of site characteristics that are influential to the effect of the treatment. The independent variables should preferably be numerical or lend themselves to a meaningful numerical coding (Elvik, unpublished).

2.3 Guidelines for Conducting Meta-Regression

Meta-regression is the fitting of a mathematical function to multiple estimates of effect to derive a CMFunction. In the case of CMFs, one is seeking to quantify the sources of systematic variation of effect for the expected CMF value. Often meta-regression is conducted as a weighted regression analysis, in which estimates of effect are weighted in inverse proportion to the variance of the estimates.

Meta-regression can be conducted on two types of data. The first is to use estimates of effect from different studies. Section 2.1 provides information on how to conduct a review of literature

to extract this information. The second type is to use the site-level data from one or more studies with estimates of effect determined for each site. In effect, this is no different from the first type if we think of each site as constituting a “study.” One could potentially also combine site-level and study-level estimates of effect if so desired. The challenge is in assessing the rigor with which the CMF estimates to be used for meta-regression were obtained, as experience has shown that the quality of even published studies can be quite variable. Even so, as Elvik (unpublished) notes in referring to study-level meta-regression, “the main limitations in developing accident modification functions are the small number of good evaluation studies and the often huge variation in estimates of effect.”

In conducting a meta-regression analysis, the same concerns arise as for developing cross-sectional regression models, including the choice of model form, estimation method, multi-collinearity, omitted variable bias, and others.

2.3.1 Exploratory Analysis and Identification of Outliers

After assembling the data for each CMF estimate, including the variance of the estimate and key influential factors, the first step in conducting the meta-regression is an exploratory analysis. The exploratory analysis seeks to identify those variables that appear to impact the expected CMF value and the form of this relationship; e.g., linear or exponential.

At this point the analyst will have multiple estimates of the CMF. Sometimes multiple CMFs will be available from the same data set within the same study. For example, NCHRP Report 641 (Torbic et al. 2009) has multiple estimates for the effectiveness of shoulder rumble strips because estimates for individual states are provided as well as for all states combined, and furthermore, results are also provided for both before-after and cross-sectional analyses. Where multiple estimates from the same study exist, Elvik (email correspondence) indicates five options:

1. Include a single estimate from each study.
2. Combine estimates from a single study into an overall estimate; i.e., do a “micro” meta-analysis of a study containing several estimates to get a single estimate.
3. Model statistically the dependence between estimates from the same study and develop an overall estimate based on a regression model accounting for the fact that estimates are correlated.
4. Test whether estimates from a single study have a higher degree of correlation than estimates from different studies. If not, ignore the problem and treat estimates as independent. If within-study correlations are higher than between-study correlations, use one of Options 1 to 3 above.
5. Include all estimates, but down-weight them by 1/N in the meta-analysis (so that if a study has 10 estimates, each gets 1/10 the weight it would ordinarily get).

According to Elvik (email correspondence), this problem has mostly been ignored, or analysts have not even thought about it. Elvik favors a somewhat exploratory approach: trying several of the options listed above and seeing whether the choice makes a difference. If no significant difference is seen, then he suggests ignoring the issue.

Exploratory analysis methods range from construction of simple scatterplots of the CMF estimates versus those variables thought to potentially affect the expected CMF value to more elaborate techniques suggested by Tukey (1977) that are implemented in software packages such as R and S-Plus.

Elvik (unpublished) recommends that the effects of country and publication year on estimates of effect should be examined. This can be done in at least two ways. The simplest is to run an ordinary least squares regression, using estimate of effect as dependent variable, assigning the

same weight to all estimates of effect, and including all independent variables in the regression model. The reason for not using the statistical weights assigned to each estimate of effect in meta-analysis is that the weights may confound the analysis. If, for example, recent studies have larger statistical weights than older studies, even small differences in estimates of effect may become statistically significant simply because the statistical weights differ. Another way of testing the effects of country and publication year is to run a set of bivariate meta-regressions, one for each country, and compare the coefficients to a meta-regression including all countries.

During the exploratory analysis it may be seen that CMF estimates are very widely scattered and that many estimates also have very high variances. This can particularly be a problem when a meta-regression is being conducted within a study and each site is an observation. It would also be seen when site-level data from multiple studies are combined. It is particularly problematic for studies of roadway segments, which are often defined to be homogeneous in many geometric and other characteristics, with the result that segment lengths are quite short. With these short lengths come small crash counts, a preponderance of zeros, and high variances for CMF estimates.

Where estimates are more widely dispersed in certain ranges of outcomes than in others, this is referred to as heteroscedasticity. Elvik (2015) outlines four options for dealing with heteroscedasticity but notes that none of the solutions is ideal and that one may have to accept that the residuals of a CMFunction will be heteroscedastic.

1. Transform variables to stabilize the variance. A natural logarithm transformation will often reduce heteroscedasticity.
2. Merge data points that are widely dispersed. The drawback is that aggregation bias may be introduced.
3. Restrict the CMFunction to a limited range of data by omitting the widely dispersed data points. The drawback here is bias due to omitting information.
4. Develop multiple CMFunctions if it is believed that the effects of a measure vary in a more complex way than can be described by a single function. Heteroscedasticity, however, does not by itself indicate a variation in effects and is therefore not reason enough to justify multiple functions.

When this situation is encountered, a sensible approach is to combine individual sites with others that are similar with respect to characteristics of interest. For example, if the CMFs are to be modeled as a function of traffic volume, then ranges of traffic volume can be used to create groups. This can be done in an iterative fashion such that, by combining sites, more stable estimates of CMFs with a smaller variance are obtained and a more reliable CMFunction can be modeled. At the same time, however, the combining of sites should be minimized to avoid losing valuable information and creating aggregation bias, as discussed in Section 1.1. Further discussion on combining sites and/or study estimates is provided in Section 2.6.

The identification of outlier data points should be done with caution. While including outliers can hamper model development, one should not remove data points simply because they do not meet expectations or decrease the model quality. What appears to be an outlier at extreme observations may in fact be providing important information about the expected CMF value. For example, a treatment may reduce crashes up until a high traffic volume and then suddenly increase crashes. Hauer (2015) suggests the use of CURE plots (see Section 1.2) to identify potential outliers.

2.3.2 Selection of Model Form

The selection of an appropriate model form is critical to developing an accurate CMFunction.
At the same time, it is a difficult task in meta-regression when the number of individual estimates is low and several model forms may appear appropriate and logical. Scatterplots of CMF estimates

versus potential influential factors can be useful for revealing appropriate model forms. This approach is undertaken in Chapter 3, Case Studies 1 and 2. Elvik (unpublished) suggests comparing a few different common functional forms (linear, inverse, logarithmic, power, exponential) to see which form best fits the data. Some tools for investigating model forms are provided in Section 1.2. Assessment of model fit is part of model form selection and is recommended through a variety of GOF measures as discussed in Section 1.4. Those same tools can be applied for meta-regression models.

Elvik (2009) outlines a procedure for selecting the best form for a CMFunction. The process recommended by Elvik is to first look for patterns in the findings of a single study, propose an initial form for the CMFunction, and then test the validity of this function by applying it to a new evaluation study whose data were not used in developing the CMFunction. If the initial CMFunction is validated by the second study in that the same model form fits the data, then the data for all studies are combined and the CMFunction is re-estimated. The search for patterns should have some basis in hypothesized relationships between crashes and the explanatory variables of interest. Elvik (2009) provides the following steps for developing CMFunctions:

1. Develop hypotheses regarding systematic variation in effects of a road safety measure (i.e., identify the independent variables of CMFunctions).
2. Perform an exploratory analysis of patterns for the observations from an evaluation study designed to propose a first version of a CMFunction.
3. Test the CMFunction on the observations from a new evaluation study that employs the same study design as the one used to develop the first version of the CMFunction.
4. Synthesize the results obtained for the two evaluation studies in the form of a revised CMFunction.
5. Test the revised CMFunction by means of the individual results from a third evaluation study.
6. If the third study results fit the CMFunction, then synthesize the results obtained from three evaluation studies in the form of a revised and possibly refined CMFunction.
7. Repeat steps 5 and 6 each time a new evaluation study is reported.

The analyst also needs to decide whether to use a weighted or non-weighted regression. Weighted regression ensures that imprecise estimates do not bias the model. At the same time, though, some estimates may provide valuable information yet be imprecise because they represent extreme values, such as very high AADT or very sharp curvature. In this case a weighted regression will give them less influence on the model, and important information can be lost.

Another consideration, when grouping data is deemed necessary, is what variable to model for the grouped data. For example, if site-level road segments are being modeled and, because of short segment lengths with few crashes per site, the data require some grouping of sites by traffic volume (e.g., <5,000; 5,001–10,000; 10,001–15,000; >15,000), then it is desirable to have some continuous variable to represent each group as opposed to modeling the data using categorical variables. One solution would be to use the weighted average volume within each group. The accompanying case studies that apply meta-regression illustrate how this issue could be addressed.

2.3.3 Validation

Validating the model is as important as determining the appropriate CMFunction model form for the observed data. If a CMFunction does not validate well, this would indicate that the model has been overfit to the data and that conclusions drawn from the model are not reliable.
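Returning to the grouping issue raised at the end of Section 2.3.2, the following sketch shows one way site-level CMF estimates might be pooled within traffic volume bands while keeping a continuous volume variable for each group. The bands follow the example in the text; how exact boundary values are assigned, the use of inverse-variance weights for both the pooled CMF and the average volume, and all data values are assumptions made for illustration.

```python
from collections import defaultdict

# AADT bands from the text; a site with AADT exactly on a boundary is
# placed in the lower band (an assumption).
BANDS = [(0, 5000), (5000, 10000), (10000, 15000), (15000, float("inf"))]

def band_index(aadt):
    for i, (low, high) in enumerate(BANDS):
        if low < aadt <= high:
            return i
    raise ValueError("AADT must be positive")

def group_sites(sites):
    """Pool site-level estimates within each AADT band.

    sites: iterable of (aadt, cmf, variance) tuples.
    Returns one dict per non-empty band with the inverse-variance
    weighted CMF, its variance, and the weighted average AADT.
    """
    buckets = defaultdict(list)
    for aadt, cmf, var in sites:
        buckets[band_index(aadt)].append((aadt, cmf, var))
    groups = []
    for i in sorted(buckets):
        weights = [1.0 / var for _, _, var in buckets[i]]
        total = sum(weights)
        groups.append({
            "band": BANDS[i],
            "mean_aadt": sum(w * a for w, (a, _, _) in zip(weights, buckets[i])) / total,
            "cmf": sum(w * c for w, (_, c, _) in zip(weights, buckets[i])) / total,
            "variance": 1.0 / total,  # variance of the pooled estimate
            "n_sites": len(weights),
        })
    return groups

# Hypothetical site-level data: (AADT, CMF estimate, variance).
sites = [(3000, 0.9, 0.04), (4000, 1.1, 0.04),
         (12000, 0.8, 0.02), (14000, 0.7, 0.01)]
groups = group_sites(sites)
for g in groups:
    print(g)
```

The pooled variance in each group is smaller than that of any single site in it, which is the stabilizing effect that motivates grouping; the `mean_aadt` value can then serve as the continuous explanatory variable in the meta-regression.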

Elvik (unpublished) lists several model criteria:

1. Overall GOF assessed in terms of a cumulative residuals plot
2. Unbiasedness of model predictions: the model should not, on average, predict a larger or smaller effect than the data points serving as its basis
3. Explanatory value of model in terms of share of systematic variation of estimates of effect explained
4. Normality in the distribution of standardized residuals
5. Heteroscedasticity in the standardized residuals
6. Autocorrelation of residuals

One way to validate the initial CMFunction is to split the database into two sets, using one to develop the CMFunction and the other for validation. In practice, however, this is difficult since few estimates may be available to begin with. Where a CMFunction is being developed using site-level data for one or more studies, this approach may be more successful if the data set is large. As more studies/data become available, the existing CMFunction can be applied to the new data and its predictive performance assessed using the methods discussed in Section 1.4.

Elvik (unpublished) proposes another validation method, called a range-of-replications analysis, that is useful for testing the validity of estimates of effect over time. Figure C4 shows a way of presenting the results of such an analysis. The solid, almost horizontal lines show the upper and lower 95% confidence limits of the summary estimate of effect based on cumulative meta-analysis. The cumulative estimates are calculated after sorting the individual estimates by increasing year of study. Thus, at the value of 10 on the abscissa, these lines represent the cumulative evidence of the first 10 studies (i.e., all studies to the left of the tenth study), including the tenth study itself. The rightmost data point, at the value of 57, includes all 57 estimates of effect.
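The cumulative summary estimate described above can be sketched as follows. The sketch assumes a fixed effects (inverse-variance) summary computed on the log odds ratio scale; the source does not state which summary model underlies Figure C4, and the data values are hypothetical.

```python
import math

def cumulative_summary(estimates, z=1.96):
    """Cumulative inverse-variance summary after sorting by study year.

    estimates: list of (year, log_odds_ratio, std_error) tuples.
    Returns a list of (n, summary, lower, upper) on the log scale, where
    n is the number of estimates included so far.
    """
    ordered = sorted(estimates, key=lambda e: e[0])
    sum_w = 0.0
    sum_wy = 0.0
    results = []
    for n, (_, y, se) in enumerate(ordered, start=1):
        w = 1.0 / se ** 2
        sum_w += w
        sum_wy += w * y
        summary = sum_wy / sum_w
        half_width = z * math.sqrt(1.0 / sum_w)
        results.append((n, summary, summary - half_width, summary + half_width))
    return results

# Hypothetical estimates: (year, ln(odds ratio), standard error).
data = [(2001, -0.2, 0.2), (1999, -0.4, 0.2), (2005, -0.3, 0.1)]
cumulative = cumulative_summary(data)
for n, s, lo, hi in cumulative:
    print(n, round(math.exp(s), 3), round(math.exp(lo), 3), round(math.exp(hi), 3))
```

Exponentiating the limits, as in the print loop, returns them to the odds ratio scale used on the vertical axis of the figure.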
The vertical lines show the 95% confidence intervals of each estimate of effect. These confidence intervals are obviously wider than the confidence interval for the summary estimate of effect. If the confidence intervals for the individual estimates of effect overlap the confidence interval for the summary estimate of effect, the estimates are consistent, differing only in terms of statistical precision. Confidence intervals that are consistent with the summary estimate of effect in this sense are shown by solid lines in the figure. Confidence intervals that do not entirely overlap the confidence interval for the summary estimate of effect are shown by means of dashed lines.

Figure C4. Illustration of range-of-replications analysis. [Plot titled “The consistency of replications with accumulated evidence: bypass roads”; vertical axis: odds ratio (log scale, 0.010 to 1000.000); horizontal axis: replication number (0 to 60); series: cumulative lower, cumulative upper, single estimate lower, single estimate upper.]

It is seen that six of the 57 estimates of effect are inconsistent with the summary estimate of effect. These six estimates occurred at replications number 4, 8, 13, 15, 28, and 38. Thus, if anything, estimates of effect have become more consistent with the summary estimate over time. A prediction made after, say, replication number 25 would have been wrong only in 2 out of 32 cases, in which the new estimate of effect differed significantly from the cumulative evidence of prior estimates of effect.

2.3.4 Examples of Meta-Regression in Road Safety

Case Studies 1 and 2 in Chapter 3 provide detailed illustration of the procedures and issues involved in using meta-regression for developing a CMFunction. Some specific examples of relevance to these case studies and to the use of meta-regression in road safety in general include the following references.

• Elvik, R. (2003). Effects on road safety of converting intersections to roundabouts: review of evidence from non-U.S. studies. Transportation Research Record, Issue 1847.
• Elvik, R., Fridstrom, L., Kaminska, J. and S. Meyer. (2013). Effects on accidents of changes in the use of studded tyres in major cities in Norway: A long-term investigation. Accident Analysis and Prevention, Volume 54.
• Elvik, R. (2013). International transferability of accident modification functions for horizontal curves. Accident Analysis and Prevention, Volume 59.
• Vanlaar, W., Mayhew, D., Marcoux, K., Wets, G., Brijs, T. and J. Shope. (2009). An evaluation of graduated driver licensing programs in North America using a meta-analytic approach. Accident Analysis and Prevention, Volume 41.

2.4 Fixed Versus Random Effects Models

In a fixed effects model, it is assumed that the observed variation between CMF estimates is due entirely to the random variation of the samples used. Under this assumption, if each study had infinite data available, then all study estimates would converge to the same CMF value.

In a random effects model, it is assumed that each study or site has its own true CMF value. Therefore, there is variation between studies and this extra source of variation is considered in estimating a mean effect from the observed data.

The validity of the fixed effects assumption can be tested using the following Cochran’s test statistic, Q (Elvik et al. 2001):

$$Q = \sum_{i=1}^{g} w_i y_i^2 - \frac{\left(\sum_{i=1}^{g} w_i y_i\right)^2}{\sum_{i=1}^{g} w_i} \qquad \text{(Equation C28)}$$

where
g is the number of CMF estimates
y_i is the CMF value of study i
w_i is the statistical weight, equal to 1 divided by the variance of the CMF estimate for study i
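Equation C28 can be evaluated directly from the estimates and their variances. The sketch below also computes the I² statistic, I² = (Q − df)/Q × 100% with df = g − 1, which this section goes on to discuss. The function names and the data are hypothetical; no p-value is computed, so comparison with a chi-square critical value is left to the analyst.

```python
def cochran_q(estimates, variances):
    """Cochran's Q per Equation C28, with weights w_i = 1 / variance_i."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    sum_wy = sum(w * y for w, y in zip(weights, estimates))
    sum_wy2 = sum(w * y * y for w, y in zip(weights, estimates))
    return sum_wy2 - sum_wy ** 2 / sum_w

def i_squared(q, n_estimates):
    """I^2 in percent: (Q - df) / Q * 100, set to zero if negative."""
    df = n_estimates - 1
    if q <= 0.0:
        return 0.0
    return max(0.0, (q - df) / q * 100.0)

# Hypothetical CMF estimates and their variances.
cmfs = [0.8, 1.0, 1.2]
variances = [0.01, 0.01, 0.01]

q = cochran_q(cmfs, variances)
i2 = i_squared(q, len(cmfs))
print(q, i2)
```

For these made-up values Q is 8 on 2 degrees of freedom, which exceeds the 5% chi-square critical value of about 5.99, and I² is 75%.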

The test statistic, Q, is chi-square distributed with g − 1 degrees of freedom. If the test is statistically significant, a random effects model should be used. Random effects models that include one or more explanatory variables may be referred to as mixed effects models.

Elvik (unpublished) states that if there is only random variation in study findings, it does not make sense to develop an accident modification function, because such a function would then only give a spurious explanation of variation in estimates of effect. The I² statistic in meta-analysis is a useful indicator of the relative contribution of systematic variation to the overall variation in estimates of effect (Borenstein et al. 2009), where I² = (Q − df)/Q × 100% (if negative, set to zero) and df = g − 1 is the degrees of freedom of Q. It is best stated as a percentage and should, as a guideline, have a value greater than 50% to proceed with developing a CMFunction.

When conducting the test of homogeneity, it should be recognized that the test has low power when few studies are available. This means the possibility of a type II error (a false negative, that is, deciding there is no heterogeneity between jurisdictions when in fact there is) must always be considered. The amount of variation that can be attributed to heterogeneity can be measured in several ways.

If there is non-random (systematic) variation in study findings, then a CMFunction can be pursued. If all the variation is now assumed to be explained by parameters in the model and the random variation of the samples used, then a fixed effects model is calibrated. It is possible, however, that random effects still exist, for example, if the treatment differs somehow between jurisdictions from which the individual estimates come. A model considering random effects has an extra term representing the between-study variation.
For example, consider the following linear model: CMF x u ej i ij j ji n∑= α + β + += Equation C291 where CMFj is the CMF estimate for study j x’s are the explanatory variables explaining the heterogeneity between studies for study j α and β’s are the estimated model parameters uj is the residual error term for the between-study variation, assumed to have a normal distri- bution with a mean of 0 and variance σu2 ej is the residual error term due to sampling variation, determined by the within study varia- tion and sample size and is assumed to be known for each study If weighted regression is applied, the weights for each study will differ between a fixed and random effects model. In a fixed effects model, each study is weighted by the inverse of the within study variance. In a random effects model, each study is weighted by the inverse of the sum of the within study variance plus the between studies variance. Case Study 2 in Chapter 3 illustrates the differences between a fixed and random effects model in developing a CMFunction for the application of shoulder and centerline rumble strips. The choice of a fixed effects or random effects model may impact the number of explanatory variables kept in a model if a cutoff level of statistical significance is used. Elvik (2001) applied both fixed and random effects models to studies of the impact of bypass roads and found only small differences in the parameter estimates. However, the standard errors of the estimated parameters were consistently greater in the random effects models than the fixed effects model. As further illustration, consider the data in Table C3 for the addition of shoulder rumble strips. The CMFs and their standard errors were obtained from the CMF Clearinghouse along with some limited information related to the application circumstance. Figure C5 plots the CMF values versus the maximum AADT for a road segment in the study on which the CMF is based. 
Table C3. Fixed and random effects on CMF values related to the addition of shoulder rumble strips.

| Parameter | Fixed effects estimate (standard error) | Random effects estimate (standard error) |
|---|---|---|
| Intercept | 1.0888 (0.0097) | 1.0065 (0.1018) |
| Maxvol | -0.0122 (0.0038) | -0.0101 (0.0075) |

Unfortunately, the average AADT was not provided. The plot indicates there may be a linear relationship between the CMF and the volume variable, with a lower CMF at higher values of AADT. A CMFunction has been fit to these data, weighted by the inverse of the variance of the CMF estimate, assuming a normal error distribution and with the model form:

$$CMF_i = \text{intercept} + b \times MaxVOL_i \qquad \text{(Equation C30)}$$

where $MaxVOL_i$ is the maximum AADT for a road segment included in the data for the ith study.

Table C3 shows the parameter estimates and their standard errors for the model estimated using both fixed and random effects. It is apparent that the parameter estimates are similar, but the standard errors are much higher for the random effects model.

2.5 Selection of Estimation Method

When conducting a meta-regression, the same considerations for cross-sectional models apply with respect to the estimation methods. Information from Sections 1.3, 1.5, and 1.8 may be considered pertinent for meta-regression as well. Linear models, generalized linear models, and full Bayes MCMC methods are all possible.

Modeling interactions between explanatory variables and multi-level models may also be considered, but there are no good examples of actual applications for infrastructure safety treatments. A related example from the driver improvement area, applying both multi-level models and full Bayes MCMC techniques, is a study by Vanlaar et al. (2009), who applied meta-regression to identify the most effective components of graduated driver licensing (GDL) programs in reducing the relative fatality risk of young drivers. All states and provinces with a GDL program in place were included. A random effects meta-regression approach was adopted, which acknowledges that there is variation in program effectiveness that is due to differences between GDL programs and that not all variation is due to random variability in the observed data.
Figure C5. Plot of CMF vs. maximum annual average daily traffic. (Plot title: CMF vs. MaxVol; horizontal axis 0 to 100,000; vertical axis, CMF, 0 to 1.6.)

The outcome measure was standardized across US and Canadian jurisdictions by using the counts of fatalities per age cohort. Fatality rates were calculated for 16-, 17-, 18-, and 19-year-old drivers using reported fatalities and population information both pre- and post-implementation of the GDL. The same rates were computed for a comparison cohort of drivers in each jurisdiction that

Table C4. CMFs for addition of shoulder rumble strips.

| CMF | Standard Error | Study Method | State | roadDivType | numLanes | areaType | minTrafficVol | maxTrafficVol |
|---|---|---|---|---|---|---|---|---|
| 0.84 | 0.06 | Simple before/after | MN | Divided | Multilane | Rural | 2000 | 50000 |
| 1.10 | 0.15 | Before/after using empirical Bayes or full Bayes | MN | Divided by median | NULL | Rural | 4959 | 7459 |
| 1.16 | 0.09 | Regression cross-section | MN | Divided by median | NULL | Rural | 4959 | 31692 |
| 1.17 | 0.14 | Regression cross-section | MN | Divided by median | NULL | Rural | 4959 | 31692 |
| 1.14 | 0.08 | Before/after using empirical Bayes or full Bayes | MN | Undivided | | Rural | 782 | 10386 |
| 0.96 | 0.07 | Regression cross-section | MN | Undivided | | Rural | 180 | 10386 |
| 1.18 | 0.10 | Regression cross-section | MN | Undivided | | Rural | 180 | 10386 |
| 1.18 | 0.08 | Before/after using empirical Bayes or full Bayes | MN,MO,PA | Divided by median | NULL | Rural | 4959 | 20763 |
| 1.20 | 0.10 | Regression cross-section | MN,MO,PA | Divided by median | NULL | Rural | 4956 | 31692 |
| 1.28 | 0.11 | Regression cross-section | MN,MO,PA | Divided by median | NULL | Rural | 4956 | 31692 |
| 1.06 | 0.06 | Before/after using empirical Bayes or full Bayes | MN,MO,PA | Undivided | | Rural | 782 | 10386 |
| 0.86 | 0.10 | Regression cross-section | MN,MO,PA | Undivided | | Rural | 180 | 12776 |
| 0.94 | 0.13 | Regression cross-section | MN,MO,PA | Undivided | | Rural | 180 | 12776 |
| 1.08 | 0.04 | Before/after using empirical Bayes or full Bayes | MO | Divided by median | NULL | Rural | 11539 | 37112 |
| 1.22 | 0.09 | Before/after using empirical Bayes or full Bayes | MO | Divided by median | NULL | Rural | 5326 | 20763 |
| 1.08 | 0.07 | Regression cross-section | MO | Divided by median | NULL | Rural | 11539 | 37112 |
| 1.11 | 0.08 | Regression cross-section | MO | Divided by median | NULL | Rural | 11539 | 37112 |
| 1.28 | 0.14 | Regression cross-section | MO | Divided by median | NULL | Rural | 4956 | 20763 |
| 1.28 | 0.14 | Regression cross-section | MO | Divided by median | NULL | Rural | 4956 | 20763 |
| 1.40 | 0.18 | Before/after using empirical Bayes or full Bayes | MO | Undivided | | Rural | 861 | 6205 |
| 0.83 | 1.16 | Regression cross-section | MO | Undivided | | Rural | 861 | 12776 |
| 0.85 | 1.29 | Regression cross-section | MO | Undivided | | Rural | 861 | 12776 |
| 1.07 | 0.04 | Before/after using empirical Bayes or full Bayes | MO,PA | Divided by median | NULL | Rural | 6777 | 37112 |
| 1.01 | 0.07 | Regression cross-section | MO,PA | Divided by median | NULL | Rural | 6777 | 37112 |
| 1.07 | 0.08 | Regression cross-section | MO,PA | Divided by median | NULL | Rural | 6777 | 37112 |
| 0.85 | 0.20 | Simple before/after | ND | Undivided | NULL | Rural | | |
| 1.00 | 0.12 | Before/after using empirical Bayes or full Bayes | PA | Divided by median | NULL | Rural | 6777 | 24752 |
| 0.87 | 0.36 | Before/after using empirical Bayes or full Bayes | PA | Divided by median | NULL | Rural | 9653 | 18753 |
| 1.08 | 0.12 | Regression cross-section | PA | Divided by median | NULL | Rural | 6777 | 34406 |
| 1.06 | 0.16 | Regression cross-section | PA | Divided by median | NULL | Rural | 6777 | 24752 |
| 0.81 | 0.20 | Regression cross-section | PA | Divided by median | NULL | Rural | 8267 | 18753 |
| 0.74 | 0.24 | Regression cross-section | PA | Divided by median | NULL | Rural | 8267 | 18753 |
| 0.76 | 0.09 | Before/after using empirical Bayes or full Bayes | PA | Undivided | | Rural | 948 | 9067 |
| 0.76 | 0.15 | Regression cross-section | PA | Undivided | | Rural | 910 | 10177 |
| 0.76 | 0.15 | Regression cross-section | PA | Undivided | | Rural | 910 | 10177 |
| 0.99 | 0.06 | Before/after using empirical Bayes or full Bayes | PA | Divided by median | NULL | Urban | 11254 | 59391 |
| 0.96 | 0.05 | Regression cross-section | PA | Divided by median | NULL | Urban | 11254 | 92757 |
| 0.95 | 0.06 | Regression cross-section | PA | Divided by median | NULL | Urban | 11254 | 92757 |
| 0.93 | 0.04 | Regression cross-section | PA | All | | All | | |

comprised drivers aged 25–54. The pre-period commenced 2 years prior to GDL and lasted for one year. The post-period commenced 1 year following GDL introduction and lasted for one year. For each cohort in each jurisdiction, the fatality rate after GDL was divided by the rate before GDL, and this ratio divided by the same measure for the comparison group defined the relative fatality risk.

To identify the most successful elements of the GDL programs, 23 variables describing the programs themselves were identified. These serve as the predictor variables in the meta-regression analysis. The model form used a log-link function to linearize the relationship between the dependent and independent variables and includes error terms for both the variation between jurisdictions and the sampling variation within jurisdictions. The parameter estimates from the meta-regression were then used as initial values in a fully Bayesian analysis using MCMC estimation to increase the robustness of the results. In the fully Bayesian models, more variables proved to be estimated with statistical significance.

2.6 Guidelines for Creating Subgroups

In the section in Chapter 2.3 titled Exploratory Analysis and Identification of Outliers, it was suggested that it may be desirable to combine individual estimates of effect when individual estimates are highly variable such that no discernible pattern is seen in the data or when the variances of the estimates of effect are very large. Often these two characteristics will be seen together. To investigate appropriate subgroups, one can divide the data into bins based on values of the variable of interest and observe whether the calculated CMF values begin to show more stability and precision. The cutoff points for creating groups can be chosen through trial and error.
The number of groups should be kept to a minimum, ensuring that information on the effect of a variable is not lost by combining the data. Case Studies 1 and 2 in Chapter 3 both illustrate the grouping of individual sites when conducting meta-regression.

Potential tools for creating subgroups are the classification and regression trees (CART) or random forest methods. CART has been employed in the traffic safety field for several years (e.g., Karlaftis and Golias 2002, Chang and Wang 2006, Harb et al. 2009, and Yan et al. 2010). CART uses a tree-like algorithm that recursively partitions data into homogeneous subgroups. A decision-based model selects the variables that can explain the most variability in the data. The Gini index (Han et al. 2006) is used and is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset. The Gini index can be computed by summing the probability of each item being chosen times the probability of a mistake in categorizing that item. It reaches its minimum (zero) when all cases in the node fall into a single target category. Also, the cutting point of the selected variable is defined as a node for obtaining the maximum reduction in variability. Identifying these nodes and ranking them can provide a decision rule for predicting a category or continuous data. In terms of defining groups for CMFunction development, the dependent variable would be the CMF.

Random forest is an advanced form of tree-based model. The main concept underlying the random forest is to “bag” sampling data to grow multiple trees for estimation. “Bagging” stands for “bootstrap aggregation” (bootstrap is a sampling method; Han et al. 2006). Bagging applies the learning scheme to each one of the artificially derived data sets, and the classifiers generated from them vote for the class to be predicted.
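The Gini index computation described above, summing over categories the probability of picking a category times the probability of mislabeling it, can be written as a short sketch. The function name `gini_index` and the example labels are hypothetical.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a node: sum over categories of p * (1 - p)."""
    n = len(labels)
    if n == 0:
        raise ValueError("a node must contain at least one case")
    counts = Counter(labels)
    # p is the share of cases in a category; (1 - p) is the chance of mislabeling it.
    return sum((c / n) * (1.0 - c / n) for c in counts.values())
```

A node whose cases all share one category returns zero, matching the minimum noted in the text. A tree-growing routine would compare this value before and after a candidate split. For a continuous dependent variable such as the CMF, regression trees typically use a variance-based criterion instead, which this sketch does not cover.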
Bagging works as a method of increasing accuracy. For example, suppose that you would like to identify the contributing causes for a crash. Instead of asking one expert, you may choose to ask several. If a certain cause is named more often than any other, you may choose it as the final or best cause. That is, the final determination is made based on a majority vote, where each expert gets an equal vote. Now replace each expert by a classifier, and you have the basic idea behind bagging. After each tree “votes,” the forest chooses the classification with the most votes (or takes an average in the regression case).

Random forest is more efficient than CART because it can build multiple trees simultaneously. Additional advantages include accuracy, convenience (there is no need for variable deletion), and the ability to deal with missing data. This study will use the “rpart” and “randomForest” packages in the R software (the R-project) for CART and random forest modeling, respectively.

2.7 Estimating Precision of CMFs from CMFunctions Derived

The topic of precision of CMFs from CMFunctions seems to be an area for future research. In any case, it is arguable that this may not be needed for a CMFunction since this is already capturing variability in CMFs.

2.8 Improving Site and Study Level Estimates of CMFs

Appendix F makes the case for the application of the “gold standard” (randomized trials), since using the best available methodologies for observational studies still produces CMF estimates that could be improved for CMFunction development. It also makes the case for the improvement of CMF estimation from observational studies with the use of quasi-experimental techniques, the propinquity study design suggested by Bonneson and Pratt (2008) and referred to in Section 1.11, causal modeling frameworks (Karwa et al. 2011), and tools such as propensity score matching to select comparison groups.

CHAPTER 3
Example Applications

The following example applications consist of four case studies that use real observational data to illustrate the issues in CMFunction development. Two case studies demonstrate methods for developing CMFunctions from application circumstance data while the other two case studies demonstrate methods for developing CMFunctions from cross-sectional data. Following is the list of the case studies.

Demonstration of Methods for Developing CMFunctions from Application Circumstance Data
• Case Study 1. Simultaneous application of shoulder rumble strips and centerline rumble strips on rural two-lane roads
• Case Study 2. Conversion of Conventional Intersections to Roundabouts

Demonstration of Methods for Developing CMFunctions from Cross-Sectional Data
• Case Study 3. Safety Effects of Flattening a Horizontal Curve
• Case Study 4. Safety Effects of Left- and Right-Turn Lanes on Major Roads at 3-Legged Stop-Controlled Intersections

Case Study 1
Simultaneous Application of Shoulder Rumble Strips and Centerline Rumble Strips on Rural Two-Lane Roads

Preamble

This is one of a series of four case studies to demonstrate the proposed guidelines for developing CMFunctions from either cross-sectional data or before-after data from actual safety treatment applications. The purposes of this specific case study are to:
• Illustrate a heuristic methodology a future researcher may follow to derive a CMFunction from before-after application circumstance data on individual treatment sites using regression analysis.
• Illuminate some of the considerable issues and challenges that may be encountered in the process and, in so doing, address what it may take for future researchers to resolve them. The following list of potential issues identifies, in bold text, the ones that are highlighted in this case study:
– Selection of candidate influencing variables (Section 1.4)
– Accounting for interactions among candidate variables (Section 1.8)
– Different CMFunctions for different subgroups of data (e.g., grouped by jurisdiction) (Section 2.6)
– Bias due to aggregation, averaging, or incompleteness in data

– Tools for exploring the appropriate functional form of the model (Section 1.2)
– Combining results from multiple sites (or groups) from one study with single CMF estimates for entire studies
– Improving site and study level estimates of CMFs and variance to reduce variance of CMFunction estimates
– Appropriate modeling methodology; e.g., generalized linear model (GLM) or full Bayes MCMC
– Whether fixed or random effects models are appropriate (Section 2.4)
– Guidelines for creating subgroups of sites using important variables (Section 2.6)
– Assessing model fit
– Selection of the most robust CMFunctions from among several candidates (Sections 1.4 and 2.3)
– Estimating precision of CMFs from CMFunctions derived
• Illustrate the considerable data requirements for estimating a robust CMFunction so that future research planning will endeavor to assemble appropriate data sets.

Given this scope, the purpose is not to derive robust CMFunctions, although that would have been a bonus objective worth achieving. In any case, as highlighted in the demonstration, ideally suited retrospective data sets for this purpose are hard to come by. Indeed, the best, and perhaps only defensible, way forward in deriving CMFunctions for some treatments may well be the application of randomized trials, or propinquity designs that approach such trials, for which the case is made in Appendixes F and G.

1 Introduction

The treatment in this case study is the simultaneous application of shoulder rumble strips and centerline rumble strips on rural two-lane roads. The data set used originated with an empirical Bayes before-after analysis conducted under the FHWA Development of Crash Modification Factors (DCMFs) evaluation study FHWA-HRT-15-048. All site types are rural two-lane roadways, and data from Pennsylvania, Missouri, and Kentucky are included.
While the original analysis evaluated several crash types, including total crashes, the case study analysis focuses on run-off-road crashes, the predominant target crash, for which a CMFunction would be of most interest.

In developing a CMFunction, only those variables that are shared among the three states and that are suspected to influence the value of the CMF are considered. These are:
• AADT before treatment
• The EB expected crashes per year before treatment
• Average shoulder width

Additional variables, such as horizontal curvature, that may influence the expected value of a CMF were unavailable for one or more of the states and thus could not be considered. This highlights a major difficulty in developing a CMFunction from available retrospective data. Another key difficulty is that the three variables available for consideration are likely correlated. The illustrative exploration of the CMFunction development applied the meta-regression approach.

2 Data

The database includes treatment sites from Kentucky, Missouri, and Pennsylvania. In Missouri, shoulder rumble strips are placed on the edgeline, while in Kentucky and Pennsylvania, they are installed both on the edgeline and further into the shoulder.

In the original analysis, to account for potential selection bias and regression-to-the-mean, an empirical Bayes (EB) before-after analysis was conducted, utilizing reference groups of untreated two-lane rural roads with similar characteristics to the treated sites. Crash types considered exclude intersection-related and animal crashes. The data available included the EB expected number of after period crashes in the absence of treatment, its variance, and the observed after period crashes. With this information a CMF may be calculated for any individual site or group of sites.

Table C5 provides summary statistics for the treatment sites. The aggregate CMF estimated and reported is 0.742 with a standard error of 0.041. Table C6 provides CMFs disaggregated by AADT and by the empirical Bayes estimate of crashes per mile-year without treatment.

3 Identified Issues

There were several issues identified at the outset of the CMFunction development analysis that needed to be considered.

Issue 1 How to Aggregate Sites into Groups Based on the Variables of Interest

When conducting a meta-regression, the CMF can be calculated for each individual site and used as a single observation. However, this can be problematic when individual sites’ CMFs are based on very few crashes. In the present case, for road segments, this is frequently the case because road segments are defined when there is a change in any of several roadway related variables and so are typically short in length.
As a result, the estimated CMFs for individual sites show an extremely large variation, with many CMF estimates near 0 and a large value of the CMF for any site that happened to have a crash in the after period. For sites with no after period crashes, the CMF would be equal to 0 and it is not possible to estimate the standard error of the CMF because one of the terms in the variance equation would involve a division by 0. Attempting to fit a model to such data to explain the CMF variation between sites would likely be futile.

Figure C6 illustrates the problem by showing the frequency of segment-level CMFs. By far, most segments have a CMF equal to 0 and the range of CMF values includes estimates between 75 and 100.

Table C5. Data summary for treatment sites.

| Variable | Kentucky | Missouri | Pennsylvania |
|---|---|---|---|
| Number of miles | 164 | 460 | 218 |
| Number of sites | 27 | 1,594 | 464 |
| Mile-years before | 604 | 4,238 | 1,407 |
| Mile-years after | 764 | 1,286 | 512 |
| Run-off-road crashes/mile/year before | 0.62 | 0.30 | 0.16 |
| Run-off-road crashes/mile/year after | 0.20 | 0.21 | 0.18 |
| AADT before | Avg 6,101; Min 1,282; Max 20,433 | Avg 5,290; Min 154; Max 15,848 | Avg 4,990; Min 782; Max 25,796 |
| AADT after | Avg 6,101; Min 1,282; Max 20,433 | Avg 5,106; Min 155; Max 13,522 | Avg 4,657; Min 562; Max 26,118 |
| Average paved shoulder width (ft) | Avg 8.19; Min 2.00; Max 12.00 | Avg 7.21; Min 0.00; Max 12.00 | Avg 4.60; Min 0.00; Max 10.00 |

Table C6. Results disaggregated by ranges of AADT and expected crash frequency.

| Crash Type | AADT Range | CMF (Standard Error) | Expected crashes/mi-year without treatment Range | CMF (Standard Error) |
|---|---|---|---|---|
| Run-off-road | < 3,200 | 0.851 (0.089) | < 0.500 | 0.840 (0.058) |
| Run-off-road | > 3,200 | 0.702 (0.045) | > 0.500 | 0.621 (0.055) |

To resolve this challenge, sites could be aggregated into groups. The question then is how to do this aggregation such that the estimated CMFs for each group will be sufficiently precise and insights can be obtained into how the CMF can be expected to vary between sites. It is logical to define groups based on similar values of those variables that may influence the expected value of the CMF.

Further complicating the issue is how to represent the variables used to define groups in the CMFunction. For categorical variables, such as presence of median (with or without), this is not an issue as all sites within a group have the same value. For continuous variables, such as AADT, the resolution of this issue is less clear. One option is to consider the grouped data as a categorical variable. For example, if AADT were divided into 5 groups by range of AADT, then the AADT variable would be categorical with 5 levels. The negative impact of this is that the estimated CMF will suddenly jump between levels of the categorical variable, which is not logical for what is truly a continuous variable. Another option is to determine the weighted average of the continuous variable within each group.
For example, the average AADT weighted by segment length can be calculated for each group and that average value used in the model.

Figure C6. Frequency of CMFs for segment-level data. (Plot title: Histogram of CMF Values; frequency axis 0 to 2,000.)

Issue 2 Whether the State/Jurisdiction in Which a Site Is Located Should Be Used as One of the Grouping Variables and Then as an Indicator Variable

Because the treatment sites come from three different states with potentially different application details, it should be considered whether including state in the CMFunction is appropriate. It is possible that these differences in application circumstance cannot be represented by other variables. If this is the case, then including a state indicator variable in the model will estimate a unique CMFunction by state and should also more accurately estimate the parameters for other variables in the model.

Issue 3 Whether to Use Weighted or Non-Weighted Regression

In conducting the meta-regression, either a weighted or non-weighted regression may be applied. If weighted regression is pursued, the weights for each site should be equal to the inverse of the variance of the CMF. This weighting gives CMF estimates that are based on more data, and thus are more precise, more influence on the model. On the other hand, information may be lost if the less precise CMFs are at the extreme ends of the values for the explanatory variables. For example, there may be few sites with a high AADT, and therefore these estimates will be less precise and would receive less weight in the model. By weighting the model, however, important information about the impact of high AADTs on the CMF may be lost.

Issue 4 Whether Fixed or Random Effects Models Are Appropriate

Meta-regression can be run as a fixed or random effects model. The random effects model assumes that there is some fundamental difference between groups of sites that is not represented in the model. It may be that sites can be grouped into clusters that are correlated. For example, these may be sites from the same geographic region, or results from the same study if data from multiple studies are being combined.

4 Meta-Regression CMFunction Exploration Results for Run-Off-Road Crashes

As noted earlier, this is an illustration of a heuristic approach. The first step taken was to look at the impact of AADT on the CMF estimate, followed by exploration of the impacts of expected number of crashes and shoulder width. For each of these, Issue 1 was very much at play and needed to be resolved.

4.1 Impact of AADT

Issue 1 was resolved by grouping sites by range of AADT in an iterative fashion such that each group had a reasonable number of sites and observed after period crashes. In this illustration, what counted as reasonable was decided by subjective judgment. In an actual CMF development effort, group sizes can be increased or reduced iteratively to determine the smallest group size that maintains statistical significance in the developed model for the key variable of interest.
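The grouping step described above can be sketched as follows. This is a minimal illustration under stated assumptions. The CMF and standard error expressions are the commonly used empirical Bayes before-after formulas (observed crashes assumed Poisson), which the text refers to but does not reproduce. The bin edges follow the AADT ranges in Table C7. The site tuple layout and the name `group_cmfs` are hypothetical.

```python
from bisect import bisect_right
from math import sqrt

# Bin edges follow the AADT ranges in Table C7.
AADT_EDGES = [5000, 7500, 10000]


def group_cmfs(sites, edges=AADT_EDGES):
    """Aggregate sites into AADT bins and estimate one CMF per bin.

    Each site is a tuple (aadt, length_mi, observed_after,
    eb_expected_after, var_eb_expected_after); this layout is hypothetical.
    """
    totals = {}
    for aadt, length, observed, expected, var_expected in sites:
        t = totals.setdefault(bisect_right(edges, aadt), [0.0] * 5)
        t[0] += observed
        t[1] += expected
        t[2] += var_expected
        t[3] += length
        t[4] += aadt * length
    results = {}
    for bin_index, (obs, exp, var_exp, length, aadt_len) in sorted(totals.items()):
        ratio = var_exp / exp ** 2
        # Assumed standard EB form: ratio of observed to expected, bias-corrected.
        cmf = (obs / exp) / (1.0 + ratio)
        # The 1/obs term is undefined when no after period crashes were observed.
        se = cmf * sqrt(1.0 / obs + ratio) / (1.0 + ratio) if obs > 0 else None
        results[bin_index] = {
            "cmf": cmf,
            "se": se,
            "weighted_aadt": aadt_len / length,  # length-weighted average AADT
        }
    return results
```

Bins that still return `se` of `None`, or a very large standard error, would be candidates for merging with a neighboring bin in the iterative regrouping the text describes.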
The final groupings by volume category, Volcat, are shown in Table C7. The AADT variable modeled was the average AADT weighted by the length of each site in the group.

As seen in Figure C7, there appears to be a negative relationship between AADT and the CMF for run-off-road crashes. It also appears that this relationship can be approximated by a linear model for illustration purposes, although exploring other forms may well be justified and is recommended for future CMFunction development. Section 1.2 provides pertinent guidelines in this regard. A linear model was fit to the four data points in Figure C7, weighted by the inverse of the variance of the CMF, using the model form:

$$CMF = \text{intercept} + a \times \frac{AADT}{10{,}000} \qquad \text{(Equation C31)}$$

The results show that the CMF is lower at higher AADTs, although the AADT parameter is not statistically significant, with a p-value of 0.21.

Table C7. Definition of volume categories.

| Volume Category | AADT Range | Weighted Average AADT |
|---|---|---|
| Volcat1 | 0 < AADT < 5,000 | 2,905 |
| Volcat2 | 5,000 ≤ AADT < 7,500 | 6,166 |
| Volcat3 | 7,500 ≤ AADT < 10,000 | 8,582 |
| Volcat4 | 10,000 ≤ AADT | 15,071 |
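The weighted fit described above, a straight line in AADT/10,000 with each group weighted by the inverse of the variance of its CMF, can be sketched with closed-form weighted least squares. The function name `weighted_linear_fit` is hypothetical, and the sketch returns point estimates only. The group-level CMFs and their variances are not listed in this section, so none are supplied here.

```python
def weighted_linear_fit(x, y, variances):
    """Weighted least squares for y = intercept + slope * x.

    Weights are the inverse of each observation's variance.
    Returns (intercept, slope).
    """
    if not (len(x) == len(y) == len(variances)) or len(x) < 2:
        raise ValueError("need at least two observations of equal length")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

For the volume model, `x` would hold the weighted average AADT values of Table C7 divided by 10,000 (0.2905, 0.6166, 0.8582, 1.5071), with `y` and `variances` taken from the corresponding group CMFs. Passing equal variances gives the non-weighted regression discussed under Issue 3.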

Figure C7. Plot of CMF vs. AADT data grouped by AADT. (Plot title: ROR CMF vs. AADT.)

Table C8. ROR CMFunction grouped by AADT.

| Parameter | Estimate | Standard Error | p-value |
|---|---|---|---|
| Intercept | 0.8399 | 0.0844 | 0.0099 |
| a (AADT/10,000) | -0.1841 | 0.1926 | 0.2146 |

In the next step the data were grouped by both state and AADT, resulting in 12 groups. Figure C8 plots the CMFs versus AADT for these groupings.

A linear model was again fit, this time to the data grouped by state and AADT, but without a state indicator variable. The estimated model is shown in Table C9. The parameter estimates of the model are reasonably consistent with the first model but with much higher statistical significance for the AADT variable, with the p-value reduced from 0.2146 to 0.0762.

Figure C8. Plot of CMF vs. AADT data grouped by state and AADT. (Plot title: ROR CMF vs. AADT.)

Table C9. ROR CMFunction grouped by state and AADT.

| Parameter | Estimate | Standard Error | p-value |
|---|---|---|---|
| Intercept | 0.8741 | 0.1976 | <0.0001 |
| a (AADT/10,000) | -0.2525 | 0.1277 | 0.0762 |

Next, a state indicator variable is included in the model for the data grouped by state and AADT. Thus, an additional term is estimated for each state that is added to the intercept. In this model, shown in Table C10, the p-value of the AADT parameter is further improved. That the differences between the state intercepts are not statistically significant in this model is moot regarding the conclusion about the effect of AADT on the CMF.

To summarize the effect of AADT on the CMF estimate, this preliminary look at AADT as a variable for explaining the variance in CMFs between sites indicates that it is indeed a factor, with a lower CMF at higher AADTs. Including state as a grouping variable and in the model led to the highest statistical significance for the AADT parameter. Although the differences in the intercept terms between states were not statistically significant, those individual terms can still be used in deriving more precise point estimates of CMFs in each state than would be obtained with a constant intercept term.

4.2 Impact of Expected Crash Frequency Before Treatment

Next we look at the EB estimate of the expected crashes per mile-year before treatment, ROR/mi-yr, which is termed “crash rate” here. As it was for AADT, Issue 1 was resolved by grouping sites by ranges of crash rate that were determined in an iterative fashion. These groups are defined in Table C11.

The crash rate variable, Rateb, measured in crashes per mile-year, was calculated by summing the EB estimate of the expected annual crashes without treatment and dividing this by the sum of length for sites in each group. As seen in Figure C9, there appears to be a negative relationship between crash rate and the CMF for run-off-road crashes. With only five data points it is hard to say from visual inspection if the functional form of the model should be anything but linear, at least for the purposes of this illustration.
A linear model was fit to the five data points weighted by the inverse of the variance of the CMF using the following model form:

$$CMF = \text{intercept} + a \times Rateb \qquad \text{(Equation C32)}$$

Table C10. ROR CMFunction grouped by state and AADT including state in model.

| Parameter | Estimate | Standard Error | p-value |
|---|---|---|---|
| Intercept | 0.9556 | 0.1396 | <0.0001 |
| Kentucky | -0.1625 | 0.1368 | 0.2347 |
| Missouri | -0.0507 | 0.1276 | 0.6911 |
| Pennsylvania | 0.0000 | n/a | n/a |
| a (AADT/10,000) | -0.2661 | 0.1120 | 0.0175 |

Table C11. Definition of crash rate categories.

| Crash Rate Category | Crash Rate Range | Rateb |
|---|---|---|
| Rate1 | 0 < rateb < 0.1 | 0.072 |
| Rate2 | 0.1 ≤ rateb < 0.2 | 0.150 |
| Rate3 | 0.2 ≤ rateb < 0.3 | 0.246 |
| Rate4 | 0.3 ≤ rateb < 0.4 | 0.349 |
| Rate5 | 0.4 ≤ rateb | 0.702 |
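As a worked illustration of how the state-specific intercept terms in Table C10 would be used to obtain a point estimate, the sketch below evaluates the AADT model form CMF = intercept + state term + a × AADT/10,000 with the Table C10 estimates. The function name `predict_ror_cmf` is hypothetical. Because the case study states that its purpose is not to derive robust CMFunctions, the output should be read as an illustration of the arithmetic only, and only within the AADT range of the underlying data.

```python
# Parameter estimates copied from Table C10 (illustrative model only).
INTERCEPT = 0.9556
STATE_TERM = {"Kentucky": -0.1625, "Missouri": -0.0507, "Pennsylvania": 0.0}
A_AADT = -0.2661  # coefficient on AADT / 10,000


def predict_ror_cmf(state, aadt):
    """Point estimate of the run-off-road CMF for a state and AADT."""
    if state not in STATE_TERM:
        raise KeyError("state not represented in Table C10: " + state)
    return INTERCEPT + STATE_TERM[state] + A_AADT * aadt / 10000.0
```

For example, at an AADT of 10,000 the Pennsylvania estimate is 0.9556 - 0.2661 = 0.6895, and the Kentucky estimate is lower by the Kentucky term, 0.6895 - 0.1625 = 0.5270.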

The results in Table C12 confirm that the CMF is lower at sites with higher crashes per mile-year before treatment. The parameter estimate for crash rate is statistically significant with a p-value of 0.0324.

Figure C9. Plot of CMF vs. crash rate data grouped by Rateb. (Plot title: ROR CMF vs. Rateb.)

Table C12. ROR CMFunction grouped by Rateb.

| Parameter | Estimate | Standard Error | p-value |
|---|---|---|---|
| Intercept | 1.1491 | 0.1060 | 0.0017 |
| a (Rateb) | -0.1023 | 0.0271 | 0.0324 |

In the next step the data were grouped by both state and crash rate, resulting in 15 groups. Figure C10 plots the CMFs versus crash rate for these groupings.

A linear model was then estimated from these 15 data points, without a state indicator variable. The parameter estimates shown in Table C13 are quite different from those of the previous model without the state categorization. Although the direction of effect for crash rate is the same, the effect is now much stronger and its significance higher, with a p-value of 0.01 compared to 0.03.

Figure C10. Plot of CMF vs. crash rate data grouped by state and Rateb. (Plot title: ROR CMF vs. Rateb.)

The next model, shown in Table C14, includes an indicator variable for state and is consistent with the previous model without the state indicator variable. The intercept terms for Pennsylvania and Kentucky are statistically different from each other, while the intercept for Missouri is somewhere in the middle of the terms for the other two.

In summary, the preliminary look at crash rate as a variable for explaining the variance in CMFs between sites indicates that it is indeed a factor, with a lower CMF at higher values of crash rate. Including state as a grouping variable changed the parameter estimates significantly and improved the precision of the parameter estimates. The results suggest that using a different intercept term for each state should be recommended.

4.3 Impact of Shoulder Width

Next we look at average shoulder width. It could be expected that rumble strips are most effective at sites with a narrow shoulder. As for AADT and crash rate, Issue 1 was resolved by grouping sites by ranges of shoulder width that were determined in an iterative fashion. These groups are defined in Table C15.

The shoulder width variable, SHLDWID, was determined by calculating a weighted average for each group, weighting by segment length. As seen in Figure C11, there appears to be a negative relationship between shoulder width and the CMF for run-off-road crashes. Again, with only four data points it is hard to say from visual inspection if the functional form of the model should be anything but linear, at least for the purposes of this illustration. This negative relationship is surprising in that we might have thought rumble strips are more effective when the shoulder width is smaller.
This finding may be because wider shoulder widths are associated with roadways with higher volumes and crash rates, each of which, as seen earlier, Parameter Estimate Standard Error p-value Intercept 0.9508 0.0813 <0.0001 -0.4232 0.1659 0.0107 Table C13. ROR CMFunction grouped by state and Rateb. Parameter Estimate Standard Error p-value Intercept 1.0336 0.0917 <0.0001 Kentucky -0.2156 0.1022 0.0349 Missouri -0.1024 0.0927 0.2693 Pennsylvania 0.0000 n/a n/a -0.3472 0.1486 0.0195 Table C14. ROR CMFunction grouped by state and Rateb including state in model. Crash Rate Category Shoulder Width Range Shldwid Width1 0 ≤shldwid≤3 2.50 Width2 3<shldwid≤6 4.98 Width3 6<shldwid≤9 7.76 Width4 9<shldwid 10.50 Table C15. Definition of shoulder width categories.

C-46 Guidelines for the Development and Application of Crash Modification Factors appears to be associated with a lower CMF. It may also be that at narrow shoulder widths there is not enough room for drivers to recover even if alerted that they are leaving the travel lane. A linear model was fit to the four data points weighted by the inverse of the variance of the CMF with the following model form: CMF intercept SHLDWID Equation C33a= + × The results in Table C16 confirm that the CMF is lower at wider shoulder widths. The parameter estimate for SHLDWID shows a small effect that is not statistically significant, with a p-value of 0.2947. Next we also group the data by state, creating 12 data points instead of 4. Figure C12 plots these data. Using the data grouped by state and SHLDWID, the model shown in Table C17 is estimated. The estimated effect of shoulder width is consistent with the model without using state to categorize the data and the p-value of the SHLDWID parameter is improved to 0.1021. Including state as an indicator variable in the model makes little change to the effect of shoulder width and the intercept terms as shown in Table C18. Though not statistically different between the three states, the results suggest that the CMFs for Kentucky may lower than for the other two states for a given shoulder width. The same pattern was observed based on the models that related the CMFs to AADT and to crash rate before treatment. In summary, the preliminary look at shoulder width as a variable for explaining the variation in CMFs between sites indicates that it may be a factor, with a lower CMF at higher values of shoulder width. Including state as a grouping variable resulted in higher statistical significance for the shoulder width parameter but including state as an indicator variable in the model as well did not change the effect of shoulder width. The results did indicate that CMFs in one state could be lower than in the other two. 
CMF 0.000 0.100 0.200 0.300 0.400 0.500 0.600 0.700 0.800 0.900 1.000 0.00 2.00 4.00 6.00 8.00 10.00 12.00 ROR CMF vs. SHLDWID Figure C11. Plot of CMF vs. shoulder width grouped by SHLDWID Parameter Estimate Standard Error p-value Intercept 0.9783 0.2034 0.0406 -0.0383 0.0272 0.2947 Table C16. ROR CMFunction grouped by SHLDWID.
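The inverse-variance weighted linear fit used for Equation C33 can be sketched as follows. This is a minimal illustration rather than the estimation code behind the report: the function name `weighted_linear_fit` is hypothetical, the group CMFs and variances are made-up placeholders, and only the four SHLDWID values are taken from Table C15.

```python
def weighted_linear_fit(x, y, variances):
    """Closed-form weighted least squares for y = intercept + slope * x.

    Each observation is weighted by 1 / variance, so more precise
    CMF estimates influence the fitted line more than noisy ones.
    """
    weights = [1.0 / v for v in variances]
    sw = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / sw
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# SHLDWID group averages from Table C15.
shldwid = [2.50, 4.98, 7.76, 10.50]
# Hypothetical group CMFs and their variances, for illustration only.
cmf = [0.90, 0.80, 0.70, 0.55]
cmf_var = [0.010, 0.004, 0.020, 0.006]

intercept, slope = weighted_linear_fit(shldwid, cmf, cmf_var)
print(f"CMF = {intercept:.4f} + ({slope:.4f}) * SHLDWID")
```

With only four points, a single imprecise group can change the slope noticeably, which is one reason the weighting matters in this application.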

Figure C12. Plot of CMF vs. shoulder width grouped by state and SHLDWID. (Plot title: ROR CMF vs. SHLDWID; vertical axis: CMF.)

Table C17. ROR CMFunction grouped by state and SHLDWID.
Parameter    Estimate    Standard Error    p-value
Intercept    1.0294      0.1912            0.0003
SHLDWID      -0.0457     0.0254            0.1021

Table C18. ROR CMFunction grouped by state and SHLDWID including state in model.
Parameter       Estimate    Standard Error    p-value
Intercept       1.0370      0.1921            <0.0001
Kentucky        -0.1195     0.1965            0.5430
Missouri        -0.0115     0.1774            0.9485
Pennsylvania    0.0000      n/a               n/a
SHLDWID         -0.0405     0.0241            0.0931

4.4 Consideration of Combinations of AADT, Crash Rate, and Shoulder Width Simultaneously

Because there appears to be a relationship between each of the three identified variables and the expected CMF, the next step in developing a CMFunction is to consider them simultaneously. This poses two difficulties. First, by grouping the data by three variables, and possibly by state, the number of sites within each group becomes smaller and the estimates of the CMF for each group become less precise. In fact, if state is not used to group the data, 17 of the 80 potential categories have 0 crashes after treatment and most categories have fewer than 10 crashes after treatment. The second difficulty is that AADT is correlated with crash rate, and shoulder width is also likely correlated with AADT and crash rate. The inclusion of correlated variables in the same regression model can lead to counterintuitive results and, in any case, imprecise CMF estimates.

In keeping with the earlier models, a linear model is fit, weighted by the inverse of the variance of the CMF, with the data now grouped by all three variables but not state. The 17 categories with 0 crashes in the after period were not included in the estimation because a variance of the CMF cannot be estimated for them. The estimated model is shown in Table C19.

Table C19. ROR CMFunction grouped by AADT, crash rate, and shoulder width.
Parameter    Estimate    Standard Error    p-value
Intercept    1.0749      0.1992            <0.0001
AADT         0.0290      0.1561            0.8532
Rateb        -0.5792     0.2281            0.0143
SHLDWID      -0.0255     0.0189            0.1834

The p-values for AADT and shoulder width are high, particularly for AADT, which now shows an increasing CMF with AADT. It appears that grouping by all three variables is not productive and is likely failing at least in part because the number of sites and crashes within groups is low. The correlation between the three variables may also contribute to this difficulty.

Next we try grouping only by AADT and crash rate. This results in only 1 group with 0 after-period crashes and larger sample sizes within groups. The model shown in Table C20 indicates that the CMF decreases with both AADT and crash rate, although the p-value for AADT is high at 0.45.

Table C20. ROR CMFunction grouped by AADT and crash rate.
Parameter    Estimate    Standard Error    p-value
Intercept    1.0366      0.1701            <0.0001
AADT         -0.1266     0.1647            0.4540
Rateb        -0.5371     0.2959            0.0895

If the data are also grouped by state, the model shown in Table C21 is estimated. The p-value for the AADT variable improves substantially but the p-value for the crash rate variable increases. The latter change is a slight one, so on balance this model can be considered better than the previous one.

Table C21. ROR CMFunction grouped by state, AADT and crash rate.
Parameter    Estimate    Standard Error    p-value
Intercept    0.9394      0.1403            <0.0001
AADT         -0.1736     0.1301            0.1820
Rateb        -0.3493     0.2347            0.1368

The next model, shown in Table C22, includes a state indicator variable and shows little difference in the parameter estimates. A model including an interaction term for AADT × Rateb was unsuccessful in that the directions of effect for AADT and Rateb flipped to positive and were estimated with increased p-values.

Table C22. ROR CMFunction grouped by state, AADT and crash rate including state in the model.
Parameter       Estimate    Standard Error    p-value
Intercept       0.9787      0.1739            <0.0001
Kentucky        -0.1753     0.1622            0.2798
Missouri        -0.0281     0.1465            0.8482
Pennsylvania    0.0000      n/a               n/a
AADT            -0.1760     0.1299            0.1755
Rateb           -0.2908     0.2358            0.2174

Next we try grouping only by AADT and shoulder width. The model, shown in Table C23, indicates that the CMF decreases with both AADT and shoulder width, although the p-values for both variables are high at 0.45 and 0.20, respectively.

Table C23. ROR CMFunction grouped by AADT and shoulder width.
Parameter    Estimate    Standard Error    p-value
Intercept    0.9062      0.1701            <0.0001
AADT         -0.1053     0.1400            0.4520
SHLDWID      -0.0252     0.0199            0.2042

Grouping the data by state in addition to AADT and shoulder width did not improve the model fit. In the next model, shown in Table C24, an interaction term is included between AADT and shoulder width. This model indicates that while the CMF decreases with increasing AADT and shoulder width, this decreasing trend can be lessened and even reversed at combinations of high AADT and shoulder width.

Table C24. ROR CMFunction grouped by AADT and shoulder width including interaction term.
Parameter       Estimate    Standard Error    p-value
Intercept       1.5757      0.2617            <0.0001
AADT            -1.2386     0.3967            0.0018
SHLDWID         -0.1158     0.0342            0.0007
SHLDWID*AADT    0.1503      0.0506            0.0030

Based on the results of the modeling and the perceived usefulness of each model to practitioners, the preferred model is the one in which the data were categorized by state, AADT, and EB expected crash rate without treatment, and in which AADT and crash rate were included in the model (Table C21). The selection of a "best" model is perhaps subjective, and given the high p-values for the parameter estimates this distinction is perhaps not warranted. However, for the purposes of this illustration we will proceed as such.

4.5 Addressing Other Issues

Two of the other issues identified at the outset were whether weighted vs. unweighted models and whether fixed or random effects models should be considered. Thus far all results have used weighted (by the inverse of the variance of the CMF), fixed effects models.

First, we run the final model again but this time without weighting the observations. Any group with 0 after-period crashes was not included, to make the results directly comparable with the weighted model. Table C25 compares the results for the weighted and unweighted models. The results exhibit fairly large differences in magnitude, although the directions of effect for the AADT and crash rate variables are the same. The standard errors of the estimated parameters are higher for the unweighted model, and the p-value for AADT is considerably higher. Given these results, and the logic of giving more weight to more precise estimates of the CMF, it can be concluded that weighted regression is preferable in this application.

Table C25. Comparison of weighted and unweighted models.
             Weighted Model                            Unweighted Model
Parameter    Estimate    Standard Error    p-value     Estimate    Standard Error    p-value
Intercept    0.9394      0.1403            <0.0001     1.1819      0.2053            <0.0001
AADT         -0.1736     0.1301            0.1820      -0.0909     0.1789            0.6114
Rateb        -0.3493     0.2347            0.1368      -0.6954     0.3677            0.0586

Second, we run the final model again but this time modeling the state in which a site is located as a random effect. This approach assumes that there is some fundamental difference in the application circumstances between the states that is not represented in the model. The results in Table C26 show that there is a minimal difference in the parameter estimates between the two models. The standard errors of the parameter estimates also differ only slightly between the two models.

Table C26. Comparison of fixed and random effects models.
             Fixed Effects Model                       Random Effects Model (by State)
Parameter    Estimate    Standard Error    p-value     Estimate    Standard Error    p-value
Intercept    0.9394      0.1403            <0.0001     0.8939      0.1417            0.0242
AADT         -0.1736     0.1301            0.1820      -0.1633     0.1275            0.2084
Rateb        -0.3493     0.2347            0.1368      -0.3027     0.2258            0.1884

5 Discussion of Findings

This case study sought to explain, through meta-regression, the variation in CMF findings for run-off-road crashes between sites where both shoulder and centerline rumble strips were applied. The data consisted of site-level results from an empirical Bayes before-after analysis using data from three states. Three variables were available for all states and thought to potentially impact the expected value of the CMF at a site: AADT, EB expected crash rate without treatment, and shoulder width. Because the road segments were short in length, it was necessary to aggregate the sites into similar groups for analysis.

Table C27 illustrates the difficulty of defining groups that are disaggregate enough to provide insights into the variables influencing the CMF value while at the same time including enough sites to provide meaningful results. When groups are defined by AADT, crash rate, and shoulder width alone, with the ranges previously defined, there are many groups with fewer than 10 observed crashes and several with 0. Groups with 0 crashes in the after period cannot be used, since a CMF value of 0 does not yield the standard error that is needed for weighted regression. Moreover, a CMF of 0 is not practically possible. As a result of this disaggregation, the CMFs vary widely between groups and the estimates are imprecise. Making sense of such data is difficult, to say the least.

For all models a simple linear model form was chosen. With relatively few data points it was not practical to investigate more complicated model forms. Interaction terms in the multi-variable models were attempted but offered little model improvement or insight into such interactions. Again, this is not surprising given the limited number of observations.

The use of state as a categorical variable in the models improved the statistical significance of the other variables in some models, although these state terms were not always statistically significant themselves.

The comparison of weighted versus non-weighted models indicated that weighted (by the inverse of the variance of the CMF) models provided a better fit and more precise parameter estimates.

A comparison of fixed and random effects models indicated that the parameter estimates were similar and that the standard errors of the parameter estimates differed only slightly between the two.
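The reversal described for the interaction model in Table C24 can be made concrete by evaluating its marginal effects. The sketch below uses the Table C24 estimates as printed; the function names are hypothetical, and AADT is assumed to be expressed in the same scaled units used when the model was fit, which this section does not state.

```python
# Parameter estimates as printed in Table C24.
INTERCEPT = 1.5757
B_AADT = -1.2386
B_SHLDWID = -0.1158
B_INTERACTION = 0.1503


def cmf_interaction(aadt, shldwid):
    """Linear CMFunction with an AADT x SHLDWID interaction term."""
    return (INTERCEPT + B_AADT * aadt + B_SHLDWID * shldwid
            + B_INTERACTION * aadt * shldwid)


def slope_wrt_aadt(shldwid):
    """Change in CMF per unit of (scaled) AADT at a given shoulder width."""
    return B_AADT + B_INTERACTION * shldwid


def slope_wrt_shldwid(aadt):
    """Change in CMF per unit of shoulder width at a given (scaled) AADT."""
    return B_SHLDWID + B_INTERACTION * aadt


# Values at which each marginal effect changes sign.
shldwid_turning_point = -B_AADT / B_INTERACTION
aadt_turning_point = -B_SHLDWID / B_INTERACTION
print(f"AADT slope changes sign at SHLDWID = {shldwid_turning_point:.2f}")
print(f"SHLDWID slope changes sign at scaled AADT = {aadt_turning_point:.2f}")
```

With these estimates the AADT slope is negative for shoulder widths below roughly 8.2 and positive above that value, which is the lessening and eventual reversal that the text describes.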

Table C27. Illustration of data defined by three categories.
Volcat  Rateb  SHLDWID  No. Sites  Observed Crashes  Expected Crashes  CMF   Std Error
1       1      1        16         0                 0.91              0.00
1       1      2        23         3                 3.10              0.90  0.57
1       1      3        29         0                 1.94              0.00
1       1      4        27         12                13.50             0.79  0.36
1       2      1        95         11                10.70             1.01  0.33
1       2      2        140        18                17.85             1.00  0.25
1       2      3        61         1                 4.94              0.20  0.20
1       2      4        113        9                 12.32             0.72  0.25
1       3      1        68         2                 6.51              0.30  0.22
1       3      2        108        12                12.10             0.98  0.30
1       3      3        59         1                 3.03              0.32  0.33
1       3      4        66         0                 4.33              0.00
1       4      1        39         6                 5.81              1.01  0.44
1       4      2        87         12                15.77             0.75  0.24
1       4      3        15         2                 3.27              0.59  0.43
1       4      4        10         5                 16.45             0.28  0.15
1       5      1        43         53                47.18             1.09  0.24
1       5      2        123        76                76.57             0.99  0.13
1       5      3        11         2                 6.77              0.29  0.21
1       5      4        10         24                78.33             0.30  0.08
2       1      1        1          4                 0.39              5.24  4.15
2       1      3        3          1                 0.38              2.00  2.19
2       1      4        7          3                 1.37              1.90  1.31
2       2      1        11         6                 2.78              1.93  1.02
2       2      2        28         4                 5.83              0.66  0.35
2       2      3        15         0                 2.47              0.00
2       2      4        47         18                16.56             1.07  0.28
2       3      1        10         1                 1.92              0.45  0.48
2       3      2        31         5                 6.24              0.77  0.38
2       3      3        30         6                 4.44              1.29  0.60
2       3      4        118        48                44.28             1.05  0.23
2       4      1        17         6                 8.25              0.70  0.32
2       4      2        24         5                 6.93              0.70  0.34
2       4      3        19         4                 0.91              4.06  2.32
2       4      4        51         9                 14.84             0.60  0.21
2       5      1        40         6                 16.57             0.36  0.15
2       5      2        83         54                75.15             0.72  0.11
2       5      3        18         3                 10.04             0.29  0.18
2       5      4        29         22                33.26             0.63  0.19
3       1      4        1          0                 0.14              0.00
3       2      1        2          5                 1.54              2.43  1.68
3       2      2        3          0                 0.57              0.00
3       2      3        2          0                 0.17              0.00
3       2      4        10         2                 3.73              0.51  0.38
3       3      1        3          1                 1.56              0.49  0.54
3       3      2        12         4                 2.27              1.62  0.94
3       3      3        4          0                 0.29              0.00
3       3      4        50         6                 6.72              0.86  0.39
3       4      1        2          2                 0.98              1.40  1.24
3       4      2        10         0                 1.59              0.00
3       4      3        1          0                 0.33              0.00
3       4      4        33         14                24.04             0.54  0.21
3       5      1        6          1                 4.22              0.22  0.23
3       5      2        51         1                 6.14              0.16  0.16
3       5      3        3          0                 1.77              0.00
3       5      4        19         14                14.00             0.98  0.29
4       2      1        1          0                 0.37              0.00
4       2      3        3          1                 0.64              1.18  1.30
4       2      4        6          4                 2.47              1.08  0.84
4       3      2        3          0                 0.39              0.00
4       3      3        7          1                 0.89              0.97  1.03
4       3      4        21         6                 4.34              1.32  0.61
4       4      1        2          0                 0.45              0.00
4       4      2        4          0                 0.77              0.00
4       4      3        9          0                 2.14              0.00
4       4      4        30         2                 1.96              0.97  0.72
4       5      1        12         1                 2.33              0.39  0.41
4       5      2        13         0                 3.62              0.00
4       5      3        16         5                 13.59             0.34  0.18
4       5      4        23         5                 14.08             0.35  0.16

Case Study 2
Conversion of Conventional Intersections to Roundabouts

Preamble

This is one of a series of four case studies to demonstrate the proposed guidelines for developing CMFunctions from either cross-sectional data or before-after data from actual safety treatment applications. The purposes of this specific case study are to:

• Illustrate a heuristic methodology a future researcher may follow to derive a CMFunction from before-after application circumstance data on individual treatment sites using regression analysis.
• Illuminate some of the considerable issues and challenges that may be encountered in the process and, in so doing, address what it may take for future researchers to resolve them. The following list of potential issues identifies, in bold text, the ones that are pertinent to this case study:
  – Selection of candidate influencing variables (Section 1.4)
  – Accounting for interactions among candidate variables (Section 1.8)
  – Different CMFunctions for different subgroups of data (e.g., grouped by jurisdiction)
  – Bias due to aggregation, averaging or incompleteness in data
  – Tools for exploring the appropriate functional form of the model (Section 1.2)
  – Combining results from multiple sites (or groups) from one study with single CMF estimates for entire studies
  – Improving site- and study-level estimates of CMFs and variance to reduce variance of CMFunction estimates
  – Appropriate modeling methodology; e.g., GLM, full Bayes MCMC
  – Whether fixed or random effects models are appropriate (Section 2.4)
  – Guidelines for creating subgroups of sites using important variables (Section 2.6)
  – Assessing model fit (Section 1.4)
  – Selection of the most robust CMFunctions from among several candidates (Sections 1.4 and 2.3)
  – Estimating precision of CMFs from CMFunctions derived
• Illustrate the considerable data requirements for estimating a robust CMFunction so that future research planning will endeavor to assemble appropriate datasets.

Given this scope, the purpose is not to derive robust CMFunctions, although that would have been a bonus objective worth achieving. In any case, as highlighted in the demonstration, ideally suited retrospective datasets for this purpose are hard to come by. Indeed, the best, and perhaps only defensible, way forward in deriving CMFunctions for some treatments may well be the application of quasi randomized trials, or propinquity designs that approach such trials, for which the case is made in Appendixes F and G.

1 Introduction

The treatment in this case study is the conversion of conventional signalized intersections to roundabouts. The dataset used originated from several research efforts that each conducted an empirical Bayes before-after study and estimated a CMF for each converted site and for groups of sites. The case study analysis focuses on total crashes.

In developing a CMFunction, only those variables that are shared across the data and that are suspected to influence the value of the CMF are considered. These are:

• Entering AADT
• The empirical Bayes (EB) expected crashes per year before treatment

• Area type (urban versus rural)
• Number of circulating lanes
• Number of entering legs

A key difficulty typically encountered in developing CMFunctions from before-after evaluations is that the variables available for consideration are likely correlated. The illustrative exploration of the CMFunction development applied the meta-regression approach.

2 Data

The data came from the three sources listed below. Some of the sites were used in both Source 1 and Source 2 and were included only once in this analysis.

• Source 1. Rodegerdts, L., et al. NCHRP Report 572: Roundabouts in the United States. Transportation Research Board of the National Academies, Washington, D.C., 2007.
• Source 2. Srinivasan, R., et al. NCHRP Report 705: Evaluation of Safety Strategies at Signalized Intersections. Transportation Research Board of the National Academies, Washington, D.C., 2011.
• Source 3. Bagdade, J., et al. Evaluating the Performance and Safety Effectiveness of Roundabouts. Michigan Department of Transportation Report RC-1566, 2011.

In the analyses for these studies, to account for potential selection bias and regression-to-the-mean, an EB before-after analysis was conducted, utilizing reference groups of untreated intersections with characteristics similar to those of the treated sites. The crash types considered were all those identified as related to the intersection.

The data available included the EB expected number of after-period crashes in the absence of treatment, its variance, and the observed after-period crashes for each location. With this information a CMF may be calculated for any individual site or group of sites. Table C28 provides summary statistics for the data. Note that for some sites the number of circulating lanes and/or number of approaches was not reported.

Table C28. Summary of data.
Variable                                      No. Sites    MIN        MAX         MEAN        STDev
Entering AADT                                 39           2000.00    43123.00    20933.79    8976.75
No. Circulating Lanes                         32           1.00       3.00        1.69        0.64
No. Approaches                                41           3.00       5.00        3.78        0.47
EB expected crash rate prior to conversion    41           1.14       42.17       9.25        10.05
CMF                                           41           0.00       2.43        1.01        0.73
Setting, n: Urban, 20; Suburban, 15; Rural, 4

3 Identified Issues

There were several issues identified at the outset of the CMFunction development analysis that needed to be considered.

Issue 1 Whether to Consider Each Site Separately or Aggregate Sites into Groups Based on the Variables of Interest

When conducting a meta-regression, an analyst can calculate the CMF for each individual site and use it as a single observation. However, this can be problematic when individual sites' CMFs are based on very few crashes. By considering each site separately, the individual CMF estimates may show a large variation, with many CMF estimates near 0 and a large value of the CMF for any site that happened to have a crash in the after period. For sites with no after-period crashes, the CMF would be equal to 0, and it is not possible to estimate the standard error of the CMF because one of the terms in the variance equation would involve a division by 0.

Figure C13 provides a histogram of CMF values. There is a good range of data, although there are 10 sites with an estimated CMF of 0. If it is desired to conduct regression analysis with each site's CMF weighted by the inverse of the variance of the CMF, these sites could not be used on their own since the variance cannot be determined. This omission would bias the estimated CMFunction.

Figure C13. Frequency of CMFs for site-level data. (Plot title: Histogram of CMF Values.)

If the meta-regression to develop a CMFunction is unsuccessful at the site level, then this difficulty can be resolved by aggregating sites into groups. The question then is how to do this aggregation such that the estimated CMFs for each group will be sufficiently precise and useful, and such that logical insights can be obtained into how the CMF can be expected to vary between sites. It is logical to define groups based on similar values of those variables that may influence the expected value of the CMF.

Further complicating the issue is how to represent the variables used to define groups in the CMFunction. For categorical variables such as number of circulating lanes this is not an issue, as all sites within a group have the same value. For continuous variables, such as entering AADT, the resolution of this issue is less clear. One option is to consider the grouped data as a categorical variable. For example, if entering AADT were divided into 5 groups by range of entering AADT, then the entering AADT variable would be categorical with 5 values. The drawback is that the estimated CMF will jump suddenly between levels of the categorical variable, which is not logical for what is truly a continuous variable. Another option is to determine the weighted average of the continuous variable within each group. For example, the average entering AADT can be calculated for each group and that average value used in the model.

Issue 2 Whether to Use Weighted or Non-Weighted Regression

In conducting the meta-regression, either a weighted or non-weighted regression may be applied. If weighted regression is pursued, the weight for each CMF should be equal to the inverse of the variance of the CMF. This weighting gives more influence in the model to CMF estimates that are based on more data and are thus more precise. On the other hand, information may be lost if the less precise CMFs are at the extreme ends of the values for the explanatory variables. For example, there may be few sites with a high entering AADT; these estimates will therefore be less precise and would receive less weight in the model, so that important information about the impact of high entering AADTs on the CMF may be lost.
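The division-by-zero problem noted under Issue 1, and the way aggregation resolves it, can be sketched as follows. The section does not print the variance equation, so this sketch assumes the commonly used EB before-after expressions in which the observed after-period count is treated as Poisson; the function names and the example sites are hypothetical.

```python
import math


def eb_cmf(observed_after, expected_after, var_expected):
    """Return (cmf, std_error) for a site or group of sites.

    observed_after : observed crashes in the after period
    expected_after : EB expected after-period crashes without treatment
    var_expected   : variance of expected_after

    std_error is None when observed_after is 0, because the term
    1 / observed_after in the variance expression is undefined.
    """
    bias_correction = 1.0 + var_expected / expected_after ** 2
    cmf = (observed_after / expected_after) / bias_correction
    if observed_after == 0:
        return cmf, None
    variance = (cmf ** 2
                * (1.0 / observed_after + var_expected / expected_after ** 2)
                / bias_correction ** 2)
    return cmf, math.sqrt(variance)


def pooled_eb_cmf(sites):
    """Combine sites into one group by summing inputs before computing the CMF.

    `sites` is an iterable of (observed_after, expected_after, var_expected).
    Summing variances assumes the site-level expectations are independent.
    """
    sites = list(sites)
    observed = sum(s[0] for s in sites)
    expected = sum(s[1] for s in sites)
    variance = sum(s[2] for s in sites)
    return eb_cmf(observed, expected, variance)


# Hypothetical sites: two have no after-period crashes on their own.
example_sites = [(0, 1.8, 0.3), (0, 2.4, 0.5), (3, 4.1, 0.9)]
print(eb_cmf(*example_sites[0]))     # CMF of 0 with no usable standard error
print(pooled_eb_cmf(example_sites))  # grouped estimate has a finite standard error
```

Because the pooled group has a nonzero observed count, it can receive an inverse-variance weight even though two of its member sites could not be used individually.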

4 Meta-Regression CMFunction Exploration

As noted earlier, this is an illustration of a heuristic approach. The first step taken was to look at the impact of entering AADT on the CMF estimate, followed by exploration of the impacts of expected number of crashes and other variables. Priority was given to entering AADT, since traffic exposure is the most significant factor affecting the expected crash frequency. A secondary priority was given to expected crash rate, with the understanding that crashes cannot be reduced when there is no crash problem and the belief that a CMFunction for converting intersections to roundabouts that accounts for the expected crash rate of the intersection before conversion would be welcomed by practitioners. Other variables were examined after these two main factors were considered.

First, models were attempted for each type of previous traffic control, using each site as an observation. Following that, sites were grouped by the variables of interest and a second set of models was fit to these data.

4.1 Analysis Using Site-Level CMF Estimates

The following two graphs are scatterplots of the estimate of the CMF versus entering AADT before conversion and versus the EB estimate of expected crashes per year before treatment.

Figure C14. Scatterplot for signal conversions CMF vs. entering AADT. (Vertical axis: CMF.)

As seen in Figure C14 and Figure C15, there appears to be a positive relationship between entering AADT and the CMF. Although the data are a bit noisy, it also appears that this relationship could be approximated by either a linear or power model, so those forms were considered for the purposes of this illustration. However, it should be pointed out that exploring other forms, such as a parabola, would be justified and is recommended for future CMFunction development. Guidance on model form selection is provided in Sections 1.2 and 2.3.2.

Model 1. For the first model a weighted linear regression with a normal error distribution is fit to the data, including entering AADT as the only explanatory variable. The weight given to each observation is the inverse of the variance of the CMF estimate. Entering AADT has been scaled by dividing by 10,000 so that the parameter estimates are not misleadingly small.

$$\mathrm{CMF} = \mathrm{intercept} + \beta \times \frac{\mathrm{AADT}}{10{,}000} \qquad \text{(Equation C34)}$$

The model estimates are shown in Table C29. The parameter estimate for the AADT variable is statistically significant. The model indicates that the CMF is expected to increase as entering AADT increases. A CMF value of 1.0 is reached at an AADT of approximately 27,600. The model would also indicate a negative CMF for entering AADT values below 4,766, which is an impossibility. This is made possible by the model form selected. The regression data in fact include only one site with an entering AADT under this value. In an application the model would not be recommended for use for that site or any other with an entering AADT under 4,766. This may not be an issue, as sites with low AADTs are not typical candidates for roundabout conversion.

Table C29. Signalized conversions Model 1 weighted estimates.
Parameter    Estimate    Standard Error    p-value
Intercept    -0.3296     0.2686            0.2279
β            0.4817      0.1228            0.0004

The model in Table C30 uses unweighted regression. The higher standard errors of the parameter estimates (relative to the estimates) indicate that weighting the regression data is desirable.

Table C30. Signalized conversions Model 1 unweighted estimates.
Parameter    Estimate    Standard Error    p-value
Intercept    0.2468      0.2804            0.3845
β            0.3685      0.1234            0.0050

Figure C15. Scatterplot for signal conversions CMF vs. expected crashes per year before. (Vertical axis: CMF.)

Model 2. The second model considers the EB estimate of expected crashes per year before treatment. Again, a weighted linear regression model is fit:

$$\mathrm{CMF} = \mathrm{intercept} + \beta \times \mathrm{EBrate} \qquad \text{(Equation C35)}$$

where
EBrate = the expected crash frequency per year prior to treatment
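The AADT thresholds discussed for Model 1 follow from inverting Equation C34. The sketch below does that arithmetic with the weighted estimates as printed in Table C29; the function names are hypothetical.

```python
# Weighted Model 1 estimates as printed in Table C29.
INTERCEPT = -0.3296
BETA = 0.4817
AADT_SCALE = 10_000


def cmf_model1(entering_aadt):
    """Equation C34: CMF = intercept + beta * (AADT / 10,000)."""
    return INTERCEPT + BETA * entering_aadt / AADT_SCALE


def aadt_at_cmf(target_cmf):
    """Invert Equation C34 to find the entering AADT giving a target CMF."""
    return (target_cmf - INTERCEPT) / BETA * AADT_SCALE


break_even_aadt = aadt_at_cmf(1.0)  # above this the model predicts CMF > 1
zero_cmf_aadt = aadt_at_cmf(0.0)    # below this the model predicts CMF < 0
print(round(break_even_aadt), round(zero_cmf_aadt))
```

With the printed coefficients, the CMF = 1.0 crossing evaluates to about 27,600, matching the text. The zero crossing evaluates to about 6,800 rather than the quoted 4,766. The section does not indicate the reason for the difference, so the lower bound of applicability should be checked against the fitted model actually used.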

Guidelines for Developing Crash Modification Functions C-57   In this model the CMF increases as EBrate increases although the parameter estimate is insig- nificant. The indication of an increasing CMF with increasing EBrate is perhaps not surprising given that the expected crash frequency is correlated positively with traffic volumes. If the effects of traffic volume on the expected CMF were accounted for, it may be expected that for sites with the same traffic volume, roundabouts may be more effective at the site with a higher crash frequency. Model 3. The third model continues with the weighted linear regression model but now including both the entering AADT and crash rate terms. AADTCMF intercept 10,000 EBrate Equation C36a1 a2= + × + × With both variables included in the model, it appears that the CMF increases in value as AADT increases, as before, but with a greater slope. However, it decreases in value as the expected crash rate before conversion increases, although this effect is highly insignificant, as was the case for the positive crash rate parameter in Model 2. Model 4. From the scatterplots in Figure C14 and Figure C15 it was difficult to tell if a model form other than linear is appropriate. However, now the same model is estimated but with a power form for the entering AADT and expected crash rate variables shown in Equa- tion C38. Other possibilities could include a power model for AADT and linear model for crash rate or vice versa. The error distribution is now assumed to be lognormal because the equation is only linearized after taking the natural logarithm of both sides. Note that any logical model forms that may fit the data in Figure C14 and Figure C15 could be tested and the results com- pared for goodness-of-fit. An advantage of using the model form in Equation C38 is that nega- tive predictions are not possible. However, the disadvantage is that it forces the CMF towards zero as AADT approaches zero, which may not reflect reality. 
This disadvantage may not be an issue since intersections are typically not converted to roundabouts where traffic volume is very low considering that operational and safety benefits are negligible in such cases. The regression weight applied is equal to: w CMF s i i i =    Equation C37 2 Parameter Estimate Standard Error p-value Intercept 0.4733 0.1733 0.0097 β 0.0146 0.0106 0.1781 Table C31. Signalized conversions Model 2 weighted estimates. Parameter Estimate Standard Error p-value Intercept -0.3494 0.2611 0.1809 1 0.5389 0.1428 0.0002 2 -0.0076 0.0106 0.4711 Table C32. Signalized conversions Model 3 weighted estimates.

C-58 Guidelines for the Development and Application of Crash Modification Factors where wi = weight of CMF observation i CMFi = value of CMF observation i si = standard error of CMF observation i exp AADT EBrateintercept=    β β Equation C38CMF 10,000 1 2 The parameter estimates in Table C33 show the same relationships with the expected CMF as for Model 3 and the precision of the estimated parameters is much improved. To compare Models 3 and 4, scatterplots were prepared showing the actual CMF value and the CMF predictions versus the two predictive variables. These are shown in Figure C16 and Figure C17. The performance of the two models appears similar. For some observed CMF values Model 3 predicts a closer value while for others Model 4 appears to do better. The chi-square homogeneity statistics were calculated for each model. For Model 3, which has a linear form, this is calculated as: w CMF CMFi i i Equation C395 2∑ ( )−  For Model 4, which has the exponential, form this is calculated as: w CMF CMFi i i Equation C40ln ln 2∑ ( )( )−  The degrees of freedom equal 35 (= 38 − 3). The chi-square for Model 3 is equal to 291.55 and is 209.87 for Model 4. The p-value for both is less than 0.05 indicating that there is still variation between CMF estimates that is not being explained by the models. However, Model 4 is performing better than Model 3 as shown by the lower value of the chi-square statistic. Given the results of this comparison it was decided to continue with the exponential model form from Model 4 for CMFunction development. Model 5. It was also of interest to see if additional variables may explain some of the varia- tion in the CMF value between sites. These additional variables are setting (urban, suburban, rural), number of approaches and number of circulating lanes. 
The power model form for entering AADT and crash rate was retained, and the categorical variables were entered using an exponential model form:

$$\text{CMF} = \exp(\text{intercept}) \left(\frac{\text{AADT}}{10{,}000}\right)^{\beta_1} \text{EBrate}^{\beta_2} \exp\left(\beta_3 \times \text{Rural} + \beta_4 \times \text{Suburban} + \beta_5 \times \text{Appr} + \beta_6 \times \text{Lanes}\right) \qquad \text{(Equation C41)}$$

where
Appr = 1 if 3 approaches; 0 if 4 or more approaches
Lanes = 1 if one circulating lane; 0 if more than 1 circulating lane

Table C33. Signalized conversions Model 4 weighted estimates.

| Parameter | Estimate | Standard Error | p-value |
|---|---|---|---|
| Intercept | -0.4084 | 0.1260 | 0.0025 |
| β1 | 1.1304 | 0.1441 | <0.0001 |
| β2 | -0.1488 | 0.0595 | 0.0169 |
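The regression weight (Equation C37) and the chi-square homogeneity statistics used to compare Models 3 and 4 (Equations C39 and C40) can be sketched in Python as follows. This is a minimal sketch, assuming Equation C37 is the squared ratio of each CMF to its standard error. The function names and the three-observation data set are hypothetical and are not the case-study data.

```python
import math

def regression_weight(cmf, se):
    """Weight of one CMF observation, assuming Equation C37 is (CMF_i / s_i)^2."""
    return (cmf / se) ** 2

def chi_square_linear(observed, predicted, weights):
    """Equation C39: weighted squared residuals on the CMF scale."""
    return sum(w * (o - p) ** 2 for o, p, w in zip(observed, predicted, weights))

def chi_square_log(observed, predicted, weights):
    """Equation C40: weighted squared residuals on the ln(CMF) scale."""
    return sum(w * (math.log(o) - math.log(p)) ** 2
               for o, p, w in zip(observed, predicted, weights))

# Hypothetical observations (NOT the case-study data).
observed = [0.60, 0.85, 1.20]    # site-level CMF estimates
std_err = [0.15, 0.20, 0.40]     # their standard errors
predicted = [0.70, 0.80, 1.00]   # CMFs predicted by some fitted CMFunction

weights = [regression_weight(c, s) for c, s in zip(observed, std_err)]
n_parameters = 3                 # e.g., intercept plus two coefficients
dof = len(observed) - n_parameters

print("chi-square (linear form):", chi_square_linear(observed, predicted, weights))
print("chi-square (log form):   ", chi_square_log(observed, predicted, weights))
print("degrees of freedom:      ", dof)
```

With real data, the statistic would be compared with a chi-square distribution on the stated degrees of freedom (35 in the case study) to judge whether unexplained variation remains.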

Figure C16. Scatterplot for signal conversions: CMF and predicted CMFs vs. entering AADT. (Series plotted: CMF, Model 3, Model 4.)

Figure C17. Scatterplot for signal conversions: CMF and predicted CMFs vs. expected crash rate. (Series plotted: CMF, Model 3, Model 4.)

Table C34. Signalized conversions Model 5 weighted estimates.

| Parameter | Estimate | Standard Error | p-value |
|---|---|---|---|
| Intercept | 0.4853 | 0.3271 | 0.1483 |
| β1 | 1.3753 | 0.2226 | <0.0001 |
| β2 | -0.4680 | 0.0822 | <0.0001 |
| β3 | -0.1970 | 0.2291 | 0.3968 |
| β4 | -0.7883 | 0.1486 | <0.0001 |
| β5 | -0.2995 | 0.1890 | 0.1235 |
| β6 | -0.2073 | 0.2138 | 0.3401 |
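To make the Model 5 CMFunction concrete, the sketch below evaluates Equation C41 with the Table C34 estimates. It assumes the coefficients map to the variables in the order they appear in the equation (β3 Rural, β4 Suburban, β5 Appr, β6 Lanes). The function name, argument names and the example site are hypothetical.

```python
import math

# Table C34 estimates; the mapping of beta3..beta6 to variables is an assumption
# based on the order of terms in Equation C41.
MODEL5 = {
    "intercept": 0.4853,
    "aadt": 1.3753,       # beta1, exponent on AADT / 10,000
    "ebrate": -0.4680,    # beta2, exponent on EB expected crashes per year
    "rural": -0.1970,     # beta3
    "suburban": -0.7883,  # beta4
    "appr": -0.2995,      # beta5, 1 if three approaches
    "lanes": -0.2073,     # beta6, 1 if one circulating lane
}

def cmf_model5(aadt, ebrate, setting="urban", three_approaches=False,
               one_circulating_lane=False, p=MODEL5):
    """Evaluate Equation C41 for one site (urban is the baseline setting)."""
    categorical = (p["rural"] * (setting == "rural")
                   + p["suburban"] * (setting == "suburban")
                   + p["appr"] * three_approaches
                   + p["lanes"] * one_circulating_lane)
    return (math.exp(p["intercept"])
            * (aadt / 10_000) ** p["aadt"]
            * ebrate ** p["ebrate"]
            * math.exp(categorical))

# Hypothetical site: 20,000 entering AADT, 5 expected crashes per year.
for setting in ("urban", "rural", "suburban"):
    print(setting, round(cmf_model5(20_000, 5.0, setting), 3))
```

Because several coefficients in Table C34 are not statistically significant, predictions from the full model should be read as illustrative only.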

The model indicates that the CMF is smaller for rural locations and even smaller at suburban locations compared to urban locations, but only the difference for suburban was statistically significant at a level below 10%. It also indicates that the CMF is smaller for roundabouts with three approaches and for roundabouts with only one circulating lane. However, these parameter estimates are not statistically significant, even at the 10% level. At this point the analyst needs to decide what level of statistical significance is acceptable, and any variables not meeting this criterion should be dropped from the model.

4.2 Analysis with Data Aggregated into Groups

In this analysis, instead of using each converted site on its own, data for similar sites are combined before a CMF and its variance are calculated for each group. By grouping data, the individual CMF values being modeled should be more stable and have a smaller variance. Groups for entering AADT and the EB estimate of crashes per year in the before period are created for ranges of these variables. Then for each category the average value is determined and used as a predictive variable.

Sites were grouped by the range of entering AADT and expected crash rate in an iterative fashion such that each group had a reasonable number of sites and observed after-period crashes. The resulting groups for entering AADT and EB expected crash rate before treatment are shown in Table C35. If the data are grouped only by entering AADT then there will be only 4 data points. If grouped only by crash rate there will be 3 data points. If grouped by both entering AADT and crash rate there will be 12 (= 4 × 3) data points.
Note that it is not practical to create additional groups using setting, number of approaches and number of circulating lanes in addition to entering AADT and EB crash rate, since the number of sites within each group would be very small, given that the 39 sites are already in 12 groups.

Table C35. Site grouping for aggregate analysis.

| Criteria | Category | Mean value | Standard deviation |
|---|---|---|---|
| 0 < Entering AADT < 15,000 | Volcat1 | 10,509 | 4,215 |
| 15,000 ≤ Entering AADT < 20,000 | Volcat2 | 17,974 | 1,396 |
| 20,000 ≤ Entering AADT < 25,000 | Volcat3 | 21,554 | 1,030 |
| 25,000 ≤ Entering AADT | Volcat4 | 32,623 | 5,422 |
| 0 < EBrate < 3 | Ratecat1 | 1.93 | 0.68 |
| 3 ≤ EBrate < 5 | Ratecat2 | 4.12 | 0.54 |
| 5 ≤ EBrate | Ratecat3 | 13.85 | 10.93 |

Model 1. The first model considers only the entering AADT variable, and only entering AADT was used when creating groups. As shown in Figure C18, by combining sites into groups of similar entering AADT much of the variation between sites has disappeared. For the purposes of this illustration, weighted regression was done, but unweighted regression could also be considered given that the number of data points in a group may be too small for a reliable estimate of the weights. In the site-level modeling it was found that a power model form was preferred, and it still appears appropriate from Figure C18. The form of this model is:

$$\text{CMF} = \exp(\text{intercept}) \left(\frac{\text{AADT}}{10{,}000}\right)^{\beta_1} \qquad \text{(Equation C42)}$$

where AADT used in the model estimation in this case is the mean AADT for each group.

The parameter estimates for this model are shown in Table C36. The results for the AADT variable are consistent with site-level Model 5.

Table C36. Model 1 weighted estimates (grouped data).

| Parameter | Estimate | Standard Error | p-value |
|---|---|---|---|
| Intercept | -1.0554 | 0.1201 | 0.0009 |
| β | 1.3216 | 0.1177 | 0.0004 |

Figure C18. Scatterplot for grouped signal conversions CMF vs. average entering AADT for each volcat.

Model 2. Now we consider the EB estimate of crashes per year prior to conversion in addition to volume. The categories are defined by both the entering AADT and crash rate. Figure C19, which plots the CMFs versus EB crash rate for each volume category, suggests that these variables, though correlated, could be used as separate terms in that the CMF is seen to decrease with crash rate for each AADT group. The power model was again used. The form of this model is:

$$\text{CMF} = \exp(\text{intercept}) \left(\frac{\text{AADT}}{10{,}000}\right)^{\beta_1} \text{EBrate}^{\beta_2} \qquad \text{(Equation C43)}$$

where EBrate used in the model estimation in this case is the mean EBrate for each group.

The parameters for the model are provided in Table C37. The model indicates that the CMF increases as entering AADT increases and decreases as the crash rate increases, which is consistent with site-level Model 5 and with the indication from Figure C19 that the CMF decreases with crash rate for each AADT group.

5 Discussion of Findings

This case study sought to explain the variation in CMF findings for the conversion of conventional signalized intersections to roundabouts. The data consisted of site-level results from several empirical Bayes before-after analyses using data from various States.

The case study served a very useful purpose in illustrating the considerable difficulties in estimating a CMFunction through meta regression, in particular the need for substantial numbers of treatment sites spanning the range of the variables that potentially affect the value of a CMF.

Several variables thought to potentially impact the expected value of the CMF at a site were available: entering AADT, the EB expected crash rate without treatment, area type, number of approaches and number of circulating lanes.

It was found that site-level Model 5 (Table C34) was consistent with Model 2 calibrated using grouped data (Table C37) in terms of the sign and order of magnitude of the coefficients of the primary variables, AADT and EB crash rate before treatment. However, the former has the advantage of allowing additional variables to be considered in estimating a CMF in an application context.

While linear and non-linear model forms were both found to provide reasonable results, the non-linear form showed a superior fit as measured by the chi-square statistic and was adopted for the final model at the site level and for all group-level models. An advantage of the non-linear model form is that negative CMF predictions are not possible.

The comparison of weighted versus non-weighted models indicated that weighted regression models provided more precise parameter estimates.

Figure C19. Scatterplots for grouped signal conversions CMF vs. crash rate for each volcat.
(Figure C19 panels: volcat1, volcat2, volcat3, volcat4.)

Table C37. Model 2 weighted estimates (grouped data).

| Parameter | Estimate | Standard Error | p-value |
|---|---|---|---|
| Intercept | -0.3654 | 0.1596 | 0.0451 |
| β1 | 2.0654 | 0.2190 | <0.0001 |
| β2 | -0.5565 | 0.1106 | 0.0005 |
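The grouped analysis can be sketched as two small steps: assigning a site to its Table C35 categories, and evaluating the grouped Model 2 (Equation C43) with the Table C37 estimates. This is a minimal sketch; the function names are hypothetical, and evaluating the model at the marginal category means from Table C35 is an illustrative assumption, since the means for the 12 combined groups are not reported here.

```python
import bisect
import math

# Category boundaries from Table C35 (lower bounds are inclusive).
VOL_BOUNDS = [15_000, 20_000, 25_000]   # entering AADT
RATE_BOUNDS = [3, 5]                    # EB expected crashes per year

def volcat(aadt):
    """Return the Table C35 volume category for an entering AADT."""
    return f"Volcat{bisect.bisect_right(VOL_BOUNDS, aadt) + 1}"

def ratecat(ebrate):
    """Return the Table C35 crash-rate category for an EB crash rate."""
    return f"Ratecat{bisect.bisect_right(RATE_BOUNDS, ebrate) + 1}"

def cmf_grouped_model2(mean_aadt, mean_ebrate,
                       intercept=-0.3654, beta1=2.0654, beta2=-0.5565):
    """Equation C43 with the Table C37 estimates as defaults."""
    return (math.exp(intercept)
            * (mean_aadt / 10_000) ** beta1
            * mean_ebrate ** beta2)

# Marginal category means from Table C35, used here only for illustration.
print(volcat(17_974), ratecat(4.12), round(cmf_grouped_model2(17_974, 4.12), 3))
```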

Case Study 3
Safety Effects of Flattening a Horizontal Curve

Preamble

This is one of a series of four case studies to demonstrate the proposed guidelines for developing CMFunctions from either cross-sectional data or before-after data from actual safety treatment applications. The purposes of this specific case study are to:

1. Illustrate a heuristic methodology a researcher may follow to derive a CMFunction from cross-sectional regression analysis.
2. Illuminate some of the considerable issues and challenges that may be encountered in the process and, in so doing, address what it may take for future researchers to resolve them. The following list of potential issues identifies, in bold text, the ones that are highlighted in this case study:
   – Selection of candidate influencing variables (Section 1.4)
   – Accounting for interactions among candidate variables (Section 1.8)
   – Bias due to aggregation, averaging or incompleteness in data
   – Functional form for effects of independent variables (Section 1.2)
   – Considerations in choosing between linear or nonlinear forms
   – Application of hierarchical modeling
   – Tools for assessing model fit and choosing among, or amalgamating information from competing models (Section 1.4)
   – Including estimates from previous studies in the estimation methodology through Full Bayes methods (Section 1.5)
   – Addressing the co-linearity among explanatory variables
   – Addressing endogeneity
   – Estimating precision of CMFs from CMFunctions derived
3. Illustrate the considerable data requirements for estimating a robust CMFunction so that future research planning will endeavor to assemble appropriate datasets.

Given this scope, the purpose is not to derive robust CMFunctions, although that would have been a bonus objective worth achieving. In any case, as highlighted in the demonstration, estimating CMFs from cross-sectional data is difficult at best.
Indeed, the best, and perhaps only defensible, way forward in deriving CMFunctions for some treatments may well be the application of quasi-randomized trials or propinquity designs that approach such trials, for which the case is made in Appendixes F and G.

1 Introduction

The treatment of interest is the flattening of a horizontal curve on a rural two-lane road. The objective is to explore the development of a CMFunction that would be applicable in the scenario where a road designer is contemplating increasing the radius (i.e., flattening the curve) for an existing horizontal curve. In this scenario, the deflection angle of the curve is fixed. The CMFunction could also be applied for decreasing the radius of an existing curve. The discussion, though, will be presented under the assumption that curve flattening is the goal.

To assess the entire impact on safety of flattening a curve, it is necessary to consider a study area that extends beyond the limits of the smaller-radius curve. Figure C20 illustrates this need. When changing from a smaller to a larger radius, the tangents on either side of the curve are removed. The study area needs to include the entire roadway travelled to get from the beginning to the ending of the largest curve radius under consideration. The approach taken in this illustration is to predict the expected number of crashes for the entire study area using a single model.

Figure C21 illustrates the geometric characteristics of horizontal curves, where
PI = Point of tangent intersection
PC = Point of curve (beginning of curve)
PT = Point of tangent (ending of curve)
T = tangent length
R = curve radius
Δ = deflection angle in degrees

The length of curve, Lc, is directly related to the chosen radius, R:

$$L_c = \frac{R \, \Delta \, \pi}{180} \qquad \text{(Equation C44)}$$

where
Lc = length of curve
R = radius of curve
Δ = deflection angle measured in degrees

The tangent length is determined by:

$$T = R \tan\left(\frac{\Delta}{2}\right) \qquad \text{(Equation C45)}$$

An important consideration in developing a CMFunction for curve flattening is that at larger deflection angles, the curve length is larger for the same radius than at a curve with a smaller deflection angle, and that the difference in travel path lengths between two radii is much larger than at small deflection angles.

Figure C20. Illustration of two curves for a given central angle.

Figure C21. Illustration of curve geometry. (Labels shown: PC, PI, PT, T, R.)
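Equations C44 and C45, and the point about travel path lengths, can be illustrated with a short Python sketch. The function names and the example radii are hypothetical. The path comparison assumes the study area runs between the PC and PT of the larger-radius curve, so the smaller-radius alignment consists of its own curve plus the leftover tangent on each side.

```python
import math

def curve_length(radius, deflection_deg):
    """Equation C44: Lc = R * delta * pi / 180 (same length unit as radius)."""
    return radius * deflection_deg * math.pi / 180

def tangent_length(radius, deflection_deg):
    """Equation C45: T = R * tan(delta / 2), with delta in degrees."""
    return radius * math.tan(math.radians(deflection_deg) / 2)

def path_length_saved(r_small, r_large, deflection_deg):
    """Travel path along the smaller curve (plus leftover tangents) minus
    the path along the larger curve, over the larger curve's study area."""
    leftover_tangent = (tangent_length(r_large, deflection_deg)
                        - tangent_length(r_small, deflection_deg))
    old_path = 2 * leftover_tangent + curve_length(r_small, deflection_deg)
    new_path = curve_length(r_large, deflection_deg)
    return old_path - new_path

# Hypothetical flattening from a 500 ft to a 1,000 ft radius.
for delta in (10, 30, 60):
    print(delta, "deg:", round(path_length_saved(500, 1000, delta), 2), "ft")
```

The saving grows quickly with the deflection angle, which is the geometric reason the study area must be defined consistently within a group of similar deflection angles.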

The data used are from Washington State and were acquired from the Highway Safety Information System (HSIS). The analysis focuses on target crashes, which include all non-intersection-related crashes.

2 Identified Issues

There were several issues identified at the outset of the CMFunction development analysis that needed to be considered.

Issue 1 Minimizing Confounding Factors

One of the most difficult aspects of developing CMFunctions through cross-sectional data is to minimize the differences between sites in variables that affect crash risk other than the variable(s) of interest. Ideally, the only difference between the two groups would be in the variable(s) under study. In observational data this ideal is difficult, if not impossible, to achieve. Regression modeling seeks to account for these other factors that vary between sites. Minimizing the number of confounding factors that the modeling needs to control for will increase the chance of success. This may be achieved in the so-called "propinquity" approach that is suggested in Appendixes F and G as a step-down alternative to fully randomized trials. However, as noted, this becomes more challenging in CMFunction development, since there is a need to know and separate variables that impact the CMF, which still need to be included, from those that may still affect safety but do not impact the CMF.

Issue 2 Determining the Appropriate Model Form

In developing a cross-sectional model, the appropriate model form defines the relationship between explanatory variables and crashes. If the model form is misspecified, then CMF estimates will be incorrect. The model form selected should be based on both logical considerations and evidence provided by exploratory analysis of the data. A final selection may also consider GOF statistics for competing models.
3 Data

The data acquired included roadway geometry, including horizontal curvature data, traffic volume and crash data for 2008 to 2012 from Washington State. All sites are rural two-lane roads.

To remove the confounding factor of deflection angle, the curves were grouped by similar deflection angle as follows:

Table C38. Definition of angle groups.

| Angle Group | Deflection Angle Range (degrees) |
|---|---|
| 1 | <=5 |
| 2 | >5 and <=10 |
| 3 | >10 and <=20 |
| 4 | >20 and <=30 |
| 5 | >30 and <=40 |
| 6 | >40 and <=50 |

Strictly speaking, it would be ideal to group sites that have the exact same deflection angle. However, this is not practical due to limited sample sizes. The grouping applied provided a reasonably narrow range of deflection angle within each group while still providing a suitable number of curves within each group. Because the grouping does use a range of deflection angle values, it cannot be said that the confounding factor of deflection angle is completely removed, but this should come close.

To determine the study area within each angle group, the tangent length, T, was first determined for the longest curve within the group using Equation C45. Then for smaller curves within the same group, their tangent length was calculated, and the difference between that and

the T for the largest-radius curve in the group was used to add on segments of roadway before and after the curve for those curves with smaller radii. In this way, the study area for all curves within an angle group is the same because the differences between deflection angles within an angle group are relatively small.

Another potential confounding factor is the distance from a curve to the adjacent curve. For this illustration, it was decided to consider only isolated curves to avoid any confounding factors due to potential overlapping influence areas or the effects of closely spaced curves on expected crashes. To do so, only curves where the distance to the next curve is greater than or equal to 0.5 miles were used. The distance of 0.5 miles was somewhat arbitrary but did consider the time to travel 0.5 miles. At a travel speed of 55 mph (the posted speed at most sites was 55 mph or greater), 0.5 miles would be traversed in about 33 seconds. This seemed like a reasonable definition for an isolated curve. The distance of 0.5 miles also considered how many curve sites would be available for analysis if a longer or shorter cut-off were selected.

Hauer summarized research on safety and degree of curve. Table C39 provides summary statistics for the data for the six angle groups, including the total crash count for the five-year period. The length includes the curve length and tangent lengths for each curve's study area.

4 CMFunction Development by Regression Modeling

4.1 Investigation of Model Form and Implications for CMFs

The main explanatory variables of interest are AADT and the curve radius. The illustration of CMFunction development focuses on these variables initially but does consider additional

Table C39. Summary statistics.
| Angle Group | Statistic | Radius | Deflection Angle | Length | Speed Limit | Left Shoulder Width | Right Shoulder Width | Lane Width | AADT | Crashes |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | N | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 |
| 1 | MIN | 1910.00 | 0.51 | 0.56 | 25.37 | 1.00 | 1.00 | 10.00 | 251.80 | 0.00 |
| 1 | MAX | 11460.00 | 4.92 | 0.69 | 65.00 | 10.83 | 10.95 | 17.59 | 9357.38 | 7.00 |
| 1 | MEAN | 7328.64 | 2.92 | 0.57 | 55.90 | 5.35 | 5.47 | 11.84 | 2939.03 | 1.24 |
| 1 | STD | 2907.22 | 1.23 | 0.02 | 7.80 | 2.37 | 2.25 | 1.14 | 2408.48 | 1.55 |
| 2 | N | 74.00 | 74.00 | 74.00 | 74.00 | 74.00 | 74.00 | 74.00 | 74.00 | 74.00 |
| 2 | MIN | 1433.00 | 5.07 | 0.46 | 42.83 | 1.00 | 1.00 | 11.00 | 281.60 | 0.00 |
| 2 | MAX | 11460.00 | 9.95 | 0.47 | 65.00 | 10.00 | 10.00 | 14.14 | 12239.40 | 11.00 |
| 2 | MEAN | 5255.47 | 7.40 | 0.46 | 57.62 | 5.21 | 5.29 | 11.63 | 2441.07 | 1.27 |
| 2 | STD | 2685.36 | 1.49 | 0.00 | 5.39 | 2.34 | 2.37 | 0.58 | 2128.88 | 1.98 |
| 3 | N | 146.00 | 146.00 | 146.00 | 146.00 | 146.00 | 146.00 | 146.00 | 146.00 | 146.00 |
| 3 | MIN | 573.00 | 10.07 | 0.67 | 28.01 | 0.00 | 0.00 | 10.00 | 246.40 | 0.00 |
| 3 | MAX | 11460.00 | 19.93 | 0.69 | 65.00 | 12.90 | 12.90 | 17.24 | 23661.74 | 11.00 |
| 3 | MEAN | 3917.74 | 14.93 | 0.68 | 55.83 | 5.35 | 5.50 | 11.63 | 3189.85 | 2.08 |
| 3 | STD | 2245.72 | 2.95 | 0.00 | 7.38 | 2.43 | 2.37 | 0.97 | 3473.36 | 2.35 |
| 4 | N | 99.00 | 99.00 | 99.00 | 99.00 | 99.00 | 99.00 | 99.00 | 99.00 | 99.00 |
| 4 | MIN | 229.00 | 20.39 | 0.97 | 41.79 | 1.06 | 1.06 | 10.00 | 344.60 | 0.00 |
| 4 | MAX | 11460.00 | 29.99 | 0.99 | 65.00 | 10.00 | 10.00 | 18.66 | 19965.01 | 11.00 |
| 4 | MEAN | 3283.06 | 24.78 | 0.99 | 57.11 | 5.00 | 4.99 | 11.70 | 2790.70 | 2.35 |
| 4 | STD | 2280.82 | 3.01 | 0.00 | 5.00 | 2.19 | 2.16 | 0.98 | 2725.92 | 2.28 |
| 5 | N | 52.00 | 52.00 | 52.00 | 52.00 | 52.00 | 52.00 | 52.00 | 52.00 | 52.00 |
| 5 | MIN | 350.00 | 30.05 | 1.08 | 33.11 | 1.00 | 1.00 | 10.00 | 252.57 | 0.00 |
| 5 | MAX | 9550.00 | 39.83 | 1.11 | 65.00 | 10.00 | 10.00 | 14.65 | 16327.58 | 63.00 |
| 5 | MEAN | 2346.58 | 34.39 | 1.10 | 55.09 | 5.03 | 4.93 | 11.60 | 3675.19 | 5.25 |
| 5 | STD | 1888.57 | 2.85 | 0.01 | 6.84 | 2.64 | 2.64 | 0.82 | 3958.21 | 9.39 |
| 6 | N | 10.00 | 10.00 | 10.00 | 10.00 | 10.00 | 10.00 | 10.00 | 10.00 | 10.00 |
| 6 | MIN | 286.00 | 42.25 | 3.69 | 49.08 | 2.00 | 2.00 | 10.00 | 211.31 | 0.00 |
| 6 | MAX | 5730.00 | 91.75 | 3.79 | 65.00 | 6.08 | 6.06 | 13.78 | 9745.72 | 44.00 |
| 6 | MEAN | 1462.40 | 66.54 | 3.76 | 56.00 | 3.85 | 3.85 | 11.58 | 3367.16 | 13.20 |
| 6 | STD | 1769.37 | 21.37 | 0.03 | 4.18 | 1.51 | 1.51 | 1.02 | 3225.37 | 15.54 |

variables. Experience has shown that the power model for AADT is suitable. For curve radius, however, several model forms have been attempted by previous researchers, including linear, power and exponential. A plot of crash frequency versus radius for sites in Angle Group 1 is shown in Figure C22. It is difficult to ascertain an orderly relationship from such a graph. The Integrate-Differentiate method described in Section 1.2 was applied to better study this issue. The ID plots for each angle group are shown in Figure C23 through Figure C28.

Figure C22. Plot of crashes vs. radius for Angle Group 1.

Figure C23. Integrate-Differentiate plot for Angle Group 1.

Figure C24. Integrate-Differentiate plot for Angle Group 2.

Figure C25. Integrate-Differentiate plot for Angle Group 3.

Figure C26. Integrate-Differentiate plot for Angle Group 4.

Figure C27. Integrate-Differentiate plot for Angle Group 5.

Figure C28. Integrate-Differentiate plot for Angle Group 6.

The ID plots are not particularly informative, although they do seem to suggest that crash frequency does decline as the radius increases. This is most visible in the plots for Groups 2, 3, and 4. The leveling of the plot indicates this relationship. The exact nature of this negative relationship between crashes and radius is, however, not perceptible.

It was decided to explore two alternate assumptions based on the review of literature and the limited insights provided by the ID plots. The radius variable, R, was scaled by dividing by 1,000 to avoid very small parameter estimates.

$$\text{Crashes per year} = \exp(a) \, \text{AADT}^{b} \, \exp\left(c \times \frac{R}{1000}\right) \qquad \text{(Equation C46)}$$

$$\text{Crashes per year} = \exp(a) \, \text{AADT}^{b} \left(c + d \times \frac{R}{1000}\right) \qquad \text{(Equation C47)}$$

where
AADT = the average annual daily traffic
R = curve radius in feet
a, b, c, d are parameters to be estimated

It is important to acknowledge that the model form selected for the relationship between radius and crashes impacts the nature of the CMFunction to be estimated. Assuming no variables other than AADT and radius were included in the model, for Equation C46 the CMF for flattening a curve from R1 to R2 is equal to:

$$\text{CMF} = \frac{\exp\left(c \times \frac{R_2}{1000}\right)}{\exp\left(c \times \frac{R_1}{1000}\right)} = \exp\left(c \times \left(\frac{R_2}{1000} - \frac{R_1}{1000}\right)\right) \qquad \text{(Equation C48)}$$
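The two candidate forms in Equations C46 and C47, and the CMF as a ratio of predictions at two radii, can be sketched in Python as follows. This is a sketch under stated assumptions: the exponential-form parameters are taken from the Angle Group 1 row of Table C40, while the linear-form parameters and the AADT of 3,000 are hypothetical values chosen only to show the behaviour of the ratio.

```python
import math

def crashes_exponential(aadt, radius_ft, a, b, c):
    """Equation C46: exp(a) * AADT^b * exp(c * R / 1000)."""
    return math.exp(a) * aadt ** b * math.exp(c * radius_ft / 1000)

def crashes_linear(aadt, radius_ft, a, b, c, d):
    """Equation C47: exp(a) * AADT^b * (c + d * R / 1000)."""
    return math.exp(a) * aadt ** b * (c + d * radius_ft / 1000)

def cmf(model, r1, r2, aadt, **params):
    """CMF for changing the radius from r1 to r2: ratio of predicted crashes."""
    return model(aadt, r2, **params) / model(aadt, r1, **params)

EXP = dict(a=-7.0540, b=0.8269, c=-0.1357)   # Table C40, Angle Group 1
LIN = dict(a=-7.0, b=0.8, c=1.0, d=-0.05)    # hypothetical values

for r1, r2 in ((250, 500), (500, 750), (750, 1000)):
    print(r1, "->", r2,
          "exponential:", round(cmf(crashes_exponential, r1, r2, 3000, **EXP), 4),
          "linear:", round(cmf(crashes_linear, r1, r2, 3000, **LIN), 4))
```

In the exponential form the ratio is the same for every 250 ft increase and does not depend on AADT, while in the linear form it changes with the starting radius.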

As Equation C48 shows, if the parameter c is negative (i.e., fewer crashes occur with a larger radius), as the difference between the smaller and larger curve radii increases, the CMF gets smaller. This is intuitive, meaning that the flatter a curve becomes in comparison to the original radius, the smaller the CMF value. This also implies that the CMF value is dependent on the difference between the two radii but not the values of the radii themselves. Therefore, the CMF for changing from a radius of 250 ft to 500 ft will be the same as changing from 500 ft to 750 ft. In practice this may not logically apply if the smaller radius were less than the minimum radius for the design speed.

While the CMF is dependent on the differences between curves but not the actual value of radii, the model formulation in Equation C46 does mean that the predicted frequency of crashes reduced by curve flattening is dependent on both the differences in radii and the radii values, and that the frequency reduced will be greater for flattening smaller-radius curves. The change in crash frequency, Delta, is given by:

$$\text{Delta} = \exp(a) \, \text{AADT}^{b} \left[\exp\left(c \times \frac{R_2}{1000}\right) - \exp\left(c \times \frac{R_1}{1000}\right)\right] \qquad \text{(Equation C49)}$$

For Equation C47, if d is negative (i.e., fewer crashes occur with a larger radius), the CMF for flattening a curve from R1 to R2 is equal to:

$$\text{CMF} = \frac{c + d \, \frac{R_2}{1000}}{c + d \, \frac{R_1}{1000}} \qquad \text{(Equation C50)}$$

With this model formulation the CMF value is dependent on the values of R1 and R2, not just their differences. This implies that the CMF becomes smaller at larger radii when the difference between R1 and R2 is constant. Therefore, the value of the CMF for changing from 250 ft to 500 ft will be larger than for changing from 750 ft to 1000 ft.

While the CMF is dependent on both the difference between radii and the values of the radii, the frequency of crashes reduced is dependent only on the differences in radii between the two curves but not the actual values of the radii.
$$\text{Delta} = \exp(a) \, \text{AADT}^{b} \, d \left(\frac{R_2}{1000} - \frac{R_1}{1000}\right) \qquad \text{(Equation C51)}$$

To summarize, by Equation C46 the CMF value is dependent on the difference between R1 and R2 but not their values, while the frequency of the change in crashes does depend on the actual values of R1 and R2. By Equation C47 the CMF is dependent on the actual values of R1 and R2, while the frequency of the change in crashes is only dependent on the difference between R1 and R2. It is hoped that in the modeling to be done some light will be shed on which of these two model formulations is more appropriate.

4.2 Separate Models by Angle Group

The preferred option was to develop models separately for each angle group. In this fashion, all sites within an angle group have roughly the same deflection angle, so the potential confounding factor of deflection angle is controlled for. By using isolated curves, we are also controlling for the confounding factor of distance to next curve.

Equation C46 was fit to the data for each angle group using generalized linear modeling with a negative binomial error distribution. The results are shown in Table C40.

The models in Table C40 show mixed success. Only two groups (1 and 4) had a statistically significant result for the radius term. For Groups 3 and 6, the results indicate an increasing crash frequency with increasing radius, contrary to what is expected, although the results are not statistically significant. We may have expected the radius parameter, c, to be increasingly negative, and thus provide smaller CMF values, at higher angle groups, but this is not seen. One of the reasons may be that in the dataset there are few curves with a large deflection angle.

The implied CMFs per 100 ft. increase in radius using the models estimated in Table C40 are shown in Table C41.

Equation C47 was also fit to the data for each angle group using full Bayes MCMC techniques that allow for the additive term in the model, with a negative binomial error distribution. The results are shown in Table C42.

The models in Table C42 are not very satisfactory. The parameter estimates c and d are not statistically significant, and only three of six groups show a negative relationship between increased radius and crashes. The model for angle group 6 is quite poor in that the parameter estimate for AADT is negative.

4.3 Modeling Using Combined Data

Since the separate modeling of angle groups produced few statistically significant estimates of the parameter associated with radius, the data are now combined. It is hoped that the increased size of the sample data will allow for better models to be estimated. However, because the data are now combined, the confounding factor of deflection angle will need to be considered. Models for both Equation C46 and Equation C47 are again estimated.
Table C40. Equation C46 modeling results by angle group.

| Angle Group | a | b | c | Dispersion parameter |
|---|---|---|---|---|
| 1 | -7.0540 (1.5530) | 0.8269 (0.1782) | -0.1357 (0.0524) | 0.0286 (0.1426) |
| 2 | -9.5188 (1.4862) | 1.0920 (0.1924) | -0.0759 (0.0616) | 0.4713 (0.2616) |
| 3 | -6.7720 (0.7548) | 0.7397 (0.0907) | 0.0039 (0.0348) | 0.3175 (0.1120) |
| 4 | -4.9453 (0.9727) | 0.5931 (0.1253) | -0.1382 (0.0444) | 0.3018 (0.1114) |
| 5 | -7.3864 (0.9209) | 0.9226 (0.1114) | -0.0719 (0.0642) | 0.3369 (0.1265) |
| 6 | -8.8695 (1.7497) | 1.1807 (0.2050) | 0.0842 (0.0882) | 0.1164 (0.0989) |

Table C41. Implied CMFs from Equation C46 models by angle group.

| Angle group | c parameter | CMF/100 ft increase in radius | CMF/500 ft increase in radius |
|---|---|---|---|
| 1 | -0.1357 | 0.99 | 0.93 |
| 2 | -0.0759 | 0.99 | 0.96 |
| 3 | 0.0039 | 1.00 | 1.00 |
| 4 | -0.1382 | 0.99 | 0.93 |
| 5 | -0.0719 | 0.99 | 0.96 |
| 6 | 0.0842 | 1.01 | 1.04 |
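The implied CMFs in Table C41 follow directly from the c estimates in Table C40 through Equation C48. The short Python sketch below reproduces them; the function and variable names are hypothetical.

```python
import math

# Radius parameter c by angle group, from Table C40.
C_BY_GROUP = {1: -0.1357, 2: -0.0759, 3: 0.0039,
              4: -0.1382, 5: -0.0719, 6: 0.0842}

def implied_cmf(c, radius_increase_ft):
    """Equation C48: CMF = exp(c * (R2 - R1) / 1000)."""
    return math.exp(c * radius_increase_ft / 1000)

for group, c in C_BY_GROUP.items():
    print(group,
          round(implied_cmf(c, 100), 2),   # CMF per 100 ft increase in radius
          round(implied_cmf(c, 500), 2))   # CMF per 500 ft increase in radius
```

Rounded to two decimals, the printed values agree with the two CMF columns of Table C41.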

4.3.1 Exponential Models (Equation C46)

Model 1. To consider the deflection angle, within each angle group the average deflection angle is determined. The average deflection angle within the group was used because the study area for each site was defined by assuming the deflection angle within a group was constant. For lack of a better hypothesis, the parameter for the radius term in Equation C46 is modeled as a multi-level model where the parameter has a linear relationship with deflection angle. This is consistent with the hypothesis that the impact of curve flattening is more beneficial for curves with larger deflection angles. The model fit is:

$$\text{Crashes per year} = \exp(a) \, \text{AADT}^{b} \, \exp\left((c + d \times \text{avgangle}) \times \frac{R}{1000}\right) \qquad \text{(Equation C52)}$$

where
AADT = average annual daily traffic
avgangle = average deflection angle for the angle group as shown in Table C39
R = radius in feet

The parameter estimates are shown in Table C43. The implied CMFs per 100 ft. increase in radius are shown in Table C44.

The model indicates that a longer radius is generally associated with fewer crashes, but the percentage reduction is smaller when the deflection angle is higher. In fact, for the two largest groups of deflection angle the increase in radius appears to be associated with more crashes. This is obviously not the expected result. This finding is likely due in part to how the sites were defined within each group. For each group of deflection angle, the study area includes the entire travel distance for the longest curve (i.e., the curve with the largest radius). For smaller-radius curves in the group, the study area thus includes tangent segments on either side of the curve.
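The implied CMFs for combined-data Model 1 can be approximated from Equation C52 with the Table C43 estimates. The Python sketch below assumes that avgangle is the mean deflection angle of each group reported in Table C39; the names are hypothetical. Under that assumption the results for Angle Groups 1 to 5 agree with Table C44 to three decimals, while Angle Group 6 comes out near 1.019 rather than the published 1.024; the cause of that gap cannot be determined from the published values.

```python
import math

C, D = -0.1871, 0.0057   # Table C43 estimates of c and d

# Assumption: mean deflection angle per group, from the MEAN rows of Table C39.
AVG_ANGLE = {1: 2.92, 2: 7.40, 3: 14.93, 4: 24.78, 5: 34.39, 6: 66.54}

def cmf_combined_model1(avg_angle, radius_increase_ft=100):
    """CMF implied by Equation C52 for a given increase in radius."""
    return math.exp((C + D * avg_angle) * radius_increase_ft / 1000)

for group, angle in AVG_ANGLE.items():
    print(group, round(cmf_combined_model1(angle), 3))
```

The sign change between Angle Groups 4 and 5 occurs where c + d × avgangle crosses zero, at an average deflection angle of roughly 33 degrees.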
Table C42. Equation C47 modeling results by angle group.

| Angle Group | a | b | c | d | Dispersion parameter |
|---|---|---|---|---|---|
| 1 | -11.3600 (2.5890) | 0.8304 (0.1872) | 106.3000 (82.7200) | -5.4700 (4.9940) | 0.0054 (0.0116) |
| 2 | 0.4523 (99.9900) | 0.4352 (0.2383) | 0.0896 (0.6997) | 0.0041 (0.0154) | 0.7018 (0.7308) |
| 3 | -0.1354 (100.3000) | 0.6218 (0.0863) | 0.0038 (0.0029) | 0.000009 (0.000155) | 0.2869 (0.1316) |
| 4 | 0.3443 (99.8400) | 0.3606 (0.1469) | 0.0676 (0.0777) | -0.0045 (0.0054) | 0.2803 (0.1377) |
| 5 | 0.9356 (100.4000) | 0.7143 (0.1196) | 0.0053 (0.0065) | -0.0001 (0.0006) | 0.4207 (0.1652) |
| 6 | -0.5570 (99.9200) | -0.3710 (0.1325) | 59.4200 (55.8900) | 62.9600 (59.5200) | 2.4432 (1.1359) |

Table C43. Combined data exponential Model 1.

| a | b | c | d | Dispersion parameter |
|---|---|---|---|---|
| -7.2312 (0.4658) | 0.8622 (0.0574) | -0.1871 (0.0222) | 0.0057 (0.0010) | 0.4489 (0.0667) |

Table C44. CMFs implied from combined data exponential Model 1.

| Angle group | CMF/100 ft. increase in radius |
|---|---|
| 1 | 0.983 |
| 2 | 0.986 |
| 3 | 0.990 |
| 4 | 0.995 |
| 5 | 1.001 |
| 6 | 1.024 |

Sites with larger deflection angles

tend to have larger study areas by this definition. As a result, without accounting for this fact, it is not surprising to see that more crashes are associated with the larger deflection angle sites for a given radius.

Model 2. To attempt to account for the difference between the sizes of study area between angle groups, we first estimate a model with a constant parameter for radius but with different intercept terms for each deflection angle group. As expected, the intercept terms indicate that the higher angle groups, which tend to have larger study areas, are expected to have more crashes for a given radius and level of traffic. Also, as expected, the term for radius is negative, indicating that the expected crash rate decreases as radius is increased.

With this model formulation the CMF is not dependent on deflection angle because the relationship between radius and crash risk is constant. There is some logic to this in that the crash risk per unit length traversing a curve of a given radius should not be dependent on the deflection angle. The model fit is:

$$\text{Crashes per year} = \exp(\alpha_{anggrp}) \, \text{AADT}^{b} \, \exp\left(c \times \frac{R}{1000}\right) \qquad \text{(Equation C53)}$$

where
αanggrp = intercept term specific to each angle group
AADT = average annual daily traffic
R = radius in feet

The parameter estimates are shown in Table C45.

Table C46 illustrates that while the CMF is constant for a given change in radius, the frequency of crashes reduced is larger for the larger deflection angle groups. This is because for a given radius the length of curve is larger for a larger deflection angle. For this model the CMF value for a given change in radius does not depend on the deflection angle.
However, the change in the number of crashes does depend on it: the reduction in crash counts from curve flattening increases as the deflection angle increases. Safety, as measured by crash frequency, is therefore highly dependent on deflection angle when considering the choice of radius. It is also evident from this model formulation that more crashes are avoided when a small-radius curve is flattened than when a larger-radius curve is flattened by the same amount.

Table C45. Combined data exponential Model 2.

| Angle Group | αanggrp | b | c | Dispersion parameter |
|---|---|---|---|---|
| 1 | -7.5284 (0.5777) | 0.8255 (0.0548) | -0.0613 (0.0213) | 0.3583 (0.0618) |
| 2 | -7.4863 (0.5651) | | | |
| 3 | -7.2097 (0.5530) | | | |
| 4 | -6.9833 (0.5542) | | | |
| 5 | -6.6179 (0.5571) | | | |
| 6 | -5.6999 (0.4973) | | | |

Model 3. As a check on the relationship between radius and crashes for different levels of deflection angle, this model also allows the parameter for radius to depend on the deflection angle, as was done in Model 1. The variable modeled is the average deflection angle within each deflection angle group. The model is:

$$\text{Crashes per year} = \exp(\alpha_{anggrp}) \times AADT^{b} \times \exp\left((c + d \times avgangle) \times \frac{R}{1000}\right) \qquad \text{(Equation C54)}$$

where
αanggrp = intercept term specific to each angle group
AADT = average annual daily traffic
avgangle = average deflection angle for the angle group as shown in Table C42
R = radius in feet

The parameter estimates are shown in Table C47. The parameter estimates have not changed significantly from Model 2. The parameter d has a standard error larger than the estimate and so is not statistically significant, reaffirming that the relationship between radius and crash risk per unit of travel does not depend on the central angle. This does not mean that safety as measured by crash frequency is independent of deflection angle, only that the CMF is. As shown in Table C46, the frequency of crashes saved by curve flattening is very much dependent on deflection angle.

Table C47. Combined data exponential Model 3.

| Angle Group | αanggrp | b | c | d | Dispersion parameter |
|---|---|---|---|---|---|
| 1 | -7.4219 (0.6807) | 0.8248 (0.0547) | -0.0791 (0.0354) | 0.0010 (0.0015) | 0.3573 (0.0616) |
| 2 | -7.4257 (0.4715) | | | | |
| 3 | -7.1897 (0.4515) | | | | |
| 4 | -6.9945 (0.5395) | | | | |
| 5 | -6.6445 (0.4641) | | | | |
| 6 | -5.7990 (0.5221) | | | | |

4.3.2 Linear Models

Model 1. Now we estimate models for Equation C47 for the combined data. Taking our cue from what was learned when developing exponential models in Section 4.3.1, we estimate a
Table C46. Illustration of crashes saved assuming AADT = 5,000.

| Angle Group | Crashes with 150-ft radius | Crashes with 1,000-ft radius | Crashes per year saved in going from 150 ft to 1,000 ft | CMF for the 150-ft to 1,000-ft increase in radius |
|---|---|---|---|---|
| 1 | 0.603 | 0.572 | 0.031 | 0.949 |
| 2 | 0.628 | 0.596 | 0.032 | 0.949 |
| 3 | 0.829 | 0.787 | 0.042 | 0.949 |
| 4 | 1.039 | 0.986 | 0.053 | 0.949 |
| 5 | 1.498 | 1.422 | 0.076 | 0.949 |
| 6 | 3.750 | 3.560 | 0.190 | 0.949 |
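The Table C46 values can be checked by evaluating the Model 2 prediction directly. The following is a minimal sketch, assuming the exponential form of Equation C53 with the Table C45 estimates and AADT = 5,000; the function and variable names are illustrative and are not part of the original analysis.

```python
import math

# Table C45 estimates (combined data, exponential model with group intercepts).
ALPHA = {1: -7.5284, 2: -7.4863, 3: -7.2097, 4: -6.9833, 5: -6.6179, 6: -5.6999}
B = 0.8255
C = -0.0613


def crashes_per_year(angle_group, aadt, radius_ft):
    """Equation C53: exp(alpha) * AADT^b * exp(c * R / 1000)."""
    return math.exp(ALPHA[angle_group]) * aadt ** B * math.exp(C * radius_ft / 1000.0)


for group in sorted(ALPHA):
    sharp = crashes_per_year(group, 5000, 150)    # 150-ft radius
    flat = crashes_per_year(group, 5000, 1000)    # 1,000-ft radius
    # Columns: group, crashes at 150 ft, crashes at 1,000 ft, crashes saved, ratio
    print(group, round(sharp, 3), round(flat, 3),
          round(sharp - flat, 3), round(flat / sharp, 3))
```

The ratio in the last printed column is the same for every angle group because the intercept cancels. It equals exp(c × 0.85), about 0.949, for the full 850-ft change; the same value of c implies a CMF of about 0.994 per 100-ft increase in radius.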

separate intercept term for each angle group to account for the fact that the study areas vary in size. The model is:

$$\text{Crashes per year} = \exp(\alpha_{anggrp}) \times AADT^{b} \times \left(c + d \times \frac{R}{1000}\right) \qquad \text{(Equation C55)}$$

where
αanggrp = intercept term specific to each angle group
AADT = average annual daily traffic
R = radius in feet

The parameter estimates are shown in Table C48. The results still show that as radius increases the expected crash frequency decreases. The parameter estimates c and d are statistically significant at approximately the 90% confidence level.

Table C48. Combined data linear Model 1.

| Angle Group | αanggrp | b | c | d | Dispersion parameter |
|---|---|---|---|---|---|
| 1 | -12.3500 (0.8306) | 0.8381 (0.0561) | 127.3000 (70.4300) | -5.3630 (3.4530) | 0.3701 (0.0510) |
| 2 | -12.3100 (0.8162) | | | | |
| 3 | -12.0200 (0.8080) | | | | |
| 4 | -11.7800 (0.7968) | | | | |
| 5 | -11.4100 (0.8097) | | | | |
| 6 | -10.4800 (0.8275) | | | | |

Model 2. If the parameter defining the slope of the relationship between crashes and radius is allowed to vary by angle group, the additional parameter is not statistically significant. The model is:

$$\text{Crashes per year} = \exp(\alpha_{anggrp}) \times AADT^{b} \times \left(c + (d + e \times avgangle) \times \frac{R}{1000}\right) \qquad \text{(Equation C56)}$$

where
αanggrp = intercept term specific to each angle group
AADT = average annual daily traffic
avgangle = average deflection angle for the angle group as shown in Table C42
R = radius in feet

The parameter estimates are shown in Table C49. The same trend is seen as for exponential Model 3, where the parameter estimates also did not change significantly from Model 2. The parameter e has a standard error larger than the estimate and so is not statistically significant, reaffirming that the relationship between radius and crash risk per unit of travel does not depend on the central angle.

5 Selection of Preferred Model

The models developed separately for each angle group and documented in Section 4.2 were unsatisfactory. For the exponential form given by Equation C46 only two groups (1 and 4) had

a statistically significant result for the radius term. For groups 3 and 6, the results indicate an increasing crash frequency with increasing radius, contrary to what is expected, although the results are not statistically significant. For the linear form given by Equation C47, the parameter estimates associated with the radius term are not statistically significant and only three of the six groups indicated a negative relationship between increased radius and crashes. The model for angle group 6 was quite poor in that the parameter estimate for AADT is negative.

The models developed using the combined data were more successful. While combining the data complicates matters by introducing the confounding factor of deflection angle, the larger sample size is a positive. The question remains: which model form, Equation C46 or Equation C47, is more appropriate?

$$\text{Crashes per year} = \exp(\alpha) \times AADT^{b} \times \exp\left(c \times \frac{R}{1000}\right) \qquad \text{(Equation C46)}$$

$$\text{Crashes per year} = \exp(\alpha) \times AADT^{b} \times \left(c + d \times \frac{R}{1000}\right) \qquad \text{(Equation C47)}$$

The preferred exponential model is Model 2:

$$\text{Crashes per year} = \exp(\alpha_{anggrp}) \times AADT^{b} \times \exp\left(c \times \frac{R}{1000}\right) \qquad \text{(Equation C57)}$$

where
αanggrp = intercept term specific to each angle group
AADT = average annual daily traffic
R = radius in feet

The parameter estimates are shown in the reproduced Table C50. The loglikelihood for this model is 551.

The preferred linear model is Model 1:

$$\text{Crashes per year} = \exp(\alpha_{anggrp}) \times AADT^{b} \times \left(c + d \times \frac{R}{1000}\right) \qquad \text{(Equation C58)}$$

where
αanggrp = intercept term specific to each angle group
AADT = average annual daily traffic
R = radius in feet

Table C49. Combined data linear Model 2.

| Angle Group | αanggrp | b | c | d | e | Dispersion parameter |
|---|---|---|---|---|---|---|
| 1 | -12.7500 (0.6994) | 0.8400 (0.0659) | 176.7000 (62.7800) | -8.6950 (4.3330) | 0.1125 (0.1826) | 0.3738 (0.0529) |
| 2 | -12.7500 (0.6575) | | | | | |
| 3 | -12.5000 (0.6480) | | | | | |
| 4 | -12.2800 (0.6396) | | | | | |
| 5 | -11.9100 (0.6670) | | | | | |
| 6 | -11.0200 (0.6985) | | | | | |

The parameter estimates are shown in the reproduced Table C51. The loglikelihood for this model is 443.

Table C50. Combined data exponential Model 2.

| Angle Group | αanggrp | b | c | Dispersion parameter |
|---|---|---|---|---|
| 1 | -7.5284 (0.5777) | 0.8255 (0.0548) | -0.0613 (0.0213) | 0.3583 (0.0618) |
| 2 | -7.4863 (0.5651) | | | |
| 3 | -7.2097 (0.5530) | | | |
| 4 | -6.9833 (0.5542) | | | |
| 5 | -6.6179 (0.5571) | | | |
| 6 | -5.6999 (0.4973) | | | |

Table C51. Combined data linear Model 1.

| Angle Group | αanggrp | b | c | d | Dispersion parameter |
|---|---|---|---|---|---|
| 1 | -12.3500 (0.8306) | 0.8381 (0.0561) | 127.3000 (70.4300) | -5.3630 (3.4530) | 0.3701 (0.0510) |
| 2 | -12.3100 (0.8162) | | | | |
| 3 | -12.0200 (0.8080) | | | | |
| 4 | -11.7800 (0.7968) | | | | |
| 5 | -11.4100 (0.8097) | | | | |
| 6 | -10.4800 (0.8275) | | | | |

There are several GOF measures available to help in selecting a preferred model.

Dispersion Parameter

The value of the dispersion parameter describes the relationship between the estimated mean and variance of the predicted crash rate. A smaller dispersion parameter indicates a better GOF. By this measure the exponential model is slightly better, with a value of 0.3583 compared to 0.3701 for the linear model.

Precision of Parameter Estimates

The parameter estimates of the exponential model are more precise, showing a higher level of statistical significance. All parameter estimates for this model are statistically significant at the 95% confidence level. For the linear model, the parameter c is statistically significant at the 93% confidence level (p-value of 0.07) and the parameter d at the 88% confidence level (p-value of 0.12).

Akaike Information Criterion (AIC)

The AIC can be used to compare the relative fit of two or more models. Smaller values of AIC indicate a better model fit. The AIC is calculated as

$$AIC = 2 \times K - 2 \times \text{loglikelihood} \qquad \text{(Equation C59)}$$

where
K = the number of parameters estimated in the model
loglikelihood = the loglikelihood calculated for the model

When determining the value of K, if a negative binomial model is applied, the overdispersion parameter is included in the count of estimated parameters. The loglikelihood is the natural logarithm of the likelihood function. The likelihood of a fitted model is the product, over all observations, of the probability of observing each value of the dependent variable given the estimated parameters of the model. A larger value of the likelihood function, and of the loglikelihood, indicates a better model fit.

The AIC value is -1,084 for the exponential model and -866 for the linear model. By this measure the exponential model is preferred.

Cumulative Residual (CURE) Plots

To evaluate the predictive performance of the two alternate models over the range of values of AADT and curve radius, CURE plots were developed. Table C52 provides the maximum absolute CURE deviation and the percentage of observations exceeding the 95% confidence limits. By these measures the two models perform similarly, although the exponential model performs slightly better in terms of the maximum absolute CURE deviation. Figures C29 through C32 show the CURE plots, which appear quite similar between the two models.

Table C52. CURE plot results.

| Model | Variable | Maximum Absolute CURE Deviation | % CURE Deviation |
|---|---|---|---|
| Exponential | AADT | 39.53 | 6 |
| Exponential | Radius/1000 | 81.36 | 17 |
| Linear | AADT | 43.50 | 6 |
| Linear | Radius/1000 | 87.06 | 17 |

Figure C29. CURE plot for Exponential Model and AADT.
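The reported AIC values can be reproduced from Equation C59 and the two loglikelihoods. The sketch below is illustrative only: the parameter counts are an assumption inferred from the parameter tables (six angle-group intercepts, the AADT and radius parameters, and the dispersion parameter), and the function name is hypothetical.

```python
def aic(num_params, loglikelihood):
    """Equation C59: AIC = 2 * K - 2 * loglikelihood."""
    return 2 * num_params - 2 * loglikelihood


# Assumed parameter counts (not stated explicitly in the text):
# six intercepts + b + c + dispersion for the exponential model,
# six intercepts + b + c + d + dispersion for the linear model.
k_exponential = 6 + 2 + 1
k_linear = 6 + 3 + 1

print(aic(k_exponential, 551))  # -1084
print(aic(k_linear, 443))       # -866
```

Under these assumed counts the results match the AIC values quoted for the two models, and the lower value belongs to the exponential model.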

Figure C30. CURE plot for Exponential Model and Radius/1000.

Figure C31. CURE plot for Linear Model and AADT.

Figure C32. CURE plot for Linear Model and Radius/1000.
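The two summary measures in Table C52 can be computed from any set of observed and predicted crash counts. The following is a minimal sketch, not the procedure used in the study: it assumes the commonly used limits of plus or minus two adjusted standard deviations as the approximate 95% limits, and the function names and the small data set are hypothetical.

```python
import math


def cure(covariate, observed, predicted):
    """Return the sorted covariate, cumulative residuals, and +/- 2 sigma limits."""
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    cum_res, cum_sq = [], []
    running, running_sq = 0.0, 0.0
    for i in order:
        residual = observed[i] - predicted[i]
        running += residual
        running_sq += residual * residual
        cum_res.append(running)
        cum_sq.append(running_sq)
    total_sq = cum_sq[-1]
    limits = []
    for s in cum_sq:
        # Variance of the cumulative residuals, adjusted so that it
        # returns to zero at the last observation.
        var = s * (1.0 - s / total_sq) if total_sq > 0 else 0.0
        limits.append(2.0 * math.sqrt(max(var, 0.0)))
    return [covariate[i] for i in order], cum_res, limits


def summarize(cum_res, limits):
    """Maximum absolute CURE deviation and percent of points outside the limits."""
    max_abs = max(abs(c) for c in cum_res)
    outside = sum(1 for c, lim in zip(cum_res, limits) if abs(c) > lim)
    return max_abs, 100.0 * outside / len(cum_res)


# Hypothetical data: AADT, observed crashes, and model predictions for five sites.
aadt = [300, 100, 500, 200, 400]
obs = [0, 1, 0, 2, 1]
pred = [0.5, 0.6, 0.7, 0.8, 0.9]
xs, cum, lim = cure(aadt, obs, pred)
print(summarize(cum, lim))
```

Sorting by AADT or by Radius/1000 gives the two variants reported for each model. A cumulative residual curve that drifts steadily away from zero over some range of the covariate suggests that the functional form is biased in that range.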

Recommended Model Form

Although the two models perform similarly for each of the goodness-of-fit measures, the exponential model in Equation C60 is consistently preferred, even though the differences are not very large.

$$\text{Crashes per year} = \exp(\alpha_{anggrp}) \times AADT^{b} \times \exp\left(c \times \frac{R}{1000}\right) \qquad \text{(Equation C60)}$$

where
αanggrp = intercept term specific to each angle group
AADT = average annual daily traffic
R = radius in feet

The authors tried to enhance Equation C60 by allowing the c parameter to vary with other explanatory variables in a multilevel fashion, including lane width and shoulder width, under the hypothesis that the impact of increasing the curve radius may be greater on roadways with narrow lane and/or shoulder widths. However, no clear trend was observed.

6 Discussion of Results

The purpose of the case study was to illustrate a heuristic methodology a researcher may follow to derive a CMFunction from cross-sectional regression analysis. The objective was to explore the development of a CMFunction that would be applicable in the scenario where a road designer is contemplating increasing the radius (i.e., flattening the curve) of an existing horizontal curve. In this scenario, the deflection angle of the curve is fixed. To assess the entire impact on safety of flattening a curve, it was necessary to consider a study area that extends beyond the limits of the smaller-radius curve.

Limitations of this illustration include that only isolated curves, defined as those with no other curve within 0.5 miles, were used, and that no curves with a deflection angle greater than 50 degrees were used. As such, the CMFunctions developed should be applied with caution and with an understanding of these limitations. The results were intended to illustrate a process as opposed to developing a robust CMFunction.
The effect of radius on crashes is addressed elsewhere, including by Hauer,1 who considered two alternate assumptions: one where the crash rate is constant along a curve and increases with a smaller radius, and one where crashes due to a curve are associated with the curve entry and exit. A linear relationship between crashes and degree of curve (another measure of sharpness of curvature) was assumed. Hauer's paper also discusses the impact of tangent length on expected crashes.

The Highway Safety Manual, First Edition chapter for two-lane rural roads includes a CMF for curvature. The base condition is a tangent and the CMF for a curve is calculated as:

$$CMF = \frac{(1.55 \times L) + \left(\frac{80.2}{R}\right) - (0.012 \times S)}{1.55 \times L} \qquad \text{(Equation C61)}$$

where
L = length of horizontal curve in miles, including spiral transitions
R = radius of curve in feet
S = 1 if spiral transition present; 0 if not; 0.5 if present only on one end of curve

1 Hauer, E. (1999). Safety and the Choice of Degree of Curve. Transportation Research Record: Journal of the Transportation Research Board, No. 1665, p. 22.
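For illustration, Equation C61 can be evaluated for a single curve as follows. This is a minimal sketch: the function name and the example curve (0.10 mile long with a 500-ft radius) are hypothetical and are not drawn from the case study data.

```python
def hsm_curve_cmf(length_mi, radius_ft, spiral):
    """Equation C61: CMF for a horizontal curve relative to a tangent.

    spiral is 1 if spiral transitions are present, 0 if not, and 0.5 if a
    spiral is present on only one end of the curve.
    """
    base = 1.55 * length_mi
    return (base + 80.2 / radius_ft - 0.012 * spiral) / base


# Hypothetical curve: 0.10 mile long, 500-ft radius.
no_spiral = hsm_curve_cmf(0.10, 500, 0)
with_spiral = hsm_curve_cmf(0.10, 500, 1)
print(round(no_spiral, 3), round(with_spiral, 3))
```

Because the base condition is a tangent, the value returned is greater than 1.0 for any curve with a finite radius in this range, and it approaches 1.0 as the radius grows.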

Two issues were identified as important considerations:

Issue 1 Minimizing Confounding Factors
Issue 2 Determining the Appropriate Model Form

To handle the confounding factor of deflection angle, sites were grouped by similar values of this variable. Based on the data and previous research, two alternate assumptions of model form were pursued, one with an exponential relationship between crashes and radius and the second with a linear relationship. It is important to acknowledge that the model form selected for the relationship between radius and crashes affects the nature of the CMFunction to be estimated.

For the exponential assumption in Equation C46, as the difference between the smaller and larger curve radii increases, the CMF gets smaller. This CMF value depends on the difference between the two radii but not on the values of the radii themselves. Therefore, the CMF for changing from a radius of 250 ft to 500 ft will be the same as for changing from 500 ft to 750 ft. While the CMF depends only on the difference between radii, the model formulation in Equation C46 does mean that the predicted frequency of crashes reduced by curve flattening depends on both the difference in radii and the radii values, and that the frequency reduced will be greater for smaller-radius curves.

For the linear assumption in Equation C47, the CMF depends on the values of the two radii, not just their difference. The implication of Equation C47 is that the CMF becomes smaller at larger radii when the difference between the two radii is constant. Therefore, the value of the CMF for changing from 250 ft to 500 ft will be larger than for changing from 750 ft to 1,000 ft.
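The contrast between the two model forms can be seen numerically. The sketch below assumes, for illustration, the combined-data estimates reported for the preferred exponential model (c = -0.0613) and the preferred linear model (c = 127.3, d = -5.363); the function names are hypothetical. In both forms the intercept and AADT terms cancel when the after-flattening prediction is divided by the before-flattening prediction.

```python
import math

C_EXP = -0.0613                # radius parameter, exponential model
C_LIN, D_LIN = 127.3, -5.363   # radius parameters, linear model


def cmf_exponential(r1_ft, r2_ft):
    """Ratio of exponential-form predictions for radius r2 versus r1."""
    return math.exp(C_EXP * (r2_ft - r1_ft) / 1000.0)


def cmf_linear(r1_ft, r2_ft):
    """Ratio of linear-form predictions for radius r2 versus r1."""
    return (C_LIN + D_LIN * r2_ft / 1000.0) / (C_LIN + D_LIN * r1_ft / 1000.0)


for r1, r2 in [(250, 500), (500, 750), (750, 1000)]:
    print(r1, r2, round(cmf_exponential(r1, r2), 4), round(cmf_linear(r1, r2), 4))
```

The exponential CMF is identical for each 250-ft increase, while the linear CMF declines slightly as the starting radius increases.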
While the CMF depends on both the difference between radii and the values of the radii, the frequency of crashes reduced depends only on the difference in radii between the two curves and not on the actual values of the radii.

Since the separate modeling of angle groups produced few statistically significant estimates of the parameter associated with radius, the data were combined. Using the combined data and comparing the exponential and linear assumptions for the relationship between radius and crashes, the two models performed similarly for several goodness-of-fit measures, although the exponential model in Equation C62 was consistently preferred.

$$\text{Crashes per year} = \exp(\alpha_{anggrp}) \times AADT^{b} \times \exp\left(c \times \frac{R}{1000}\right) \qquad \text{(Equation C62)}$$

where
αanggrp = intercept term specific to each angle group
AADT = average annual daily traffic
R = radius in feet

Using Equation C62, the CMF for a contemplated curve flattening is determined by:

$$CMF = \exp\left(-0.0613 \times \left(\frac{R_2}{1000} - \frac{R_1}{1000}\right)\right)$$

where R2 and R1 are the flattened and original radii in feet.

Case Study 4
Safety Effects of Left- and Right-Turn Lanes on Major Roads at Three-Legged Stop-Controlled Intersections

Preamble

This is one of a series of four case studies to demonstrate the proposed guidelines for developing CMFunctions from either cross-sectional data or before-after data from actual safety treatment applications. The purposes of this specific case study are to:

1. Illustrate a heuristic methodology a researcher may follow to derive a CMFunction from cross-sectional regression analysis.
2. Illuminate some of the considerable issues and challenges that may be encountered in the process and, in so doing, address what it may take for future researchers to resolve them. The following list of potential issues identifies, in bold text, the ones that are highlighted in this case study:
   - Selection of candidate influencing variables (Section 1.4)
   - Accounting for interactions among candidate variables (Section 1.8)
   - Bias due to aggregation, averaging or incompleteness in data
   - Functional form for effects of independent variables (Section 1.2)
   - Considerations in choosing between linear or nonlinear forms
   - Application of hierarchical modeling
   - Tools for assessing model fit and choosing among, or amalgamating information from competing models (Section 1.4)
   - Including estimates from previous studies in the estimation methodology through Full Bayes methods (Section 1.5)
   - Addressing the co-linearity among explanatory variables
   - Addressing endogeneity
   - Estimating precision of CMFs from CMFunctions derived
3. Illustrate the considerable data requirements for estimating a robust CMFunction so that future research planning will endeavor to assemble appropriate datasets.

Given this scope, the purpose is not to derive robust CMFunctions, although that would have been a bonus objective worth achieving. In any case, as highlighted in the demonstration, estimating CMFs from cross-sectional data is difficult at best. Indeed, the best, and perhaps only defendable way forward in deriving CMFunctions for some treatments may well be the application of quasi randomized trials or propinquity designs that approach such trials, for which the case is made in Appendixes F and G.
1 Introduction

The treatment of interest is the addition of left- and/or right-turn lanes on the major road of 3-legged stop-controlled intersections on two-lane rural roadways. The objective is to explore the development of CMFunctions that would relate the CMF(s) to site characteristics such as major-road AADT.

The addition of turning lanes is an operational/safety treatment that has been studied before. For the purposes of this illustration of CMFunction development, a full literature review is not warranted. However, to corroborate the results to some extent as well as to investigate the effects of including previous knowledge in estimating CMFs, we consider the relevant CMFs from the Highway Safety Manual, First Edition. The HSM includes the CMFs shown in Table C53 below in the chapter for rural two-lane roads for intersection related crashes. The CMFs apply to turning lanes installed on the major road, which is not controlled by a stop sign.

Table C53. CMFs for turn lanes from HSM 1st Edition.

| Number of Legs | Left-Turn Lanes, One approach | Left-Turn Lanes, Two approaches | Right-Turn Lanes, One approach | Right-Turn Lanes, Two approaches |
|---|---|---|---|---|
| 3 | 0.56 | 0.31 | 0.86 | 0.74 |

2 Identified Issues

There were several issues identified at the outset of the CMFunction development analysis that needed to be considered.

Issue 1 Minimizing Confounding Factors

One of the most difficult aspects of developing CMFunctions through cross-sectional data is to minimize the differences between sites in variables that affect crash risk other than the variable(s) of interest. Ideally, the only difference between the two groups would be in the variable(s) under study. In observational data this ideal is difficult, if not impossible, to achieve. Regression modeling seeks to account for these other factors that vary between sites.

Minimizing the number of confounding factors that the modeling needs to control for will increase the chance of success. This can be done by including only sites that have no other safety treatments applied beyond those of interest. This may be achieved in the so-called "propinquity" approach that has been suggested in Appendixes F and G as a step-down alternative to fully randomized trials. However, as noted, this becomes more challenging in CMFunction development since there is a need to identify the variables that affect the CMF, which still need to be included, and separate them from those that may affect safety but do not affect the CMF.

Another method is the application of the propensity score (see Section 1.1) to attempt to ensure the similarity of treatment and control sites. Appendix F gives substantial coverage to the application of this technique.

Issue 2 Selection of Model Form

In developing a cross-sectional model, the model form defines the relationship between explanatory variables and crashes. If the model form is misspecified, then CMF estimates will be incorrect. The model form selected should be based on both logical considerations and evidence provided by exploratory analysis of the data. A final selection may also consider goodness-of-fit statistics for competing models.
Issue 3 Consideration of Previous CMF Estimates

Typically, researchers tend to calibrate statistical models solely using the data at hand. While previous knowledge may inform the choice of model form selected, the parameter estimates of existing models are typically ignored. For situations where the sites and treatments are similar, this disregard for previous knowledge may be questionable. For small datasets, the inclusion of previous knowledge may result in more accurate models and CMFunction estimates.

3 Data

The dataset used for the illustration consists of 3,543 three-legged stop-controlled intersections on rural two-lane roadways. These data were used for FHWA Report FHWA-RD-03-037,2 which attempted to validate and recalibrate the crash prediction models for intersections in the HSM two-lane rural road chapter. The data include sites from California, Minnesota, and Georgia. Table C54 provides descriptive statistics for the known variables shared by all sites in the dataset. These variables include:

Crashes per year. The number of observed intersection crashes per year.
MAJAADT. Average daily traffic on major road (vehicles per day).
MINAADT. Average daily traffic on minor road (vehicles per day).
LTMAJ. 1 if left-turn lane exists on at least one approach of major road, 0 otherwise.
LTMIN. 1 if left-turn lane exists on at least one approach of minor road, 0 otherwise.
MEDIAN. 1 if median exists on major road, 0 otherwise.
RTMAJ. 1 if right-turn lane exists on major road, 0 otherwise.
RTMIN. 1 if right-turn lane exists on minor road, 0 otherwise.

2 Washington, S., B. Persaud, J. Oh, and C. Lyon (2002). Validation and Recalibration of Accident Prediction Models for Rural Intersections. Federal Highway Administration Final Report FHWA-RD-03-037. September 2002.

4 CMFunction Development by Regression Modeling

The first step in the analysis was to estimate constant CMFunctions and explore whether the sites used should be restricted in some way to account for confounding factors prior to estimating the model. This is the focus of Section 4.1. In Section 4.2 the development of CMFunctions is explored. In Section 4.3 an illustration of including previous CMF estimates in the modeling framework is undertaken.

4.1 Estimation of Constant CMFs

The first step was to estimate a basic model accounting for traffic volume exposure and the three possible treatment combinations. The model was estimated using generalized linear modeling with a negative binomial assumption for the error distribution. The model form is shown in Equation C63 and the parameter estimates in Table C55, along with the CMF estimates derived for the three treatment options.

$$\frac{\text{Crashes}}{\text{years}} = \exp(a) \times MAJAADT^{b} \times MINAADT^{c} \times \exp\left(d \times LTMAJ + e \times RTMAJ + f \times (LTMAJ \text{ and } RTMAJ)\right) \qquad \text{(Equation C63)}$$

The results in Table C55 are intuitive in that higher major and minor road volumes are expected to result in more crashes and that provision of turning lanes is expected to reduce crashes. The CMFs imply that a left-turn lane or a right-turn lane on the major road each reduces crashes by approximately 23%, and that the addition of both reduces crashes by approximately 31%.

It is interesting to observe that the result for both left- and right-turn lanes is a smaller reduction than if the individual CMFs for left- and right-turn lanes were multiplied (0.7656 × 0.7681 = 0.5880, a reduction of 41%), which is the current approach taken in the HSM for applying multiple CMFs.

While the results thus far appear reasonable, it is of interest to see if the results would change if more care were taken to control for possible confounding factors.
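Each constant CMF is the exponential of the corresponding treatment parameter. The sketch below, with illustrative variable names, applies this to the Table C55 estimates and compares the directly estimated CMF for both turn lanes with the product of the two single-treatment CMFs.

```python
import math

# Table C55 estimates for the treatment terms d, e, and f in Equation C63.
d, e, f = -0.2671, -0.2639, -0.3711

cmf_left = math.exp(d)     # left-turn lane on the major road
cmf_right = math.exp(e)    # right-turn lane on the major road
cmf_both = math.exp(f)     # both lanes, estimated directly
cmf_product = cmf_left * cmf_right   # multiplying the single-treatment CMFs

print(round(cmf_left, 4), round(cmf_right, 4),
      round(cmf_both, 4), round(cmf_product, 4))
```

The directly estimated CMF for the combined treatment is closer to 1.0 than the product, which is the comparison made in the text.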
The same basic model was then fit to a dataset that excludes any sites with a left- or right-turn lane on the minor road. Table C56 shows the parameter estimates for this model. The parameter estimates and derived CMFs are very close to those in Table C55.

Table C54. Summary statistics.

| Variable | Frequency | Mean | Median | Minimum | Maximum |
|---|---|---|---|---|---|
| Crashes per year | 3,543 | 0.26 | 0.13 | 0.00 | 6.38 |
| MAJAADT | 3,543 | 5,285 | 3,840 | 50 | 35,750 |
| MINAADT | 3,543 | 273 | 101 | 1 | 10,001 |

| Variable | Total | 0 | 1 |
|---|---|---|---|
| RTMAJ | 3,543 | 3,203 (90%) | 340 (10%) |
| RTMIN | 3,543 | 3,481 (98%) | 62 (2%) |
| LTMAJ | 3,543 | 2,978 (84%) | 565 (16%) |
| LTMIN | 3,543 | 3,528 (99.6%) | 15 (0.4%) |
| MEDIAN | 3,543 | 3,423 (97%) | 120 (3%) |

Mean, median, minimum, and maximum are not applicable (N/A) to the indicator variables.

Table C55. Constant CMFs estimated using all sites.

| Parameter | Parameter Estimate (SE) | CMF Estimate |
|---|---|---|
| a | -10.6913 (0.2333) | |
| b | 0.8697 (0.0264) | |
| c | 0.3822 (0.0150) | |
| d | -0.2671 (0.0592) | 0.7656 |
| e | -0.2639 (0.0868) | 0.7681 |
| f | -0.3711 (0.0938) | 0.6900 |
| dispersion | 0.6185 (0.0325) | |
| AIC | 11,610 | |
| BIC | 11,653 | |

Abbreviation: SE, standard error.

Table C56. Constant CMFs estimated using sites with no minor road turn lanes.

| Parameter | Parameter Estimate (SE) | CMF Estimate |
|---|---|---|
| a | -10.7164 (0.2365) | |
| b | 0.8756 (0.0267) | |
| c | 0.3769 (0.0153) | |
| d | -0.2958 (0.0604) | 0.7439 |
| e | -0.3103 (0.0902) | 0.7332 |
| f | -0.4029 (0.1012) | 0.6684 |
| dispersion | 0.6189 (0.0331) | |
| AIC | 11,610 | |
| BIC | 11,653 | |

Separate CMF estimates were also attempted for LTMAJ, RTMAJ, and the combined LTMAJ and RTMAJ treatments by eliminating sites with any other turn lanes on the major or minor road. For example, for estimating the CMF for LTMAJ, the data set included sites with no turning lanes at all or sites with a left-turn lane on the major road but no other turn lanes on the major or minor road. These CMFs, shown below, are consistent with the previous estimates.

• an LT lane (0.7478)
• an RT lane (0.7480)
• both an LT and RT lane (0.6766)

With a similar goal, the propensity score (Chapter 1.1 and Appendix G) was explored to ensure the treatment and comparison sites are similar. Variables considered in the propensity score model included MAJAADT, MINAADT, State, LTMIN, and RTMIN. The results of the

propensity score indicated a very strong overlap of probability of treatment between the treatment and comparison sites, indicating that there is no site selection bias to be considered.

Based on the results of the propensity score and the CMFs estimated after further restricting the data so as to account for potential confounding factors, it was concluded that the results exhibited no significant change and that all sites could be combined for further model estimation.

4.2 Estimation of CMFunctions

Now it is of interest to see if the expected CMFs vary by site characteristics by developing CMFunctions. To consider a site characteristic variable, it must be shared by all sites. The only variables shared amongst the data, other than the treatment variables themselves, are MAJAADT, MINAADT, LTMIN, RTMIN and Median.

It seems reasonable to begin the investigation using the MAJAADT and MINAADT variables, since traffic exposure is the main influence on crash risk and one might expect the CMF to be smaller at higher traffic volumes. Two alternate forms were considered for this relationship, building upon Equation C63, as demonstrated here for the LTMAJ parameter with only MAJAADT affecting the estimate of the CMF:

Alternative a)

$$CMF = \exp\left((\beta_1 + \beta_2 \times MAJAADT) \times LTMAJ\right)$$

and, Alternative b)

$$CMF = \exp\left((\beta_1 + \beta_2 \times \ln(MAJAADT)) \times LTMAJ\right)$$

In exploratory modeling it was found that the models for Alternative b) did not converge, and so the form of Alternative a) was adopted. The model to be estimated is set up in a hierarchical structure as:

$$\frac{\text{Crashes}}{\text{years}} = \exp(a) \times MAJAADT^{b} \times MINAADT^{c} \times \exp\left(d \times LTMAJ + e \times RTMAJ + f \times (LTMAJ \text{ and } RTMAJ)\right) \qquad \text{(Equation C64)}$$

where:
d = β1 + β2 × MAJAADT + β3 × MINAADT
e = β4 + β5 × MAJAADT + β6 × MINAADT
f = β7 + β8 × MAJAADT + β9 × MINAADT

Table C57 shows the estimated parameters.
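To show how a CMFunction of this hierarchical form is applied, the sketch below evaluates the CMF for a major-road left-turn lane using the Table C57 estimates of β1, β2, and β3. The function name and the traffic volumes are hypothetical inputs chosen for illustration.

```python
import math

# Table C57 estimates for the LTMAJ term d = beta1 + beta2*MAJAADT + beta3*MINAADT.
BETA1, BETA2, BETA3 = -0.2713, -0.000006, 0.000119


def cmf_ltmaj(majaadt, minaadt):
    """CMF for a major-road left-turn lane at the given traffic volumes."""
    return math.exp(BETA1 + BETA2 * majaadt + BETA3 * minaadt)


# Hypothetical site: major-road AADT of 5,000 with three minor-road volumes.
for minaadt in (100, 500, 1000):
    print(minaadt, round(cmf_ltmaj(5000, minaadt), 3))
```

With these estimates the CMF moves toward 1.0 as minor-road AADT increases and moves slightly away from 1.0 as major-road AADT increases, although the major-road term is not statistically significant.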
For LTMAJ, the parameter estimate for MAJAADT is not statistically significant but does indicate that the CMF decreases as MAJAADT increases. A statistically significant estimate for β3 indicates that the CMF increases as minor road AADT increases.

For RTMAJ, the parameter estimates for MAJAADT and MINAADT are not statistically significant but do indicate an increasing CMF as AADT increases.

For sites having both a left- and right-turn lane on the major road, both parameter estimates are statistically significant at approximately the 90% confidence level or better and indicate that the CMF decreases with increasing MAJAADT and increases with increasing MINAADT.

The model from Equation C64 is now re-estimated after eliminating the parameters that are not statistically significant at approximately the 90% confidence level or better, giving the following model:

$$\frac{\text{Crashes}}{\text{years}} = \exp(a) \times MAJAADT^{b} \times MINAADT^{c} \times \exp\left(d \times LTMAJ + e \times RTMAJ + f \times (LTMAJ \text{ and } RTMAJ)\right) \qquad \text{(Equation C65)}$$

where
d = β1 + β3 × MINAADT
e = β4
f = β7 + β8 × MAJAADT + β9 × MINAADT

The parameter estimates are shown in Table C58. The relationships hold in that the CMF for LTMAJ increases with increasing MINAADT, the CMF for RTMAJ is constant, and the CMF for LTMAJ+RTMAJ decreases with increasing MAJAADT and increases with increasing MINAADT.

It is also of interest to look at other potential variables affecting the expected CMF value. These include Median, LTMIN and RTMIN. Models were attempted by adding these variables to the prediction of parameters d, e, and f from Equation C65 one at a time. To summarize the findings:

• Median is a significant variable for the CMF for RTMAJ, indicating that the CMF is 1.0 if a median is present
• There was no statistically significant effect for LTMIN for any of the CMFs
• RTMIN showed a statistically significant effect for the CMFs for LTMAJ and RTMAJ, indicating that the CMFs are larger when right-turn lanes are present on the minor road

Table C57. Parameter estimates for Equation C64.

| Parameter | Estimate (SE) | p-value |
|---|---|---|
| a | -10.6796 (0.2570) | <0.0001 |
| b | 0.8808 (0.0291) | <0.0001 |
| c | 0.3596 (0.0164) | <0.0001 |
| β1 | -0.2713 (0.1043) | 0.0094 |
| β2 | -0.000006 (0.000008) | 0.4310 |
| β3 | 0.000119 (0.000049) | 0.0145 |
| β4 | -0.3413 (0.1619) | 0.0351 |
| β5 | 0.000011 (0.000025) | 0.6516 |
| β6 | 0.000091 (0.000144) | 0.5300 |
| β7 | -0.3134 (0.1762) | 0.0754 |
| β8 | -0.00003 (0.000015) | 0.1022 |
| β9 | 0.000187 (0.000072) | 0.0095 |
| dispersion | 0.6075 (0.0322) | <0.0001 |
| AIC | 11,604 | |
| BIC | 11,685 | |

A model estimated with all the statistically significant variables is:

$$\frac{\text{Crashes}}{\text{years}} = \exp(a) \times MAJAADT^{b} \times MINAADT^{c} \times \exp\left(d \times LTMAJ + e \times RTMAJ + f \times (LTMAJ \text{ and } RTMAJ)\right) \qquad \text{(Equation C66)}$$

where:

d = β1 + β2 × MINAADT + β3 × RTMIN
e = β4 + β5 × RTMIN + β6 × Median
f = β7 + β8 × MAJAADT + β9 × MINAADT

The parameter estimates are shown in Table C59. The results in Table C59 are surprising: they indicate that the safety benefits of LTMAJ and RTMAJ are negated (i.e., the CMF is greater than 1) if a right-turn lane is present on the minor road. It is not clear what mechanism could be behind such a finding, and the finding is dubious because there is no reason to believe that adding a turn lane on the major road would increase crashes if a right-turn lane already existed on the minor road. For presence of a median, it appears that the benefit of a right-turn lane on the major road is negated when a median is present.

4.3 Illustrating the Use of Prior Information

There are no known CMFunctions for the addition of major road turn lanes. However, it was of interest to illustrate how previous information can be included when calibrating statistical models. To do so, the model from Equation C63 that estimates constant CMFs is used:

$$\text{Crashes} = \text{years} \times \exp(a) \times \text{MAJAADT}^{b} \times \text{MINAADT}^{c} \times \exp\big(d \times \text{LTMAJ} + e \times \text{RTMAJ} + f \times (\text{LTMIN and RTMIN})\big)$$

For this illustration, the HSM CMFs from Table C53 are considered. Specifically, the CMFs for a single left-turn or right-turn lane on the major road are adopted. The prior value of each model parameter is obtained by taking the natural logarithm of the corresponding CMF. For the presence of both a left- and right-turn lane on the major road, the CMFs for the individual treatments are multiplied to estimate a CMF for the dual treatment.

Table C58. Parameter estimates for Equation C65.

| Parameter | Estimate (SE) | p-value |
|---|---|---|
| a | -10.6555 (0.2381) | <0.0001 |
| b | 0.8769 (0.0269) | <0.0001 |
| c | 0.3616 (0.0160) | <0.0001 |
| β1 | -0.3381 (0.0660) | <0.0001 |
| β3 | 0.000116 (0.000048) | 0.0167 |
| β4 | -0.2400 (0.0865) | 0.0055 |
| β7 | -0.3171 (0.1761) | 0.0718 |
| β8 | -0.00002 (0.000015) | 0.1075 |
| β9 | 0.000186 (0.000072) | 0.0100 |
| Dispersion | 0.6078 (0.0322) | <0.0001 |
| AIC | 11,600 | |
| BIC | 11,661 | |
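As a rough illustration of how the CMFunctions of Equation C65 could be evaluated, the sketch below plugs the point estimates from Table C58 into the expressions for d, e, and f and exponentiates them. The function names and the AADT values in the usage lines are hypothetical, and the sketch assumes, as the text does for the constant-CMF model, that the CMF for each treatment category is the exponential of its parameter.

```python
import math

# Point estimates copied from Table C58 (model of Equation C65).
BETA1, BETA3 = -0.3381, 0.000116                     # d = beta1 + beta3 * MINAADT
BETA4 = -0.2400                                      # e = beta4
BETA7, BETA8, BETA9 = -0.3171, -0.00002, 0.000186    # f = beta7 + beta8*MAJAADT + beta9*MINAADT


def cmf_ltmaj(minaadt):
    """CMF for a left-turn lane on the major road: exp(d)."""
    return math.exp(BETA1 + BETA3 * minaadt)


def cmf_rtmaj():
    """CMF for a right-turn lane on the major road: exp(e), a constant."""
    return math.exp(BETA4)


def cmf_both(majaadt, minaadt):
    """CMF for both a left- and right-turn lane on the major road: exp(f)."""
    return math.exp(BETA7 + BETA8 * majaadt + BETA9 * minaadt)


def breakeven_minaadt_ltmaj():
    """Minor road AADT at which the fitted LTMAJ CMF reaches 1.0 (d = 0)."""
    return -BETA1 / BETA3


if __name__ == "__main__":
    # Hypothetical traffic volumes, chosen only to exercise the functions.
    for minaadt in (500, 1500, 2500):
        print(minaadt, round(cmf_ltmaj(minaadt), 3), round(cmf_both(10000, minaadt), 3))
    print("RTMAJ:", round(cmf_rtmaj(), 3))
    print("LTMAJ break-even MINAADT:", round(breakeven_minaadt_ltmaj()))
```

The break-even helper makes one property of the linear-in-AADT form explicit: because d grows with MINAADT, the fitted CMF for LTMAJ eventually exceeds 1.0, so the function should be applied with care outside the range of the data used to fit it.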

Table C59. Parameter estimates for Equation C66.

| Parameter | Estimate (SE) | p-value |
|---|---|---|
| a | -10.6289 (0.2376) | <0.0001 |
| b | 0.8735 (0.0268) | <0.0001 |
| c | 0.3619 (0.0160) | <0.0001 |
| β1 | -0.3527 (0.0662) | <0.0001 |
| β2 | 0.000113 (0.000048) | 0.0186 |
| β3 | 0.4729 (0.2646) | 0.0740 |
| β4 | -0.3311 (0.0905) | 0.0003 |
| β5 | 0.7001 (0.2973) | 0.0186 |
| β6 | 1.2869 (0.6139) | 0.0361 |
| β7 | -0.3194 (0.1755) | 0.0689 |
| β8 | -0.00002 (0.000015) | 0.1114 |
| β9 | 0.000186 (0.000072) | 0.0097 |
| Dispersion | 0.6010 (0.0320) | <0.0001 |
| AIC | 11,593 | |
| BIC | 11,673 | |

The modeling was performed using the WinBUGS software to apply full Bayes MCMC estimation techniques. Each parameter in the model is assigned an assumed prior distribution, mean, and variance. The priors for a, b, and c were given a mean of 0, while the priors for d, e, and f were given the means from Table C60 of −0.58, −0.15, and −0.73, respectively. All parameters are assumed to follow the normal distribution.

The first model uses non-informative priors for the existing CMF parameter estimates. This means that the assumed variance for each parameter is large. In this case a variance of 10 was selected. The estimated model is shown in Table C61. Comparing the parameter estimates and CMFs estimated with and without priors, it is evident that very little has changed. In fact, the differences likely reflect natural variation, in that the parameter estimates would not be identical for any two MCMC simulations. That the use of priors had little impact is not surprising given that the assumed variance for parameters d, e, and f was large. A large variance means that the prior has little influence on the parameter estimate compared to the data used for modeling.
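Table C60. Prior information estimates.

| Number of Legs | Left-Turn Lanes, One Approach: CMF | Left-Turn Lanes, One Approach: Parameter | Right-Turn Lanes, One Approach: CMF | Right-Turn Lanes, One Approach: Parameter | Left- and Right-Turn Lanes, One Approach: CMF | Left- and Right-Turn Lanes, One Approach: Parameter |
|---|---|---|---|---|---|---|
| 3 | 0.56 | -0.58 | 0.86 | -0.15 | 0.48 = 0.56 × 0.86 | -0.73 |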
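The arithmetic behind Table C60 can be checked with a few lines of Python. This is only a sketch of the transformation described in the text (natural logarithm of each CMF, with the dual-treatment CMF taken as the product of the two single-treatment CMFs); the variable names are illustrative.

```python
import math

# HSM CMFs for a single turn lane on one major-road approach (Table C60).
cmf_left = 0.56
cmf_right = 0.86

# Dual treatment: the individual CMFs are multiplied.
cmf_both = cmf_left * cmf_right

# Prior means for d, e, and f are the natural logarithms of the CMFs.
prior_d = math.log(cmf_left)
prior_e = math.log(cmf_right)
prior_f = math.log(cmf_both)

print(round(cmf_both, 2), round(prior_d, 2), round(prior_e, 2), round(prior_f, 2))
```

Because the logarithm of a product is the sum of the logarithms, the prior mean for f equals the sum of the prior means for d and e before rounding.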

Table C61. Comparison of non-informative priors with conventional estimates.

| Parameter | Estimate with Priors (SE) | CMF Estimate with Priors | Estimate Without Priors (SE) | CMF Estimate without Priors |
|---|---|---|---|---|
| a | -10.7100 (0.1470) | | -10.6913 (0.2333) | |
| b | 0.8706 (0.0161) | | 0.8697 (0.0264) | |
| c | 0.3851 (0.0147) | | 0.3822 (0.0150) | |
| d | -0.2700 (0.0575) | 0.7634 | -0.2671 (0.0592) | 0.7656 |
| e | -0.2691 (0.0864) | 0.7641 | -0.2639 (0.0868) | 0.7681 |
| f | -0.3744 (0.0954) | 0.6877 | -0.3711 (0.0938) | 0.6900 |
| dispersion | 0.6200 (0.0306) | | 0.6185 (0.0325) | |

In the next model, more informative priors (i.e., priors with less variance) are applied. This time a variance of 0.001 (standard error of 0.03) is assumed for the parameters d, e, and f. All other parameters continue to use uninformative priors. The parameter estimates in Table C62 show that the estimates for d, e, and f are very close to the prior means shown in Table C60. This appears to be the opposite extreme: the specified priors were given a variance so small that the influence of the prior on the final estimate is too large. The choice of variance for the prior is critical to balancing the influence of the observed data against that of previous knowledge.

Table C62. Parameter estimates with informative priors.

| Parameter | Estimate with Priors (SE) | CMF Estimate with Priors |
|---|---|---|
| a | -11.0500 (0.1766) | |
| b | 0.9056 (0.0205) | |
| c | 0.4038 (0.0144) | |
| d | -0.5146 (0.0279) | 0.5977 |
| e | -0.1729 (0.0301) | 0.8412 |
| f | -0.6989 (0.0301) | 0.4971 |
| dispersion | 0.6337 (0.0315) | |

As another illustration of using prior information, a randomly selected subset of 353 out of 3,534 sites was used to develop the model. Using this subset with uninformative priors resulted in the model shown in Table C63. The parameter estimates for d, e, and f have high standard errors and are completely different from those obtained with the full dataset. It is notable that the parameter estimates for a, b, and c are similar between the subset data and the full data. This may be expected because MAJAADT and MINAADT are the largest predictors of crash risk and the constant term may be expected to be stable.

Table C63. Parameter estimates for subset versus full data.

| Parameter | Estimate with Subset and Non-Informative Priors (SE) | CMF Estimate with Subset and Non-Informative Priors | Estimate with Full Data and Uninformative Priors (SE) | CMF Estimate with Full Data and Uninformative Priors |
|---|---|---|---|---|
| a | -10.3000 (0.6736) | | -10.7100 (0.1470) | |
| b | 0.8284 (0.0753) | | 0.8706 (0.0161) | |
| c | 0.3727 (0.0460) | | 0.3851 (0.0147) | |
| d | 0.0215 (0.1865) | 1.0217 | -0.2700 (0.0575) | 0.7634 |
| e | 0.0086 (0.2510) | 1.0086 | -0.2691 (0.0864) | 0.7641 |
| f | -0.2106 (0.2694) | 0.8101 | -0.3744 (0.0954) | 0.6877 |
| dispersion | 0.6337 (0.0850) | | 0.6200 (0.0306) | |

Now the model is estimated using the same subset but including, as prior information, the parameter estimates obtained from the full dataset. The estimated means and variances from the full dataset are used to define the priors for parameters d, e, and f. The results in Table C64 show that the CMF estimates are quite close to the prior. Again, this illustrates the usefulness of including prior information, but also that the choice of variance for the prior has a very large impact on the results. How to determine the appropriate variance is unclear. Some further discussion is provided in Section 5.

Table C64. Parameter estimates for subset with informative priors.

| Parameter | Estimate with Priors (SE) | CMF Estimate with Priors | Estimate with Full Data and Uninformative Priors (SE) | CMF Estimate with Full Data and Uninformative Priors |
|---|---|---|---|---|
| a | -10.7500 (0.4832) | | -10.7100 (0.1470) | |
| b | 0.8707 (0.0537) | | 0.8706 (0.0161) | |
| c | 0.4046 (0.0438) | | 0.3851 (0.0147) | |
| d | -0.2382 (0.0557) | 0.7880 | -0.2700 (0.0575) | 0.7634 |
| e | -0.2467 (0.0796) | 0.7814 | -0.2691 (0.0864) | 0.7641 |
| f | -0.3637 (0.0880) | 0.6951 | -0.3744 (0.0954) | 0.6877 |
| dispersion | 0.6211 (0.0837) | | 0.6200 (0.0306) | |

5 Discussion of Results

The purpose of the case study was to illustrate a heuristic methodology a researcher may follow to derive a CMFunction from cross-sectional regression analysis. The results produced were intended to illustrate a process as opposed to developing a robust CMFunction.

The development of CMFunctions showed some success. The model estimated for Equation C65 appears to be the most intuitive. In this model, the CMF for LTMAJ increases with increasing MINAADT, the CMF for RTMAJ is constant, and the CMF for both a left- and right-turn lane on the major road decreases with increasing MAJAADT and increases with increasing MINAADT.

While the model for Equation C66 does produce statistically significant estimates for RTMIN and Median, it is not clear why those relationships would be valid. These relationships indicated that the safety benefits of LTMAJ and RTMAJ are negated (i.e., the CMF is greater than 1) if a right-turn lane is present on the minor road, and that the benefit of a right-turn lane on the major road is negated when a median is present.

To compare the CMFunctions from Equation C65 to the constant CMFs from Equation C63, several goodness-of-fit measures can be examined.

Akaike Information Criterion (AIC) and Schwarz Bayesian Information Criterion (BIC)

The AIC and BIC values can be used to compare the relative fit of two or more models. Smaller values indicate a better model fit. The AIC is calculated as:

$$\text{AIC} = 2 \times K - 2 \times \text{loglikelihood} \qquad \text{(Equation C67)}$$

The BIC is calculated as:

$$\text{BIC} = -2 \times \text{loglikelihood} + K \times \log(\text{numobs}) \qquad \text{(Equation C68)}$$

where

K = the number of parameters estimated in the model
loglikelihood = the loglikelihood calculated for the model
numobs = number of observations in the data

The AIC is 11,610 for the constant CMFs and 11,600 for the CMFunction, not a very large difference but with a preference for the CMFunction model. The BIC is 11,653 for the constant CMFs and 11,661 for the CMFunction, again not a very large difference but with a preference for the constant CMF model.
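Equations C67 and C68 translate directly into code. The sketch below assumes the logarithm in Equation C68 is the natural logarithm, as is conventional for the BIC. The loglikelihood, parameter count, and use of the 3,534 sites as the number of observations are placeholder inputs chosen for illustration; they are not values reported in the text.

```python
import math


def aic(loglikelihood, k):
    """Equation C67: AIC = 2*K - 2*loglikelihood."""
    return 2 * k - 2 * loglikelihood


def bic(loglikelihood, k, numobs):
    """Equation C68: BIC = -2*loglikelihood + K*log(numobs), natural log assumed."""
    return -2 * loglikelihood + k * math.log(numobs)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    loglik, k, numobs = -5790.0, 10, 3534
    print("AIC:", aic(loglik, k))
    print("BIC:", round(bic(loglik, k, numobs), 1))
    # The two criteria differ only in how each extra parameter is penalized.
    print("BIC - AIC:", round(bic(loglik, k, numobs) - aic(loglik, k), 1))
```

For a given model the gap between the two criteria is K × (log(numobs) − 2), so with a few thousand observations the BIC charges roughly four times as much per parameter as the AIC. That is why a CMFunction with extra parameters can be preferred by the AIC while the constant-CMF model is preferred by the BIC.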
Dispersion Parameter

The value of the dispersion parameter describes the relationship between the estimated mean and variance of the predicted crash rate. A smaller dispersion parameter indicates a better goodness of fit. The dispersion parameters for the constant CMF model and the CMFunction model are virtually identical at 0.6185 and 0.6078, respectively.

The illustration of the use of prior knowledge shows that it can be a powerful tool, particularly for small databases. However, its application is highly dependent on the variance assumed for prior parameter estimates. How to estimate the mean and variance for the prior is a critical decision, but guidelines are lacking. A reasonable start would seem to be conducting a meta-analysis to derive a mean and variance and then evaluating the sensitivity of the estimation to the assumed variance of the prior(s).
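A sensitivity check of this kind can be sketched with a simple normal approximation: a normal prior is combined with the data-only estimate and its standard error by precision weighting. This is not the full Bayes MCMC model run in WinBUGS, since it treats one parameter in isolation and ignores correlation with the other parameters. The function name is hypothetical. The inputs are the prior mean for d from Table C60 and the estimate of d without priors from Table C61.

```python
import math


def normal_posterior(prior_mean, prior_var, estimate, std_err):
    """Precision-weighted combination of a normal prior and a normal estimate.

    Returns the approximate posterior mean and standard deviation.
    """
    prior_precision = 1.0 / prior_var
    data_precision = 1.0 / std_err ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, math.sqrt(post_var)


PRIOR_MEAN_D = -0.58           # ln(0.56), Table C60
EST_D, SE_D = -0.2671, 0.0592  # estimate of d without priors, Table C61

if __name__ == "__main__":
    # Vary the assumed prior variance and watch the estimate move.
    for prior_var in (10.0, 1.0, 0.1, 0.01, 0.001):
        mean, sd = normal_posterior(PRIOR_MEAN_D, prior_var, EST_D, SE_D)
        print(f"prior variance {prior_var:g}: d = {mean:.4f} (SD {sd:.4f}), "
              f"CMF = {math.exp(mean):.4f}")
```

Under this approximation a prior variance of 10 leaves d essentially at its data-only value, while a variance of 0.001 pulls it to roughly −0.51, most of the way to the prior mean. This is qualitatively the pattern seen in Tables C61 and C62, and the intermediate variances show how quickly the balance shifts between the two extremes.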


Crash modification factors (CMF) provide transportation professionals with the kind of quantitative information they need to make decisions on where best to invest limited safety funds.

The TRB National Cooperative Highway Research Program's NCHRP Research Report 991: Guidelines for the Development and Application of Crash Modification Factors describes a procedure for estimating the effect of a proposed treatment on a site of interest.

Supplemental to the report are a CMF regression tool, a CMF combination tool, a slide summary, and an implementation memo.
