The National Academies Press

Currently Skimming:

Pages 19-27

From page 19...

...
2 ANALYSIS APPROACHES

2.1 SCOPE OF REPORT

We report here two types of crash frequency models by crash type and crash severity. Base condition models are estimated using only sites that meet the "base condition" and include only traffic volume as an explanatory variable; these models support the HSM Part C predictive methodology. Average condition models are estimated using all sites and contain exposure‐related variables, such as average annual daily traffic (AADT) and driveways; they apply to average conditions of the non‐exposure variables.

For most facility types, we report base condition models to keep these models compatible with the methodology of the current HSM. For a few facility types, we needed to relax some of the base condition definitions to achieve a large enough sample size to estimate significant models. For a few other facility types, the total sample size was much smaller, so we had to use all cases to estimate significant models; we report average condition models for these facility types, as well as for the rest of the facility types in Appendix A.

This report does not contain probabilistic crash severity models or models that include both exposure and non‐exposure covariates. As will be discussed later, our efforts to estimate these types of models were unsuccessful.

This section of the report documents our crash type definitions, our estimation approach for crash count models, our exploration of probabilistic crash severity models, and our exploration of improvements for the model calibration procedure.

2.2 CRASH TYPE DEFINITIONS

Crash Types

The selection of crash types for which models would be developed was based on several criteria:

1.

From page 20...

... characteristics, it is not clear how models predicting animal collisions would fit into the model framework. We note the existence of a large body of research into animal–vehicle collisions and suggest that this body of work be consulted when considering this collision type in safety management procedures.

We have defined the crash types shown in Figure 2‐1 to estimate models:

Figure 2‐1: General Taxonomy of Crash Types

The taxonomy shown in Figure 2‐1 provides for several levels of disaggregation of the crash types according to the number of vehicles involved, their direction of travel, and the manner of the collision. The justification for creating these categories is as follows:

• Each crash type within each category involves vehicles colliding in the same way, that is, front to front, front to rear, front to side, and so on. This results in similar crash severity profiles, as confirmed by Zhang et al. (2007).
• Each crash type within each category is associated with a similar distribution of contributing factors, as assigned by investigating officers (Zhang et al. 2007)

From page 21...

... turning same direction (TSD), all intersecting direction (ID)

From page 22...

...
Delineation of Intersection Versus Segment Crashes

In the HSM methodology, roadway segment models are used to predict all crashes that occur on portions of roadway segments that are more than 250 feet from an intersection and non‐intersection‐related crashes that occur on portions of roadway segments that are within 250 feet of an intersection. Intersection models are used to predict all intersection and intersection‐related crashes that occur within 250 feet of the intersection. The models for two‐lane rural roads and for urban and suburban arterials apparently were developed to facilitate this application directly. For multilane rural roads in states where the crash records do not indicate "intersection" or "intersection‐related," all crashes occurring within 250 feet of the middle of an intersection are assigned to that intersection. The calibration procedure is expected to allow models developed for such cases to be applied to cases specified in the HSM methodology, and vice versa.

These models were developed to be as consistent with the HSM methodology as possible. In the Ohio database used for urban and suburban arterials and the California database used for multilane rural roads, however, crashes cannot reliably be identified as intersection or intersection‐related. Thus, the intersection models being developed for those two databases and facility types will pertain to all crashes occurring within 250 feet of the center of an intersection, and the segment models will apply to crashes occurring outside this boundary. As noted previously, the calibration procedure will allow these models to apply to cases where intersection and intersection‐related crashes can be identified in accordance with the HSM methodology.
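
As a rough illustration of the two assignment rules described above, the following sketch classifies a crash record as belonging to the intersection or the segment model. The `Crash` record, its field names, and both function names are hypothetical and are not taken from the report or the HSM; only the 250‐foot threshold and the logic of the two rules come from the text.

```python
from dataclasses import dataclass

INTERSECTION_RADIUS_FT = 250.0  # threshold stated in the text


@dataclass
class Crash:
    # Hypothetical record layout, for illustration only.
    distance_to_intersection_ft: float
    intersection_related: bool  # coded "intersection" or "intersection-related"


def assign_hsm(crash: Crash) -> str:
    """HSM-style rule: within 250 ft AND flagged as intersection(-related)."""
    within = crash.distance_to_intersection_ft <= INTERSECTION_RADIUS_FT
    if within and crash.intersection_related:
        return "intersection"
    # Beyond 250 ft, or within 250 ft but not intersection-related.
    return "segment"


def assign_distance_only(crash: Crash) -> str:
    """Fallback rule for records without a reliable flag: distance alone."""
    if crash.distance_to_intersection_ft <= INTERSECTION_RADIUS_FT:
        return "intersection"
    return "segment"


if __name__ == "__main__":
    rear_end_near_driveway = Crash(distance_to_intersection_ft=100.0,
                                   intersection_related=False)
    print(assign_hsm(rear_end_near_driveway))            # segment
    print(assign_distance_only(rear_end_near_driveway))  # intersection
```

The example record shows where the two rules disagree: a non‐intersection‐related crash 100 feet from an intersection is a segment crash under the HSM rule but an intersection crash under the distance‐only rule, which is the gap the calibration procedure is expected to bridge.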

2.3 MODEL ESTIMATION APPROACH

Crash Count Models

Because crash frequency is a count phenomenon, negative binomial (NB) regression models, or other count distribution estimation methods, are commonly used to build crash prediction models. Even though the NB model has some limitations (for example, it cannot overcome potential underdispersion problems, and the dispersion parameter may be biased for small sample sizes)

From page 23...

... during a certain time period) is always positive or zero. Another reason is that taking the log of both sides of the equation results in a linear combination of the predictor variables (that is, the X's)
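
The two properties mentioned in this excerpt can be written out for a generic log‐link count model. The symbols below (expected crash count \mu_i at site i, coefficients \beta_j, predictors X_{ij}) are generic notation assumed for illustration; the report's own equation is not reproduced in this excerpt.

```latex
\mu_i = \exp\!\Big(\beta_0 + \sum_{j} \beta_j X_{ij}\Big) > 0,
\qquad
\ln \mu_i = \beta_0 + \sum_{j} \beta_j X_{ij}
```

The exponential keeps the predicted mean from going negative for any coefficient values, and the right‐hand equation is the linear combination of the X's obtained by taking the log of both sides.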

From page 24...

... explain the number of crashes (Wang et al. 2017). For some facility types, other model forms were used; this is explained in detail in the relevant sections below.

Model Estimation and Fit Statistics

SPFs for all facility types and crash categories were estimated using standard statistical packages, such as SAS®. As indicated above, the negative binomial distribution was used to start. When the negative binomial overdispersion parameter estimated by maximum likelihood (k)
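
To make the role of the overdispersion parameter k concrete, here is a minimal sketch of the negative binomial log‐probability in the common "NB2" parameterization, where the variance is mu + k * mu^2. That parameterization is an assumption; the excerpt does not state which form the report uses. The function names are hypothetical and nothing here reproduces the SAS® procedure the authors ran.

```python
import math


def nb_logpmf(y: int, mu: float, k: float) -> float:
    """Log-probability of y crashes under NB2 with mean mu and overdispersion k > 0."""
    r = 1.0 / k  # inverse overdispersion ("size") parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))


def poisson_logpmf(y: int, mu: float) -> float:
    """Log-probability of y crashes under a Poisson model with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def nb_variance(mu: float, k: float) -> float:
    """NB2 variance: the Poisson variance mu inflated by k * mu**2."""
    return mu + k * mu ** 2


if __name__ == "__main__":
    mu = 2.0
    for k in (0.5, 0.05, 1e-6):
        gap = nb_logpmf(3, mu, k) - poisson_logpmf(3, mu)
        print(f"k={k:g}  variance={nb_variance(mu, k):.4f}  NB-Poisson gap={gap:.6f}")
```

As k shrinks toward zero the variance approaches the mean and the NB log‐probability approaches the Poisson one, which is why the estimated value of k is a natural diagnostic when choosing between the two distributions.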

From page 25...

... Washington et al. (2005) give guidelines for interpreting fit statistics and evaluating the suitability of crash prediction models.

Crash Severity Modeling

In general, crashes are classified into five severity levels: fatal injury (K)

From page 26...

... The fractional split approach is not without limitations. In field data, there are often no crashes for some specific crash severities in a given case, for example, fatal injury crashes. When this happens, such a segment cannot be used for modeling. To avoid cases with zero crashes for any of the severity levels, the research team aggregated roadway segments into extended super‐segments (or arterials). To do this, the severity proportions had to be assumed to be consistent over all segments and intersections included in each super‐segment, an assumption that is hard to justify in practice. In addition, once we aggregated the segments, information specific to them was lost. For these reasons, the research team decided not to adopt the fractional split model for predicting crash severity. Instead, we recommend predicting crash severity using count models, as we do for crash type.

2.4 ESTIMATION AND VALIDATION DATA

Estimating crash prediction models for the HSM requires datasets with adequate size, quality, and scope of variables. Very few highway agencies have such data readily available. To limit the share of the project budget spent on data collection, existing data sources were used to the extent possible for each facility type. For consistency, it was also considered desirable to use data from the same states that were used to estimate models for the First Edition of the HSM. Two sources of readily available data were considered:

• The Highway Safety Information System (HSIS)
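
The zero‐count limitation of the fractional split approach discussed earlier on this page, and the super‐segment workaround, can be sketched as follows. The crash counts are made up for illustration, the function names are hypothetical, and the K/A/B/C/O labels assume the standard KABCO severity scale; none of this reproduces the research team's actual procedure.

```python
from collections import Counter

SEVERITIES = ("K", "A", "B", "C", "O")  # assumed KABCO labels


def is_usable(counts: dict) -> bool:
    """A case is usable only if every severity level has at least one crash."""
    return all(counts.get(s, 0) > 0 for s in SEVERITIES)


def aggregate(segments: list) -> dict:
    """Pool crash counts of several segments into one super-segment."""
    total = Counter()
    for counts in segments:
        total.update(counts)  # adds counts key by key
    return dict(total)


def severity_proportions(counts: dict) -> dict:
    """Share of crashes at each severity level (the fractional split response)."""
    n = sum(counts.get(s, 0) for s in SEVERITIES)
    if n == 0:
        raise ValueError("no crashes recorded")
    return {s: counts.get(s, 0) / n for s in SEVERITIES}


# Made-up counts for three segments on one arterial.
segments = [
    {"K": 0, "A": 1, "B": 3, "C": 5, "O": 11},
    {"K": 1, "A": 0, "B": 2, "C": 4, "O": 13},
    {"K": 0, "A": 2, "B": 1, "C": 3, "O": 14},
]
super_segment = aggregate(segments)

if __name__ == "__main__":
    print([is_usable(s) for s in segments])  # each segment has a zero somewhere
    print(is_usable(super_segment))
    print(severity_proportions(super_segment))
```

Pooling removes the zeros, but the single set of proportions now stands in for all three segments, and any segment‐level attributes can no longer be used as predictors. Those are the two drawbacks the text gives for rejecting the approach.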

From page 27...

...
• For four‐lane divided segments on multilane rural highways, data from two states were used for validation, as none of the three state databases was as large as would have been preferred, and having two states to validate against helped to better test the resulting models.

Table 2‐2: Data Used for Estimation and Validation

Facility Type | Segments: Estimation | Segments: Validation | Intersections: Estimation | Intersections: Validation
Two‐lane rural highways | Washington | Ohio | 3ST: Minnesota; 4ST: Minnesota; 4SG: Ohio | 3ST: Ohio; 4ST: Ohio; 4SG: Minnesota
Multilane rural highways | 4U: Texas (2009‐11); 4D: California | 4U: Texas (2012) ...

This material may be derived from roughly machine-read images, and so is provided only to facilitate research.