
The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

Data analysis focused on developing models by design element for assessing safety impacts from trade-offs among values of each design element and predicting the potential safety consequences expressed as number of crashes per unit time. Models for predicting crashes by severity level were also developed. However, models for specific crash types were not developed due to lack of available crash data. For the statistical modeling, GLMs were used because they are considered more appropriate for variables that are not normally distributed. Such models use a maximum likelihood function to determine which variables are significant and how well the model fits the data. Crashes are considered random events that follow a Poisson distribution; therefore, the use of GLMs is appropriate. Such models are derived using a relatively recent statistical approach; the literature suggests they have been gaining popularity among researchers (39–41).

The SAS statistical software was used to develop the prediction models and to determine their coefficients (46). The Generalized Modeling procedure (GENMOD) was implemented, and the model coefficients were estimated through the maximum-likelihood method. This approach is well suited to the development of models that have predictors that are either continuous or categorical.² The residual deviance statistics were used to assess the model's goodness-of-fit.

Initially, all variables of concern were included in the model, and variables with coefficients that were not statistically significant (at the 5% level) were removed from the model. This process was followed until a model was obtained in which all variables entered were statistically significant. The signs of the coefficients were also evaluated to determine whether they reflected previously observed crash trends.

A desirable outcome from such a model is the determination of the relative safety impact of specific geometric elements. This requires the availability of adequate data to establish such comparisons as well as the isolation of the impact of each element. There are potential problems that should be considered when a model is developed. First, specific elements may not be easily isolated and examined alone since the literature has indicated that there are elements that interact. Second, there is the potential for significant variability among the various roadway segments included in the database such that, even if an element can be isolated, there may be other variables (such as traffic volume, number of lanes, and functional class) that could also require attention and, thus, require an additional data classification, further reducing a model's strength in reaching statistically sound conclusions.

The models developed in this research predict the number of crashes for a given condition. This decision was reached during the project panel meeting, during which the appropriateness of crash rates and number of crashes was discussed. The decision was based on the need to develop results that could be eventually used in the HSM. The rationale for this decision is that the current trend is to avoid the use of crash rates because of potential problems arising from the implicit assumption of linearity between volume and crashes as well as the possible misuse by unaware users who may assume that a change in traffic volumes could proportionally affect the number of crashes. It was therefore decided to separate the data into divided and undivided segments and to develop separate models for each group.

Models developed in this research were validated to determine their goodness-of-fit. The available data were randomly divided into two sets: one was used in the model development, while the second was used for the evaluation of the strength of the model to predict the number of crashes. This is an accepted approach to determine the goodness-of-fit of a model, even though it reduces the data available for developing the model by one-half.

Prediction Models

Models were developed and evaluated for their applicability and ability to produce predictors with reasonable coefficient signs. Initially, models were developed where the exposure was considered as the product of length and traffic volume. However, these models produced consistently counterintuitive results: the coefficient signs were opposite to a priori expectations based on past research. Therefore, a second round of models was produced that used volume as a predictor with the goal of obtaining more robust models with coefficients more in accordance with past work. These new models had a better fit, and most coefficients were in agreement with past research findings. The general form of these models was as follows:

$$E[N]_i = L \, e^{\, b_0 - \ln 12 + b_1 \ln ADT + b_2 X_1 + b_2 X_2 + \dots + b_n X_n} \qquad (5)$$

where
E[N]_i = expected crash frequency per year for Condition i;
L = segment length (mile);
b_i = model coefficients;
ADT = average daily traffic (vehicles/day); and
X_i = predictors (various variables).

The predictor variables varied for each condition (divided and undivided segments; single-vehicle, multi-vehicle, and all crashes) and are discussed in the following paragraphs. The term ln 12 is included in each model to provide the results in units of crashes per year (as 12 years of data were used for estimating the model).

² A categorical predictor variable is a variable whose categories identify class or group membership, which is used to predict responses on one or more dependent variables (from
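As an illustration of the general model form (Equation 5), the following Python sketch evaluates the expected crash frequency for one segment. It is only a minimal sketch: the function name, argument names, and every coefficient value in the usage lines are hypothetical and are not taken from the report. It also shows two properties described in the text: the ln 12 term is equivalent to dividing a 12-year prediction by 12, and, because ADT enters as a predictor with its own coefficient, doubling the volume scales the prediction by 2 raised to that coefficient rather than by 2.

```python
import math


def expected_crashes_per_year(length_mi, adt, b0, b_adt,
                              predictor_terms=(), years_of_data=12):
    """Evaluate E[N] = L * exp(b0 - ln(years) + b_adt*ln(ADT) + sum(b*x)).

    length_mi       -- segment length L in miles
    adt             -- average daily traffic (vehicles/day)
    b0, b_adt       -- intercept and ln(ADT) coefficient (hypothetical values)
    predictor_terms -- iterable of (coefficient, predictor value) pairs
    years_of_data   -- years pooled in the fitted model (12 in the text)
    """
    linear = b0 - math.log(years_of_data) + b_adt * math.log(adt)
    linear += sum(b * x for b, x in predictor_terms)
    return length_mi * math.exp(linear)


# Hypothetical coefficients, chosen only to exercise the function.
base = expected_crashes_per_year(1.5, 10_000, b0=-6.0, b_adt=0.8,
                                 predictor_terms=[(-0.05, 8.0)])
doubled = expected_crashes_per_year(1.5, 20_000, b0=-6.0, b_adt=0.8,
                                    predictor_terms=[(-0.05, 8.0)])

# Doubling ADT multiplies the prediction by 2**0.8, not by 2.
ratio = doubled / base
```

Keeping the predictors as (coefficient, value) pairs avoids committing to a particular subscript scheme for the coefficients, which differs from condition to condition in the text.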
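The validation approach described above (a random half-and-half split of the data, with goodness-of-fit judged by residual deviance) can be sketched as follows. This is not the SAS GENMOD procedure used in the research. It is a standard-library Python illustration that assumes the standard Poisson deviance formula, D = 2 Σ [y ln(y/μ) − (y − μ)], and the function names and the seed parameter are hypothetical.

```python
import math
import random


def split_in_half(records, seed=0):
    """Randomly split records into a development half and a validation half."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def poisson_deviance(observed, predicted):
    """Poisson residual deviance: 2 * sum(y*ln(y/mu) - (y - mu)).

    observed  -- crash counts y (non-negative integers)
    predicted -- model means mu (must be positive)
    The y*ln(y/mu) term is taken as 0 when y == 0.
    """
    total = 0.0
    for y, mu in zip(observed, predicted):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total


# Hypothetical segment IDs, split into development and validation sets.
development, validation = split_in_half(range(10), seed=42)
```

A deviance of zero means the predictions reproduce the observed counts exactly; larger values indicate a poorer fit on the set being evaluated.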