National Academies Press: OpenBook

Application of Crash Modification Factors for Access Management, Volume 2: Research Overview (2021)

Chapter: Appendix C - Overview of Goodness-of-Fit Measures

Suggested Citation:"Appendix C - Overview of Goodness-of-Fit Measures." National Academies of Sciences, Engineering, and Medicine. 2021. Application of Crash Modification Factors for Access Management, Volume 2: Research Overview. Washington, DC: The National Academies Press. doi: 10.17226/26162.


APPENDIX C

Overview of Goodness-of-Fit Measures

This appendix provides a summary of goodness-of-fit measures to assess the performance of SPFs, including the mean absolute deviation, modified R2, dispersion parameter, coefficient of variation (CV) of the calibration factor, and cumulative residual (CURE) plots. It is relatively straightforward to use these goodness-of-fit measures to compare the relative performance of competing SPFs considered for application. It is more challenging to use them to assess whether a single SPF is adequate, as there are no guidelines on acceptable thresholds except for the CV of the calibration factor and the CURE plot. Thus, some subjective judgment is required to supplement the assessment based on the CV with a consideration of the other goodness-of-fit measures.

Mean Absolute Deviation

Figure C-1 provides the equation for the mean absolute deviation (MAD), which provides a measure of the average magnitude of variability of prediction. Smaller values are preferred to larger values in comparing two or more competing SPFs. The MAD is the sum of the absolute value of predicted minus observed crashes, divided by the number of sites. The values of predicted and observed crashes are from the calibration data.

Figure C-1. Mean absolute deviation.

Variables in the equation shown in Figure C-1 are defined as follows:
• yi = observed counts,
• ŷi = predicted values from the SPF, and
• n = validation data sample size.

Modified R2

Figure C-2 shows the equation for the modified R2 value (Fridstrom et al. 1995). This goodness-of-fit measure subtracts the normal amount of random variation expected if the SPF were 100 percent accurate. Even with a perfect SPF, some variation in observed crash counts would be observed due to the random nature of crashes (Fridstrom et al. 1995). As a result, the amount of systematic variation explained by the SPF is measured. Larger values indicate a better fit to the data in comparing two or more competing SPFs. Values greater than 1.0 indicate the SPF is overfit (i.e., the SPF is incorrectly explaining some of the expected random variation as systematic variation).
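As an illustration of the two measures described above, the following Python sketch computes the MAD directly from its verbal definition and the modified R2 from one commonly cited form of the Fridstrom et al. (1995) measure. The function names and the sample data are hypothetical. The modified R2 expression used here (total variation minus residual variation, divided by total variation minus the sum of predicted values as the expected random variation) is an assumption, since the equation in Figure C-2 is not reproduced in this text.

```python
from typing import Sequence


def mean_absolute_deviation(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of |predicted - observed| divided by the number of sites."""
    if len(observed) == 0 or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    return sum(abs(p - y) for y, p in zip(observed, predicted)) / len(observed)


def modified_r2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Assumed form: (sum (y - ybar)^2 - sum mu^2) / (sum (y - ybar)^2 - sum yhat),
    where mu = y - yhat and sum(yhat) stands in for the expected random variation."""
    if len(observed) == 0 or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    y_bar = sum(observed) / len(observed)
    total_variation = sum((y - y_bar) ** 2 for y in observed)
    residual_variation = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    random_variation = sum(predicted)
    denominator = total_variation - random_variation
    if denominator == 0:
        raise ValueError("total variation equals expected random variation")
    return (total_variation - residual_variation) / denominator


# Hypothetical crash counts at four sites and the corresponding SPF predictions.
observed_counts = [0, 1, 12, 3]
predicted_counts = [2.0, 2.0, 8.0, 4.0]
mad = mean_absolute_deviation(observed_counts, predicted_counts)
r2 = modified_r2(observed_counts, predicted_counts)
```

With these made-up values, the modified R2 stays below 1.0; a result above 1.0 would point to the overfitting concern noted above.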

Figure C-2. Modified R2 value.

Variables in the equation shown in Figure C-2 are defined as follows:
• yi = observed counts,
• ŷi = predicted values from the SPF,
• ȳ = sample average, and
• μ̂i = yi – ŷi.

Dispersion Parameter

The dispersion parameter, f(k), in the negative binomial distribution is reported from the variance equation. Figure C-3 provides the variance equation, which is rearranged in Figure C-4 to provide the equation for the dispersion parameter.

Figure C-3. Variance of negative binomial distribution.

Figure C-4. Dispersion parameter.

Variables in the equations shown in Figures C-3 and C-4 are defined as follows:
• f(k) = estimated dispersion parameter,
• Var{m} = estimated variance of mean crash rate, and
• E{m} = estimated mean crash rate.

The estimated variance increases as dispersion increases, and consequently the standard errors of estimates are inflated. As a result, all else being equal, an SPF with less dispersion [i.e., smaller values of f(k)] is preferred to an SPF with more dispersion. Note that f(k) can be specified as a constant or as a function of site characteristics. The tool facilitates the estimation of the dispersion parameter, either as a constant or from a function, as one goodness-of-fit measure.

Coefficient of Variation of the Calibration Factor

For a constant calibration factor, the CV of the calibration factor is useful to assess the goodness of fit. Figure C-5 provides the equation for the CV of a constant calibration factor, which is the standard deviation of the calibration factor divided by the estimate of the calibration factor.
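As an illustration of the dispersion parameter and the CV described above, the following Python sketch works through both calculations. It rests on several assumptions that are not stated in this text: the variance equation is taken to have the usual negative binomial form Var{m} = E{m} + f(k) × E{m}², the constant calibration factor is taken as the sum of observed crashes divided by the sum of uncalibrated predictions, and the variance of the calibration factor is taken as the sum of (yi + k × yi²) divided by the squared sum of uncalibrated predictions. All function names and data are hypothetical.

```python
import math
from typing import Sequence


def dispersion_parameter(mean: float, variance: float) -> float:
    """Rearranges the assumed variance equation Var{m} = E{m} + f(k) * E{m}^2."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return (variance - mean) / mean ** 2


def calibration_factor(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Assumed constant calibration factor: total observed / total uncalibrated predicted."""
    return sum(observed) / sum(predicted)


def calibration_factor_variance(observed: Sequence[float], predicted: Sequence[float], k: float) -> float:
    """Assumed form of V(C): sum(y + k * y^2) / (sum of uncalibrated predictions)^2."""
    return sum(y + k * y * y for y in observed) / sum(predicted) ** 2


def calibration_factor_cv(observed: Sequence[float], predicted: Sequence[float], k: float) -> float:
    """CV = standard deviation of the calibration factor / estimate of the calibration factor."""
    c = calibration_factor(observed, predicted)
    return math.sqrt(calibration_factor_variance(observed, predicted, k)) / c


# Hypothetical values for illustration only.
f_k = dispersion_parameter(mean=2.0, variance=4.0)
cv = calibration_factor_cv([1, 2, 3], [2.0, 2.0, 2.0], k=0.5)
```

A three-site sample such as this one produces a very large CV; the sketch is meant only to show the arithmetic, not a realistic calibration dataset.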

Figure C-5. Coefficient of variation of a constant calibration factor.

Variables in the equation shown in Figure C-5 are defined as follows:
• CV = coefficient of variation of the calibration factor,
• V(C) = variance of the calibration factor, and
• C = estimate of the calibration factor.

Figure C-6 shows the equation for the variance of the calibration factor [V(C)]. The standard deviation of the calibration factor is the square root of the variance.

Figure C-6. Variance of calibration factor.

Variables in the equation shown in Figure C-6 are defined as follows:
• yi = observed counts,
• ŷi = uncalibrated predicted values from the SPF, and
• k = dispersion parameter (recalibrated).

Appendix B of the User’s Guide to Develop Highway Safety Manual Safety Performance Function Calibration Factors provides guidance on estimating the accuracy of a calibration factor using the CV (Bahar 2014). This guidance is intended for application in assessing the sample size of the calibration dataset; however, it seems reasonable to also apply it to assess the accuracy of a calibration factor regardless of the sample size (Bahar 2014).

The Guide suggests that a reasonable upper threshold for the CV is 0.10 to 0.15. This threshold can help to assess whether or not the SPF, and the estimated calibration factor based on the calibration dataset, are acceptable. If the CV exceeds this threshold, the cumulative residual plots, described in the following section, can help to determine if the SPF is acceptable. In any case, the CV can help with comparative evaluation of two or more SPFs where smaller values are preferred to larger values.

Cumulative Residual Plots

Another tool to assess goodness of fit is the CURE plot. A CURE plot is a graph of the cumulative residuals (observed minus predicted crashes) against a variable of interest sorted in ascending order (e.g., major road traffic volume).
CURE plots provide a visual representation of goodness of fit over the range of a given variable and help to identify potential concerns such as the following:
• Long trends. Long trends in the CURE plot (increasing or decreasing) indicate regions of bias that should be rectified through improvement to the SPF either by the addition of new variables or by a change of functional form.
• Percent exceeding the confidence limits. Cumulative residuals outside the confidence limits indicate a poor fit over that range in the variable of interest. Cumulative residuals frequently outside the confidence limits indicate notable bias in the SPF. The upper threshold for the percent of cumulative residuals exceeding the 95 percent confidence limits is 5 percent.
• Vertical changes. Large vertical changes in the CURE plot are potential indicators of outliers, which require further examination.

For further discussion of outliers, refer to the section titled, Dealing with Outliers, in The Calibrator—An SPF Calibration and Assessment Tool Updated User Guide (Lyon et al. 2018). Consult Chapter 7 of The Art of Regression Modeling in Road Safety for more advanced discussion (Hauer 2015).

Figure C-7 shows an example CURE plot for the variable indicating major road traffic volume at an intersection. In this example, the SPF performs relatively well based on the general pattern and the 95-percent confidence interval. The pattern shows the cumulative residuals oscillating above and below zero. Note that a sustained increasing or decreasing trend would indicate a range of under- or over-prediction, respectively. In this example, the cumulative residuals also remain within the 95-percent confidence limits over most of the range, only exceeding the confidence limits for a short range of lower AADT. The areas outside the confidence limits indicate a poor fit, as indicated in the figure. Cumulative residuals frequently outside the confidence limits would indicate notable bias in the SPF. Another notable observation is the sharp increase in the value of cumulative residuals at an AADT of approximately 175,000 vehicles per day. This may indicate the presence of an outlier in the data.

The Calibrator automatically provides a CURE plot similar to that shown in Figure C-7 for fitted values [after applying the calibration factor(s)] and allows a user to choose any other available continuous variable for the x-axis. The tool calculates the maximum deviation as well as the percent of observations outside the 95-percent confidence limits.
With this information, users can follow the procedure in The Art of Regression Modeling in Road Safety to determine whether an SPF is acceptable and to compare multiple SPFs (Hauer 2015).

While the guidance provided by Hauer (2015) for making these decisions is useful, it is largely subjective. The most objective consideration is a review of the CURE plot and 95-percent (2σ) confidence limits. As Hauer notes, inasmuch as the CURE plot is a sum of many independent random variables, it is approximately normally distributed. For a normal distribution, about 95 percent of the probability mass is within two standard deviations from the mean. Thus, the CURE plot for an ‘everywhere unbiased’ SPF should only rarely go beyond the 2σ limits.

Figure C-7. Example CURE plot. (Cumulative residuals plotted against major road traffic volume, with upper and lower 95% confidence limits; the range where the cumulative residuals fall outside the limits is labeled "Poor Fit.")
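The construction of a CURE plot and the "percent exceeding" check discussed above can be sketched in Python as follows. This is a minimal illustration, not the procedure used by The Calibrator. It assumes the 2σ limits take the form commonly attributed to Hauer, ±2 × sqrt(σ²(i) × (1 − σ²(i)/σ²(N))), where σ²(i) is the running sum of squared residuals; that formula does not appear in this text. The function names, the small numerical tolerance, and the sample data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cure_ordinates(
    x: Sequence[float], observed: Sequence[float], fitted: Sequence[float]
) -> Tuple[List[float], List[float], List[float]]:
    """Return sorted x values, cumulative residuals, and assumed 2-sigma limits."""
    order = sorted(range(len(x)), key=lambda i: x[i])  # ascending variable of interest
    cumulative, cumulative_sq = [], []
    running, running_sq = 0.0, 0.0
    for i in order:
        residual = observed[i] - fitted[i]  # observed minus predicted crashes
        running += residual
        running_sq += residual * residual
        cumulative.append(running)
        cumulative_sq.append(running_sq)
    total_sq = cumulative_sq[-1] if cumulative_sq else 0.0
    limits = [
        2.0 * math.sqrt(max(0.0, s * (1.0 - s / total_sq))) if total_sq > 0 else 0.0
        for s in cumulative_sq
    ]
    return [x[i] for i in order], cumulative, limits


def percent_exceeding(cumulative: Sequence[float], limits: Sequence[float], tol: float = 1e-9) -> float:
    """Percent of CURE ordinates outside the limits (tol guards against rounding noise)."""
    outside = sum(1 for c, lim in zip(cumulative, limits) if abs(c) > lim + tol)
    return 100.0 * outside / len(cumulative)


# Hypothetical major road volumes, observed crashes, and fitted (calibrated) values.
volumes = [300, 100, 200, 400]
xs, cum, lim = cure_ordinates(volumes, [3, 1, 0, 2], [2.0, 1.0, 1.0, 2.0])
pct = percent_exceeding(cum, lim)
```

The three returned lists can be passed to any plotting library to draw the cumulative residuals and the upper and lower limits against the sorted variable.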

Hauer’s book also states, “the overall fit of the SPF is best judged by the CURE plot for fitted values” (Hauer 2015, p. 150). The following are general rules for assessing the percent of the CURE plot exceeding the 95 percent (2σ) confidence limit:
1. An upper threshold of 5 percent of CURE plot ordinates for fitted values [after applying the calibration factor(s)] exceeding 2σ limits is indicative of an SPF that calibrates well to the entire range of a jurisdiction’s data.
2. If more than 5 percent of the CURE plot ordinates exceed the 95-percent confidence limits, then consider the CV of the constant calibration factor. If the CV is within acceptable limits, then the SPF may be acceptable for application, with due recognition for ranges of variables where significant bias is indicated.

Another performance measure to compare two or more competing SPFs is the percent of CURE plot ordinates for fitted values (after applying the calibration factor) exceeding the 2σ limits, where lower values of “percent exceeding” are preferred.

Akaike’s Information Criterion

Figure C-8 shows the equation for Akaike’s Information Criterion (AIC). The AIC penalizes for the addition of parameters and thus helps to select an SPF that fits well but has a minimum number of parameters. AIC is not typically used as a goodness-of-fit measure but can be used to compare the relative fit of alternate SPFs. Smaller values are preferred to larger values in comparing two or more competing SPFs.

Figure C-8. Akaike’s Information Criterion.

Variables in the equation shown in Figure C-8 are defined as follows:
• K = number of estimated parameters included in the SPF (i.e., number of variables plus the intercept), and
• Loglikelihood = statistical output reflecting the overall SPF fit (larger values indicate a better fit).

Schwarz Bayesian Information Criterion

Figure C-9 shows the equation for the Schwarz Bayesian Information Criterion (BIC).
The BIC is complementary to AIC in that it also penalizes for the addition of parameters and thus selects an SPF that fits well, but has a minimum number of parameters. BIC is not typically used as a goodness-of-fit measure but can be used to compare the relative fit of alternate SPFs. Smaller values are preferred to larger values in comparing two or more competing SPFs.

Figure C-9. Schwarz Bayesian Information Criterion (BIC).

Variables in the equation shown in Figure C-9 are defined as follows:
• K = number of estimated parameters included in the SPF (i.e., number of variables plus the intercept), and
• Loglikelihood = statistical output reflecting the overall SPF fit (larger values indicate a better fit).
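For reference, the two information criteria can be sketched in Python using their standard textbook forms, AIC = −2 × Loglikelihood + 2K and BIC = −2 × Loglikelihood + K × ln(n). These forms are assumptions, because the equations in Figures C-8 and C-9 are not reproduced in this text; in particular, the sample size n is not among the variables listed above. The function names and the example values are hypothetical.

```python
import math


def aic(log_likelihood: float, num_parameters: int) -> float:
    """Standard form of Akaike's Information Criterion: -2 * LL + 2 * K."""
    return -2.0 * log_likelihood + 2.0 * num_parameters


def bic(log_likelihood: float, num_parameters: int, sample_size: int) -> float:
    """Standard form of the Schwarz BIC: -2 * LL + K * ln(n)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return -2.0 * log_likelihood + num_parameters * math.log(sample_size)


# Two hypothetical SPFs fitted to the same 50 sites; smaller values are preferred.
aic_a, aic_b = aic(-100.0, 3), aic(-98.5, 6)
bic_a, bic_b = bic(-100.0, 3, 50), bic(-98.5, 6, 50)
```

In this made-up comparison, the second SPF has a slightly higher log-likelihood, but its three extra parameters leave it with larger AIC and BIC values, so the simpler SPF would rank first on both measures.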

Assessment Tables

While the CURE plot method works well for continuous variables, it is not applicable to variables with few categories (e.g., a database with speed limits of 45, 55, and 65 mph). For such variables, it is useful to develop a table of “calibration bias factors” that include factors for each category of the variable, as in the example below. Calibration bias factors are the sum of the observed crashes for the category divided by the sum of the predictions obtained when the calibration factor is applied. If this bias factor is less than 1.0, then the calibrated SPF is over-predicting for the category. If this bias factor is greater than 1.0, then the calibrated SPF is under-predicting. These bias factors can support a comparative assessment of two or more SPFs in conjunction with CURE plots for other measures.

In the example shown in Table C-1, there are three categories of speed limit with corresponding observed crashes and calibration bias factors. As shown by the calibration bias factors, the SPF is over-predicting crashes at lower speed limits and under-predicting crashes at higher speed limits. An assessment table can help to identify categories or levels of a given variable for which there is concern about the quality of the calibration process. Calibration bias factors less than 0.8 or greater than 1.2 indicate potential areas of concern, provided these factors are based on at least 100 crashes.

Summary of SPF Assessment

This section provides a quick reference summary of the key considerations in SPF assessment. Given a single SPF, the CURE plot and the CV of a constant calibration factor help to determine whether the calibrated SPF is acceptable.
Given the choice from multiple SPFs, several goodness-of-fit measures can be used to determine the most suitable SPF for the local dataset and, subsequently, the CURE plot and CV of the constant calibration factor can be used to determine if the preferred SPF is acceptable. The Calibrator generates these goodness-of-fit measures, but it does not indicate the preferred SPF or acceptability, given the need for further research in this area.

Assessing the Acceptability of an SPF

An SPF with a constant calibration factor may be acceptable if either of the following conditions is met:
1. 5 percent or less of CURE plot ordinates for fitted values (after applying the calibration factor) exceed the 2σ limits, or
2. The CV of a constant calibration factor is less than 0.15.

A calibration function should then be estimated that provides a unique calibration factor for each site. The function may be preferable if either of the above conditions is met (i.e., the calibrated SPF is acceptable based on a constant calibration factor), and the percent of CURE plot ordinates for fitted values (after applying the unique calibration factors) exceeding the 2σ limits is lower than that for the constant calibration factor. It is likely that the function will then be preferable by other assessment measures (MAD, AIC, and BIC).

Table C-1. Example of categorical variable assessment.

Variable | 45 mph | 55 mph | 65 mph
Observed crashes | 200 | 320 | 275
Number of sites | 30 | 35 | 40
Calibration bias factor | 0.85 | 1.05 | 1.15
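The calibration bias factors in an assessment table such as Table C-1 can be computed along the following lines. The Python sketch below follows the definition given in the Assessment Tables section (sum of observed crashes for a category divided by the sum of calibrated predictions) and applies the 0.8 and 1.2 thresholds with the 100-crash minimum. The function name, the record layout, and the site data are hypothetical, and the numbers are not those behind Table C-1.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def calibration_bias_factors(
    records: Iterable[Tuple[str, float, float]],
    low: float = 0.8,
    high: float = 1.2,
    min_crashes: float = 100,
) -> Dict[str, dict]:
    """Each record is (category, observed crashes, calibrated prediction) for one site."""
    observed_sum = defaultdict(float)
    predicted_sum = defaultdict(float)
    for category, observed, calibrated_prediction in records:
        observed_sum[category] += observed
        predicted_sum[category] += calibrated_prediction
    table = {}
    for category, total_observed in observed_sum.items():
        factor = total_observed / predicted_sum[category]
        # Flag only when the factor is outside the thresholds AND rests on enough crashes.
        concern = total_observed >= min_crashes and (factor < low or factor > high)
        table[category] = {
            "observed": total_observed,
            "bias_factor": factor,  # < 1.0 over-predicting, > 1.0 under-predicting
            "concern": concern,
        }
    return table


# Hypothetical site-level data grouped by speed limit category.
sites = [("45 mph", 70, 100.0), ("45 mph", 40, 50.0), ("55 mph", 60, 40.0)]
bias_table = calibration_bias_factors(sites)
```

In this made-up dataset, the 55 mph category has a bias factor well above 1.2 but is not flagged because it rests on fewer than 100 crashes.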

As a caution, if the constant calibration factor is not acceptable, but a calibration function shows less than 5 percent of CURE plot ordinates for fitted values exceeding the 2σ limits, this may be due to a small number of sites or crashes in the calibration dataset. Consider the sample size before adopting the calibration function. If there is a large sample, then the calibration function may be acceptable.

If neither of the conditions previously mentioned is met, consider increasing the calibration sample. If, with the largest feasible calibration sample, neither condition is met for the constant calibration factor, and the first condition is not met with the calibration function, then consider calibrating another existing SPF or developing a jurisdiction-specific SPF.

Comparing Multiple SPFs

Table C-2 presents seven measures for comparing the performance of multiple SPFs. For this comparison, first estimate a constant calibration factor for each SPF. For each measure, the SPFs are ranked numerically from 1 to n, where 1 represents the best SPF with respect to the given measure and n represents the number of alternative SPFs. To determine the aggregate ranking based on all seven measures, sum the numeric rankings over the seven measures. The SPF with the lowest sum of ranks is the preferred SPF for calibration to a jurisdiction’s data. There is an opportunity to then refine the preferred SPF based on a constant calibration factor by estimating a calibration function and assessing the SPF performance with this refinement. Note that there is still a need to determine if the preferred SPF is acceptable as previously outlined, based on the CURE plot and CV for the calibration factor or the CURE plot for the calibration function.
Table C-2. Summary of goodness-of-fit measures for ranking SPFs.

Goodness-of-Fit Measure | Preferred Values | Ranking Method
Mean Absolute Deviation (MAD) | Smaller values | Smallest value is ranked number 1
Modified R2 | Larger values | Largest value is ranked number 1
Constant Dispersion Parameter* | Smaller values | Smallest value is ranked number 1
Coefficient of variation of the constant calibration factor (CV) | Smaller values | Smallest value is ranked number 1
Percent of CURE plot ordinates for fitted values (after calibration) exceeding 2σ limits | Smaller values | Smallest value is ranked number 1
Akaike’s Information Criterion (AIC) | Smaller values | Smallest value is ranked number 1
Bayesian Information Criterion (BIC) | Smaller values | Smallest value is ranked number 1

*Criterion is only considered where all original candidate SPFs have a constant dispersion parameter.
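The rank-sum comparison summarized in Table C-2 can be sketched in Python as follows. The measure labels, the function name, and the values for the two candidate SPFs are hypothetical. The handling of ties (tied SPFs share the better rank) is an assumption, since the text does not address ties.

```python
from typing import Dict

# True means larger values are preferred; False means smaller values are preferred.
MEASURES = {
    "MAD": False,
    "Modified R2": True,
    "Dispersion": False,
    "CV": False,
    "Percent CURE exceeding": False,
    "AIC": False,
    "BIC": False,
}


def rank_sums(spfs: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Rank each SPF from 1 (best) to n on every measure and sum the ranks."""
    totals = {name: 0 for name in spfs}
    for measure, larger_is_better in MEASURES.items():
        values = {name: results[measure] for name, results in spfs.items()}
        for name, value in values.items():
            if larger_is_better:
                better = sum(1 for other in values.values() if other > value)
            else:
                better = sum(1 for other in values.values() if other < value)
            totals[name] += 1 + better  # assumed tie rule: ties share the better rank
    return totals


# Hypothetical goodness-of-fit results for two candidate SPFs.
candidates = {
    "SPF A": {"MAD": 1.2, "Modified R2": 0.6, "Dispersion": 0.4, "CV": 0.08,
              "Percent CURE exceeding": 3.0, "AIC": 500.0, "BIC": 520.0},
    "SPF B": {"MAD": 1.0, "Modified R2": 0.5, "Dispersion": 0.5, "CV": 0.12,
              "Percent CURE exceeding": 7.0, "AIC": 510.0, "BIC": 525.0},
}
totals = rank_sums(candidates)
preferred = min(totals, key=totals.get)  # lowest sum of ranks is preferred
```

Where the candidate SPFs do not all have a constant dispersion parameter, the "Dispersion" entry would be dropped from the comparison, consistent with the footnote to Table C-2. The preferred SPF still needs to be checked for acceptability as outlined earlier.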


The 1st Edition, in 2010, of the AASHTO Highway Safety Manual revolutionized highway engineering practice by providing crash modification factors and functions, along with methods that use safety performance functions for estimating the number of crashes within a corridor, subsequent to implementing safety countermeasures.

The TRB National Cooperative Highway Research Program's NCHRP Research Report 974: Application of Crash Modification Factors for Access Management, Volume 2: Research Overview documents the research process related to access management features. The research project is also summarized in this presentation.

NCHRP Research Report 974: Application of Crash Modification Factors for Access Management, Volume 1: Practitioner’s Guide presents methods to help transportation practitioners quantify the safety impacts of access management strategies and make more informed access-related decisions on urban and suburban arterials.
