Validation of Urban Freeway Models (2014)

Chapter 3 - Existing Model Validation

Suggested Citation: "Chapter 3 - Existing Model Validation." National Academies of Sciences, Engineering, and Medicine. 2014. Validation of Urban Freeway Models. Washington, DC: The National Academies Press. doi: 10.17226/22282.

Overview

Following this overview section, which describes the metrics used to perform the validation, this chapter presents the validation results for the data-rich and the data-poor models. The complete results are included in Appendix C (Data-Rich Validation) and Appendix D (Data-Poor Validation). The validation was performed by assessing the following two questions:

1. What is the model error?
2. Does the model meet the assumptions of generalized regression?

Model Error

The model error was quantified through the root mean square error (RMSE). The specific calculation of RMSE depends on the model form, so calculation details for the data-rich and data-poor models are contained in Appendix C and Appendix D, respectively.

Generalized Regression Model Assumptions

Whether each model adhered to the assumptions of generalized regression was assessed quantitatively and qualitatively using (1) residual plots, to find any non-random patterns; (2) Student's t-test, to evaluate whether the mean of the residuals is statistically different from zero; (3) histograms, to visually assess whether the residuals are normally distributed; and (4) the Shapiro-Wilk test, to statistically assess whether the residuals are normally distributed. Each of these tools is described below.

Residual Plots

To satisfy the regression assumptions, the plot of residuals versus predicted values must show that (1) the variance of the residuals is constant across all predicted values (homoscedastic) and (2) the mean of the residuals is constant across all predicted values (unbiased). This principle is illustrated in Figure 3.1 (1), which compares the ideal residual plot [unbiased and homoscedastic, in plot (a)] to other patterns that indicate the assumptions of regression are not being met.

Student's t-Test

The one-sample Student's t-test can be used to determine whether the mean of the residuals is significantly different from zero in a statistical sense, which tests for systematic bias. With an unbiased model, the difference should be statistically insignificant. The t-value is calculated as

    t = (r̄ − μ0) / (s / √n)

where r̄ is the residual mean, s is the standard deviation of the residuals, n is the sample size, and μ0 is the specific mean value for comparison, set here to zero. If the calculated t-value is larger than the critical value tα for significance level α (e.g., α = 5%) from a two-tailed t distribution table, the null hypothesis that the residuals have a mean of zero can be rejected with (1 − α) level of confidence; that is, the residual mean is significantly different from zero at the α level of probability. If the corresponding p-value is used to draw the conclusion, it means that, if the null hypothesis were correct, a t-value this large would be expected on only p percent of occasions. The data-poor validation used a confidence level of 95%, and the data-rich validation used a confidence level of 90%.
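For illustration only, the one-sample t-test described above can be run on a set of model residuals in Python with SciPy. This is a minimal sketch, not the project's code, and the residuals array is a hypothetical placeholder.

    import numpy as np
    from scipy import stats

    # Hypothetical residuals: observed TTI minus predicted TTI for each freeway section-year.
    residuals = np.array([0.12, -0.05, 0.30, 0.08, -0.02, 0.21, 0.15, -0.11])

    # One-sample t-test of the null hypothesis that the residual mean equals zero (mu_0 = 0).
    t_stat, p_value = stats.ttest_1samp(residuals, popmean=0.0)

    # The data-rich validation used a 90% confidence level (alpha = 0.10);
    # the data-poor validation used 95% (alpha = 0.05).
    alpha = 0.10
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}, reject zero-mean null: {p_value < alpha}")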

Shapiro-Wilk Normality Test

The Shapiro-Wilk test was used in the data-rich validation to test whether the distribution of residuals is significantly different from the normal distribution in a statistical sense. The null hypothesis in this test states that the residuals are normally distributed. If the p-value is less than the threshold α, the null hypothesis that the residuals are normally distributed can be rejected with (1 − α) level of confidence. The threshold used here is α = 10%, which corresponds to a confidence level of 90%. The question of normality was also investigated visually using normality plots and residual histograms. (A brief sketch of this check appears after Table 3.1.)

Data-Rich Validation Process

There are six L03 data-rich models per analysis time slice (peak period, peak hour, weekday, and midday) to predict the mean TTI and the 99th-percentile, 95th-percentile, 80th-percentile, 50th-percentile, and 10th-percentile TTI, resulting in a total of 24 data-rich models to be validated. The independent variables used in each model were described in Chapter 2, and the equations for each model are contained in Appendix C. Table 3.1 shows the total number of freeway section-years for which data was available within each region and time slice; the cells in bold indicate regions and time periods used in the data-rich validation.

Figure 3.1. Residual plot examples (residuals on the y-axis versus predicted values on the x-axis):
(a) Unbiased and homoscedastic. The residuals average to zero in each thin vertical strip and the standard deviation is the same all across the plot.
(b) Biased and homoscedastic. The residuals show a linear pattern, probably due to a lurking variable not included in the experiment.
(c) Biased and homoscedastic. The residuals show a quadratic pattern, possibly because of a nonlinear relationship. Sometimes a variable transform will eliminate the bias.
(d) Unbiased and heteroscedastic. The standard deviation is small to the left of the plot and large to the right: the residuals are heteroscedastic.
(e) Biased and heteroscedastic. The pattern is linear.
(f) Biased and heteroscedastic. The pattern is quadratic.

Table 3.1. Data-Rich Validation Freeway Section Sample Sizes

Region            Peak Period   Peak Hour   Midday   Weekday
California             43           43        140       142
Minnesota              19           25         60        60
Salt Lake City          3            4         32        30
City of Spokane         0            0          9        11
All Data               65           72        241       243

Note: Bold numbers = regions and time periods used in the data-rich validation.
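As a minimal illustration of the Shapiro-Wilk check (again a sketch rather than the project's code, with a hypothetical residuals array), SciPy's shapiro function tests the null hypothesis that the residuals are normally distributed.

    import numpy as np
    from scipy import stats

    # Hypothetical residuals from one of the data-rich models.
    residuals = np.array([0.12, -0.05, 0.30, 0.08, -0.02, 0.21, 0.15, -0.11, 0.04, -0.09])

    # Shapiro-Wilk test: small p-values argue against normally distributed residuals.
    w_stat, p_value = stats.shapiro(residuals)

    # The data-rich validation used alpha = 0.10 (90% confidence).
    alpha = 0.10
    print(f"W = {w_stat:.3f}, p = {p_value:.3f}, reject normality: {p_value < alpha}")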

In the Salt Lake City and Spokane regions, very few of the roadway sections experienced traffic conditions that qualified under the peak period and peak hour definitions established in L03. Because of these small sample sizes, those region and time period combinations were excluded from the validation results.

Results

Table 3.2 presents the data-rich root mean square errors (RMSEs) measured for each time period, model, and region. The main conclusion of the data-rich validation is that the average prediction errors, as measured by the RMSE for each model, are not acceptable across many of the regions.

From a regional perspective, for all time slices except the weekday time period, the RMSEs are highest when the models are applied to the California data set. During the weekday time period, the RMSEs are highest when the models are applied to the Minnesota data set and lowest when they are applied to the Salt Lake City data set.

When the RMSEs are interpreted by predicted measure, the highest RMSEs across all time periods occur for the prediction of the 99th-percentile TTI. The RMSEs tend to decrease for lower percentiles (i.e., the RMSEs for the 50th-percentile models are lower than those for the 80th-percentile models, which are lower than those for the 95th-percentile models, and so on). This is to be expected, as there is naturally more variability among the validation data sections in the upper percentiles of the travel time distribution.

Table 3.2. Summary of Data-Rich RMSE Values by Model and Region

Analysis Time Slice   Model              All Data     CA        MN       Salt Lake City
Peak period           Mean TTI            96.94%    127.55%    21.59%    na
                      99th Percentile    403.44%    607.76%    63.67%    na
                      95th Percentile    251.95%    359.19%    45.85%    na
                      80th Percentile    151.95%    206.54%    30.95%    na
                      50th Percentile     89.55%    116.63%    23.15%    na
                      10th Percentile     12.13%     14.43%     6.23%    na
Peak hour             Mean TTI            25.45%     26.97%    24.68%    na
                      99th Percentile     50.74%     52.78%    47.46%    na
                      95th Percentile     38.38%     40.19%    37.27%    na
                      80th Percentile     35.13%     36.89%    34.06%    na
                      50th Percentile     28.85%     32.41%    24.22%    na
                      10th Percentile     18.50%     22.24%    12.14%    na
Midday                Mean TTI             6.24%      7.57%     4.07%     3.52%
                      99th Percentile     32.32%     34.95%    25.86%    34.01%
                      95th Percentile     15.62%     17.29%    14.01%    12.55%
                      80th Percentile      8.99%     10.86%     6.61%     3.60%
                      50th Percentile      5.43%      6.93%     2.09%     2.08%
                      10th Percentile      1.81%      2.20%     0.80%     1.33%
Weekday               Mean TTI            19.74%     12.81%    35.99%     5.95%
                      99th Percentile     72.91%     50.04%   141.72%    30.87%
                      95th Percentile     83.82%     40.46%   197.82%    22.85%
                      80th Percentile     29.28%     14.84%    59.43%     5.75%
                      50th Percentile      4.68%      5.92%     1.71%     2.16%
                      10th Percentile      0.81%      0.74%     0.48%     1.30%

Note: na = not applicable.
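As an illustrative sketch only (assuming a pandas DataFrame with hypothetical columns region, observed, and predicted for a single model and time slice), per-region RMSE values like those in Table 3.2 could be computed as follows.

    import numpy as np
    import pandas as pd

    # Hypothetical validation records: observed and predicted TTI for each
    # freeway section-year, tagged by region.
    df = pd.DataFrame({
        "region":    ["CA", "CA", "MN", "MN", "Salt Lake City"],
        "observed":  [1.45, 1.80, 1.20, 1.35, 1.10],
        "predicted": [1.30, 2.10, 1.18, 1.40, 1.12],
    })

    def rmse(group: pd.DataFrame) -> float:
        # Root mean square error of predicted versus observed TTI.
        return float(np.sqrt(np.mean((group["predicted"] - group["observed"]) ** 2)))

    # RMSE for all data combined, then broken out by region (cf. Table 3.2).
    print("All Data:", round(rmse(df), 4))
    for region, group in df.groupby("region"):
        print(region, round(rmse(group), 4))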

From a time period perspective, the highest RMSEs are seen during the peak period, which is defined specifically for each section to cover time periods of at least 75 min during which the average speeds fall below 45 mph. The RMSEs are lower, though still high, for the peak hour models. Both the peak hour and peak period models are predicted by the critical D/C ratio, the incident lane-hours lost, and, for some of the models, the precipitation factor. The RMSEs are lowest during the midday period (11:00 a.m. to 2:00 p.m.), during which congestion tends to be minimal; the midday period TTIs are predicted only by the critical D/C ratio. RMSEs during the weekday period (predicted by the average D/C ratio and, for some of the models, the incident lane-hours lost) are slightly higher than those for the midday period.

Results also indicate that the models violate many of the assumptions of generalized regression and thus have room for enhancement. Generally, a good regression model is expected to produce randomly scattered residuals without obvious trends. However, increasing trends and other non-random patterns were observed in the residual plots of many of the models, which indicates that the models may not sufficiently describe the relationship between the independent variables and the dependent variable. Table 3.3 summarizes the results of the t-test and the normality test for each model, as applied to the All Data set. In the majority of cases, the null hypotheses for these tests could be rejected with 90% confidence; the peak hour models were the main exception.

Table 3.3. Data-Rich Statistical Test Results by Model, All Regions

Analysis Time Slice   Model             t-Test          Shapiro-Wilk
Peak period           Mean TTI          Reject          Reject
                      99th Percentile   Reject          Reject
                      95th Percentile   Reject          Reject
                      80th Percentile   Reject          Reject
                      50th Percentile   Reject          Reject
                      10th Percentile   Reject          Reject
Peak hour             Mean TTI          Cannot Reject   Cannot Reject
                      99th Percentile   Cannot Reject   Reject
                      95th Percentile   Cannot Reject   Cannot Reject
                      80th Percentile   Reject          Cannot Reject
                      50th Percentile   Reject          Cannot Reject
                      10th Percentile   Cannot Reject   Reject
Midday                Mean TTI          Reject          Reject
                      99th Percentile   Reject          Reject
                      95th Percentile   Reject          Reject
                      80th Percentile   Reject          Reject
                      50th Percentile   Cannot Reject   Reject
                      10th Percentile   Reject          Reject
Weekday               Mean TTI          Reject          Reject
                      99th Percentile   Reject          Reject
                      95th Percentile   Reject          Reject
                      80th Percentile   Reject          Reject
                      50th Percentile   Cannot Reject   Reject
                      10th Percentile   Reject          Reject

Data-Poor Validation Process

The seven L03 data-poor models validated in this task were

1. 95th-percentile TTI = 1 + 3.6700 × ln(meanTTI)
2. 90th-percentile TTI = 1 + 2.7809 × ln(meanTTI)
3. 80th-percentile TTI = 1 + 2.1406 × ln(meanTTI)
4. Standard deviation of TTI = 0.71 × (meanTTI − 1)^0.56
5. PctTripsOnTime50mph = e^(−0.20570 × [meanTTI − 1])
6. PctTripsOnTime45mph = e^(−1.5115 × [meanTTI − 1])
7. PctTripsOnTime30mph = 0.333 + 0.672 / (1 + e^(5.0366 × [meanTTI − 1.8256]))

(A plain-Python sketch of these equations appears after Table 3.4.)

Validation was performed using data collected on weekdays during the midday period (11:00 a.m. to 2:00 p.m.) and the peak period (a continuous time period of at least 75 min during which the space mean speed is less than 45 mph). This is consistent with the time periods that L03 used to calibrate the data-poor models. Table 3.4 shows the number of freeway section-year data points available for validation within each region. As with the data-rich validation, the number of peak period data points available in Salt Lake City and Spokane was too small to be used in the analysis.

Table 3.4. Data-Poor Validation Freeway Section Sample Sizes

Time Period     CA    MN    Salt Lake City   Spokane   All Data
Midday         144    60          42            12        258
Peak Period     43    19           3             0         65
Total          187    79          45            12        323
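As a minimal sketch (not the project's code; the function names are illustrative), the seven data-poor equations above can be written directly as Python functions of the mean TTI.

    import math

    # L03 data-poor reliability models, each driven only by the mean TTI (>= 1.0).
    def tti_95th(mean_tti):    return 1 + 3.6700 * math.log(mean_tti)
    def tti_90th(mean_tti):    return 1 + 2.7809 * math.log(mean_tti)
    def tti_80th(mean_tti):    return 1 + 2.1406 * math.log(mean_tti)
    def tti_std_dev(mean_tti): return 0.71 * (mean_tti - 1) ** 0.56
    def pct_on_time_50mph(mean_tti): return math.exp(-0.20570 * (mean_tti - 1))
    def pct_on_time_45mph(mean_tti): return math.exp(-1.5115 * (mean_tti - 1))
    def pct_on_time_30mph(mean_tti):
        return 0.333 + 0.672 / (1 + math.exp(5.0366 * (mean_tti - 1.8256)))

    # Example: predicted reliability measures for a section with mean TTI = 1.30.
    print(tti_95th(1.30), tti_std_dev(1.30), pct_on_time_30mph(1.30))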

Results

Table 3.5 summarizes the root mean square error estimates for each model on all sections (All Data column) and by region. Overall, the RMSEs are generally acceptable for most of the models and regions. The model error is larger for the prediction of the upper percentiles of the TTI distribution. This makes sense because the 95th-percentile TTIs are likely associated with very rare events (such as a major incident or bad weather); these TTIs can be expected to vary greatly from section to section, making them harder to model accurately based solely on the mean TTI.

Table 3.5. Summary of Data-Poor RMSE Values by Model and Region

Model                  All Data     CA       MN      Salt Lake City   Spokane
95th Percentile         0.1820    0.2064   0.1716       0.0833        0.0688
90th Percentile         0.1189    0.1187   0.1502       0.0483        0.0604
80th Percentile         0.0684    0.0660   0.0896       0.0290        0.0447
Standard Deviation      0.0855    0.0839   0.1028       0.0586        0.0672
PctTripsOnTime50mph     0.0784    0.0891   0.0617       0.0552        0.0721
PctTripsOnTime45mph     0.0602    0.0681   0.0480       0.0433        0.0553
PctTripsOnTime30mph     0.0254    0.0247   0.0329       0.0134        0.0065

The main concern with the data-poor models following the validation effort is that they violate many of the assumptions of generalized regression. The t-test results for the All Data set are shown in Table 3.6 for each model; for nearly all of the models, it was possible to reject the null hypothesis of a zero residual mean with 95% confidence. The direction of the systematic bias (overprediction or underprediction) varied regionally, with the models tending to predict better reliability than was measured in Minnesota and poorer reliability than was measured in California. This lends support for building regional models rather than cross-sectional models.

Table 3.6. Data-Poor t-Test Results by Model, All Regions

Model                  t-Test Result
95th Percentile        Reject
90th Percentile        Reject
80th Percentile        Reject
Standard Deviation     Reject
PctTripsOnTime50mph    Reject
PctTripsOnTime45mph    Reject
PctTripsOnTime30mph    Cannot Reject

Reference

1. DePaul University. Linear Regression. http://condor.depaul.edu/sjost/it223/documents/regress.htm. Accessed February 15, 2014.


TRB’s second Strategic Highway Research Program (SHRP 2) Report S2-L33-RW-1: Validation of Urban Freeway Models documents and presents the results of a project to investigate, validate, and enhance the travel time reliability models developed in the SHRP 2 L03 project titled Analytical Procedures for Determining the Impacts of Reliability Mitigation Strategies.

This report explores the use of new datasets and statistical performance measures to validate these models. As part of this validation, the work examined the structure, inputs, and outputs of all of the L03 project models and explored their applicability and validity. This report proposes new application guidelines and enhancements to the L03 models.
