The primary objective of the Federal Motor Carrier Safety Administration (FMCSA) is to reduce the frequency and severity of commercial motor vehicle (CMV) crashes in the United States. One way of achieving this objective would be to predict, for each carrier, the number of reportable1 crashes in which its vehicles would be involved in the near future. Such a prediction would take into consideration multiple factors, such as the length and nature of the carrier’s routes, schedule, cargo, and driver characteristics. Unfortunately, such a program is not feasible, in part because in any given time period the incidence of observed crashes is small even when a carrier’s crash risk is high, and in part because FMCSA does not have access to the motor carrier, vehicle, and driver data that could be used to calculate crash risk. FMCSA has instead adopted a different approach, known as the Safety Measurement System (SMS).
FMCSA’s SMS uses Motor Carrier Management Information System (MCMIS) data to produce metrics that identify, for interventions, carriers that have patterns of noncompliance with FMCSA’s safety regulations greater than those of their peers. FMCSA has data on the frequency with which various violations of safety regulations are found during roadside inspections of CMVs. FMCSA uses the violations identified in these inspections to find carriers that are systematically out of compliance with federal regulations and that are therefore assessed to be at higher risk of future crashes than their peers. FMCSA argues that the presence of such patterns of violations is an indicator of the degree to which a carrier gives priority to safety. FMCSA’s objective therefore becomes a problem of discriminating between those carriers that do and do not act in accordance with regulations for safe operation.
1 Throughout this report, to be included in MCMIS, a “reportable crash” has to involve a fatality, a person transported from the scene for immediate medical attention, or a vehicle towed from the scene due to disabling damage suffered in the crash. Also, the vehicle factors for reportability are a vehicle weighing more than 10,000 lbs., or a bus with seats for nine or more passengers including the driver.
FMCSA intervenes with those carriers that are frequently in violation to encourage them to adopt practices that will result in fewer violations. FMCSA’s evaluations indicate that doing so prevents many future crashes by notifying carriers when they are engaging in unsafe practices; FMCSA also hopes to minimize the targeting of carriers unlikely to have violations. Until recently, information on which carriers had percentiles above established intervention thresholds was made public, which gave carriers an incentive to make changes, since a carrier could lose business and/or face higher insurance rates as a result.
There is an important difference between the two objectives of predicting carriers with high future crash risk and identifying carriers with a current high frequency of violations. To predict future crash involvement, FMCSA would not want to find all violations, but rather just those violations closely linked with future crash risk. However, if the objective is prevention, there does not need to be a direct causal link between the frequency of occurrence of many of the violations and future crash risk, since their productive use in SMS depends only on the assumptions that carriers that are frequent violators engage in unsafe practices, and that carriers that engage in unsafe practices are also likely to have a high frequency of future crashes. So, for example, while a truck or bus often cited for “minor lights out” may not as a direct result be involved in crashes, carriers that are not meticulous about such things may have more crashes due to a generally poor approach to vehicle maintenance that affects crash risk.
It is true that the statistical models for prevention and for prediction would be very similar. For prediction, however, more weight would be given to current and previous crash rates, using only those violation rates established as strongly predictive of future crash rates. For prevention, in contrast, less weight could be given to current and previous crash rates, and a broader collection of violation rates could be used: those found to be effective at identifying carriers that place little emphasis on safe practices, even if not predictive of crash rate.
The panel understands the reason for the approach taken by FMCSA in trying to prevent future crashes, rather than predict them; it is a common approach in other transportation industries. Later in this chapter, we describe FMCSA’s evaluations of the degree to which the assumed linkage between the frequency of inspection violations and future crash risk holds.
The data used in SMS are primarily those collected during inspections by Motor Carrier Safety Assistance Program (MCSAP) officials (usually state or local law enforcement) who are certified by the Commercial Vehicle Safety Alliance (CVSA). Depending on the level of inspection, trucks and buses are checked for up to 899 possible violations. The violations recorded during inspections are transferred to the MCMIS database. MCMIS also includes records of crashes involving CMVs and records of investigations of motor carriers. In addition, MCMIS contains a census file of all motor carrier companies with self-reported data on the number of power units they have and total mileage traveled. (See Chapter 6 for a more detailed discussion of the issues and limitations of MCMIS data.) It is to FMCSA’s credit that it has used this administrative database to produce metrics to monitor well over 500,000 active CMV carriers engaged in interstate commerce or in the intrastate transportation of hazardous materials.
As mentioned, the inputs to MCMIS are crashes, inspections, and violations, mostly reported by the states; investigation information from FMCSA or the states; and basic self-reported carrier information. In particular, since crash data are collected by the 50 states and the District of Columbia, there are 51 different crash report forms, 51 different coding manuals, and differing training protocols. Crash reports are filed by 650,000 to 700,000 police officers, ranging from state police to village sheriffs. The data are therefore collected not by people trained in CMV data collection or supervised by data users, but by police officers whose main job is enforcing the law and protecting lives and property. FMCSA has access only to these limited data, which were collected for administrative purposes.
Another complication is the diverse world of CMV carriers, including carriers transporting freight cross-country, custom harvesters that transport crops, and buses that take people to church outings, among many others. In addition, carriers come in sizes that differ by many orders of magnitude, from single vehicle owner-operator carriers to carriers that own tens of thousands of vehicles.
The program that uses MCMIS data to identify potentially unsafe carriers is referred to as Compliance, Safety, Accountability (CSA), which contains SMS as the specific tool to identify carriers for intervention. The Carrier Safety Measurement System (CSMS) replaced SafeStat (Federal Register, 2010), FMCSA’s initial attempt to use MCMIS data to evaluate motor carriers’ safety performance. SafeStat, implemented in 1997, consisted of four Safety Evaluation Areas (SEAs): Accident, Driver, Vehicle, and Safety Management. The four SEA numbers were combined into an overall assessment, referred to as a SafeStat score. The scores were public from 1999 until 2004, when the Accident SEA was made confidential due to problems with the completeness of the data and because there was no attempt to distinguish between crashes that were and were not preventable from the point of view of the involved motor carrier.
FMCSA’s goal of reducing the frequency and severity of CMV crashes implies a set of objective measures that are in line with those congressionally mandated in the Moving Ahead for Progress in the 21st Century legislation (MAP-21). In particular, safety in surface transportation was to be measured based on four performance measures: (1) total fatalities, (2) total injuries, (3) fatality rate, and (4) serious injury rate. SMS contributes to responses by FMCSA, the U.S. Department of Transportation, and the states to MAP-21 requirements by focusing regulatory and enforcement attention on factors in motor carrier operations that are related to serious crash risk. Related strongly to this objective is the statement, from FMCSA, that the SMS exists to “change unsafe behavior” (Federal Register, 2010). Not only are carriers with poor safety data subject to intervention (of various types), but also the measures of carriers have, until recently, been made public so that the motor carrier industry and other safety stakeholders would have access to comprehensive and regularly updated safety performance data. In addition, the hope is that by doing this, motor carriers will have an incentive to improve their SMS measures (relative to their peers), and, in the process, safety will improve. Industry stakeholders informed the U.S. Government Accountability Office (GAO) that SMS has contributed to a greater awareness of safety and safety performance data by motivating carriers to improve their safety scores to gain an advantage over their competitors (U.S. Government Accountability Office, 2014).
How SMS Operates
SMS produces percentile ranks for each carrier along seven different dimensions: (1) Unsafe Driving, (2) Hours of Service, (3) Vehicle Maintenance, (4) Controlled Substances/Alcohol, (5) Hazardous Materials, (6) Driver Fitness, and (7) Crash Indicator (see Box 2-1). These seven dimensions are referred to as BASICs (an acronym for Behavior Analysis and Safety Improvement Categories). Except for Crash Indicator, which is an assessment of the current rate of crashes, they are areas of related violations identified in roadside (and other) inspections. Again leaving aside Crash Indicator, the six other BASICs are aggregates of different subsets of the 899 possible violations, each subset associated with that BASIC’s purpose. Each carrier with sufficient data is therefore assigned up to seven measures; carriers with few or no crashes, inspections, or violations are not given SMS measures.
These seven ratios are referred to as a carrier’s SMS measures. For the six noncrash BASICs, the numerator is a weighted sum over the violations relevant to that BASIC that the carrier has received during the past 2 years, with each violation contributing the product of two weights: a time weight, with more recent violations receiving a higher weight, and a severity weight, with violations viewed as more critical to safety given a higher weight. The denominators of these ratios are, with one exception, also weighted sums, of the number of relevant inspections, where the weights involve only time weighting. The exception is the Unsafe Driving BASIC, for which the denominator is essentially an estimate of vehicle miles traveled, arrived at by multiplying the number of power units a carrier has by a utilization factor (see Chapter 6 for more detail). This calculation accounts for the fact that some carriers operate for more miles a year than others, and therefore have more exposure to violations such as speeding.
The Crash Indicator BASIC is also a ratio. The numerator is a weighted sum of the crashes that a carrier has experienced in the past 2 years, with time and crash severity weights (which give higher weight to crashes with a fatality or injury). The denominator is again a measure of the number of power units a carrier has, with some allowance for more vehicle miles traveled.
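The measure computation just described can be sketched as follows. The specific time-weight values (3 for the most recent months, 2 and 1 for older events) and the cutoff ages used here are illustrative assumptions, not FMCSA's published parameters; severity weights are taken as given inputs.

```python
# Sketch of a noncrash BASIC measure. The time-weight schedule below
# (3/2/1 by event age) is an illustrative assumption, not FMCSA's
# published methodology.
def time_weight(months_ago):
    if months_ago < 6:
        return 3
    if months_ago < 12:
        return 2
    return 1  # events 12-24 months old

def basic_measure(violations, inspections):
    """violations: (months_ago, severity_weight) pairs relevant to this BASIC;
    inspections: months_ago for each relevant inspection in the past 2 years."""
    numerator = sum(time_weight(m) * sev for m, sev in violations)
    denominator = sum(time_weight(m) for m in inspections)
    return numerator / denominator if denominator else None

# For Unsafe Driving (and Crash Indicator), the denominator would instead be
# exposure-based: roughly power_units * utilization_factor.
```

A carrier with no relevant inspections gets no measure, mirroring the data sufficiency idea discussed later in the chapter.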
CMV carriers, for both buses and trucks, are characterized as belonging to one of two strata: straight (vehicles with permanently mounted bodies) or combination (vehicles that pull trailers). The strata were determined by FMCSA by observing how patterns of vehicle miles traveled vary by fleet composition. If more than 70 percent of a carrier’s fleet is combination trucks or motorcoaches, the carrier is placed in the combination stratum; the remaining carriers are placed in the straight stratum. Truck and bus carriers are not separated in SMS. Carriers within these two strata are then placed into peer groups, referred to as safety event groups, defined roughly by a measure of the size of the carrier based on the number of inspections. Within a safety event group, the measures for each BASIC for all carriers in the group are sorted from low to high. The rank for each carrier, divided by the number of carriers in the group, is then associated with that carrier and referred to as its percentile rank. For example, if a carrier has the 112th lowest score for a particular BASIC out of 400 carriers in an event group, it is given the percentile rank of 100 × (112/400), or 28 percent, meaning that 72 percent of the carriers had higher measures for that BASIC, that is, worse safety performance.
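The rank-to-percentile step within one safety event group can be sketched directly (ties are broken arbitrarily here, a simplification of FMCSA's actual handling):

```python
# Convert BASIC measures to percentile ranks within one safety event group:
# sort carriers by measure (low = better) and divide rank by group size.
def percentile_ranks(measures):
    """measures: {carrier_id: BASIC measure} for one safety event group."""
    ordered = sorted(measures, key=measures.get)
    n = len(measures)
    return {carrier: 100 * (rank + 1) / n
            for rank, carrier in enumerate(ordered)}
```

With 400 carriers, the carrier holding the 112th lowest measure receives 100 × (112/400) = 28, matching the worked example in the text.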
For each BASIC, and for each combination and straight segment, FMCSA has derived thresholds for use with the associated percentile ranks that are the same across safety event groups, which comes to a total of 7 × 2 = 14 thresholds.2 If a carrier has a percentile rank above the threshold for an individual BASIC, it may receive an intervention ranging from a warning letter, to an investigation, to further monitoring. If its percentile rank is below the threshold, there is no intervention based on the SMS information. FMCSA may still intervene with the carrier due to a crash, complaint, or other nonsafety violation of the agency’s regulations.
2 Given the fact that combination and straight segmentation is only applied to Crash Indicator and Unsafe Driving categories, there are only nine different thresholds.
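The footnote's count of nine distinct thresholds can be made concrete with a hypothetical lookup table. The threshold values below are placeholders, not FMCSA's published numbers; only the segmentation structure is taken from the text.

```python
# Hypothetical intervention-threshold lookup. Only Unsafe Driving and Crash
# Indicator are segmented into combination/straight, so there are
# 2*2 + 5 = 9 distinct thresholds rather than 14. Values are placeholders.
SEGMENTED = {"Unsafe Driving", "Crash Indicator"}
BASICS = ["Unsafe Driving", "Hours of Service", "Vehicle Maintenance",
          "Controlled Substances/Alcohol", "Hazardous Materials",
          "Driver Fitness", "Crash Indicator"]

THRESHOLDS = {}
for basic in BASICS:
    if basic in SEGMENTED:
        THRESHOLDS[(basic, "combination")] = 65  # placeholder percentile
        THRESHOLDS[(basic, "straight")] = 65     # placeholder percentile
    else:
        THRESHOLDS[(basic, None)] = 80           # placeholder percentile

def needs_intervention(basic, segment, percentile_rank):
    key = (basic, segment if basic in SEGMENTED else None)
    return percentile_rank > THRESHOLDS[key]
```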
A number of organizations have reviewed various aspects of FMCSA’s SMS, including the Volpe National Transportation Center (2014), the American Transportation Research Institute ([ATRI] 2012, 2014, 2015), the U.S. Government Accountability Office (2014), the Independent Review Team (2014), and Green and Blower (2011). The remainder of this chapter summarizes evaluations of SMS carried out by FMCSA. We then examine issues raised by external agencies and researchers to (1) provide our own synthesis and evaluation of the issues, and (2) assess how well the current SMS addresses the concerns or how proposed modifications to SMS could address them. Before proceeding, we provide some thoughts on how one might compare SMS with an alternative in Box 2-2.
The Carrier Safety Measurement System (CSMS) Effectiveness Test
We begin with FMCSA/Volpe’s evaluations of SMS.3 The input used to compute SMS for this analysis was MCMIS data from 2009 and 2010 (with a few exceptions), and the SMS results were compared with subsequent crash data from January 2011 to June 2012. In this evaluation, FMCSA quantified the effectiveness of SMS for the identification of carriers for interventions.
Crash Rates for Carriers with Alert Status for Various BASICs
The national average crash rate for the subset of CMV carriers that were active and had average power units (APU) and vehicle miles traveled (VMT) data (such carriers are likely to be larger than average) was 3.43 crashes per 100 power units over a 2-year period. (Concerns about the use of power units as a primary component of a measure of exposure have been raised, which we discuss later in this report.) In comparison, carriers with at least one BASIC in alert status (where “alert status” means being above the FMCSA threshold and therefore subject to an intervention for the associated BASIC) had crash rates greater than the national average. Table 2-1 shows that the more BASICs in alert status a carrier has, the greater the carrier’s current crash rate.
Further, Table 2-2 shows that, separately for each BASIC, the crash rate for carriers with alerts is considerably higher than the national average, with the exception of Driver Fitness, where the crash rate for those carriers with alert status is lower than the national average. Though the measure of total crashes per 100 power units could be improved upon, these data suggest that SMS is identifying an appropriate group of carriers for interventions, and the data support use of six of the seven BASICs.
3 Carrier Safety Measurement System (CSMS) Effectiveness Tests by Behavior Analysis and Safety Improvement Categories (BASICs), January 2014, Federal Motor Carrier Safety Administration, and CSA Effectiveness Measures, U.S. Department of Transportation, June 30, 2016.
It is important to point out that Tables 2-1 and 2-2 are affected by the fact that one BASIC is the crash rate: having an alert for that BASIC is equivalent to having a high historical crash rate, though not necessarily a high future crash rate, which is what is of interest here. A further concern is that inference from such analyses (and from other findings in these evaluations) is likely affected by the selection effect of which vehicles are chosen for inspection. Even so, this result provides support for the link between the frequency of inspection violations and crash rate (using 100 power units as the denominator). One improvement that should be considered is to carry out this analysis separately within each safety event group, to show that within safety event groups, SMS identifies for intervention those carriers that have a higher crash rate than the remainder. The aggregate crash rate (with power units as the measure of exposure) for carriers receiving interventions would then be compared with that for carriers not receiving interventions within the same safety event group; the comparison would not be with the national statistic but with the statistic for the remaining carriers in the safety event group.
A surprising result is that carriers with alert status in the Driver Fitness BASIC have lower crash rates than the national average. This result can be better understood if attention is restricted to For-Hire Combination carriers only. FMCSA shows that when doing so, a higher crash rate is obtained (see Table 2-3). The crash rate for Driver Fitness for this subgroup of carriers is now 7.21, much larger than the national combination segment average of 5.20 crashes per year per 100 power units. An argument to condition on For-Hire Combination carriers is that doing so eliminates carriers that carry out many short trips, which are unlikely to result in serious crashes. However, the fact that conditioning can change directions of relationships illustrates a larger point identified by the panel as important. Since there are often several contributory factors for crashes, the value of an individual factor may only be evident in conjunction with other factors as a sort of interactive effect. A full analysis, including all of the important contributory factors simultaneously, is necessary before making decisions on which violations and which BASICs are or are not playing an important role in identifying unsafe carriers. This is a point that we will discuss in more detail in Chapter 5.
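The point that conditioning can reverse the direction of a comparison can be illustrated with hypothetical numbers (not FMCSA data): a group that has the higher crash rate within every stratum can still show the lower aggregate rate when its exposure is concentrated in a low-rate stratum.

```python
# Hypothetical (crashes, power units) counts by stratum; not FMCSA data.
data = {
    #              alert            non-alert
    "low-rate":  ((300, 10_000), (20, 1_000)),
    "high-rate": ((8, 100),      (700, 10_000)),
}

def rate(crashes, pus):
    # crashes per 100 power units
    return 100 * crashes / pus

# Within each stratum, the alert group has the higher crash rate...
for (ac, ap), (nc, npu) in data.values():
    assert rate(ac, ap) > rate(nc, npu)

# ...yet in aggregate the alert group looks safer, because its exposure is
# concentrated in the low-rate stratum.
alert = rate(sum(a[0] for a, _ in data.values()),
             sum(a[1] for a, _ in data.values()))      # about 3.05
non_alert = rate(sum(n[0] for _, n in data.values()),
                 sum(n[1] for _, n in data.values()))  # about 6.55
assert alert < non_alert
```

This is the familiar aggregation-reversal (Simpson's paradox) pattern, and it is why the text argues for analyses that include the important contributory factors simultaneously.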
FMCSA has established “data sufficiency” standards, which are minimum numbers of crashes, inspections, and violations necessary to support the calculation of SMS measures. (The specific definition of these standards is provided later in this chapter.) To better understand the implications of data sufficiency standards, FMCSA examined the percentage of carriers that were later involved in crashes that had sufficient data to compute BASIC percentiles (see Table 2-4).
We can see from Table 2-4 that about 39 percent of active carriers have sufficient data for SMS; of those, about one-fourth have at least one BASIC above the threshold. However, the carriers with sufficient data for SMS measures account for 82 percent of power units. Therefore, the coverage of the industry using the current data sufficiency standards is reasonably high, though it would be useful to see what this coverage would be under different data sufficiency standards. Further, the carriers satisfying data sufficiency standards account for 92 percent of the crashes in the month after the SMS percentile ranks were produced.
TABLE 2-1 Crash Rates for Carriers with Different Number of Alerts
|Number of Alerts||0||1||2||3-4||5+|
SOURCE: Carrier Safety Measurement System Effectiveness Test by Behavior Analysis and Safety Improvement Categories (January 2014).
TABLE 2-2 Crash Rates for Carriers with Alert Status in the Seven BASICs
|BASIC Identified for Interventions||Number of Carriers Identified||Total Power Units||Total Crashes||Crash Rate (per 100 power units)||% Increase in Crash Rate Compared to National Average (3.43)|
NOTE: BASICs, Behavior Analysis and Safety Improvement Categories; HM, hazardous materials; HOS, hours of service.
SOURCE: Carrier Safety Measurement System Effectiveness Test by Behavior Analysis and Safety Improvement Categories (January 2014).
Size of Carrier
FMCSA has to apply SMS to a very large and disparate set of carriers, one dimension of which is the substantial difference in the size of those carriers. As shown in Table 2-5, crash rates differ substantially by carrier size, with smaller carriers appearing to have a higher crash risk.
TABLE 2-3 Crash Rates for Carriers with Alert Status in the Seven BASICs—For Hire (combination carriers only)
|BASIC||Unsafe Driving||Crash Indicator||Controlled Substances/Alcohol||HOS Compliance||Driver Fitness||Vehicle Maintenance||HM Compliance|
|Number of Carriers||6,245||2,826||1,587||17,684||2,329||10,528||276|
|Crash Rate (crashes per 100 power units)||8.34||8.02||8.00||7.42||7.21||6.97||5.87|
NOTES: BASIC, Behavior Analysis and Safety Improvement Categories; HM, hazardous materials; HOS, hours of service.
SOURCE: Compliance, Safety, Accountability Effectiveness Measures (June 30, 2016).
TABLE 2-4 Percentage of Carriers That Have Sufficient Data to Support the Safety Measurement System (SMS)
|Carrier Group||Number of Carriers||% of Carriers||Number of Crashes in Month after Snapshot||Number of Power Units||% of Power Units||% of Crashes in Month after Snapshot|
|All Carriers with Recent Activity||521,952||100.00||7,890||4,410,068||100.00||100.00|
|Carriers with Sufficient Data to Be Assessed in SMS||204,651||39.20||7,217||3,627,065||82.20||91.50|
|Carriers with at Least One BASIC above Threshold||54,674||10.50||3,645||2,099,101||24.70||46.20|
NOTE: Data from Motor Carrier Management Information System, March 2016 snapshot.
SOURCE: Compliance, Safety, Accountability Effectiveness Measures (June 30, 2016).
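The coverage percentages in Table 2-4 follow directly from its raw counts; as a quick arithmetic check (rounding to one decimal place, as in the text's 39, 82, and 92 percent figures):

```python
# Reproduce the coverage percentages in Table 2-4 from its raw counts.
def pct(part, whole):
    return round(100 * part / whole, 1)

assert pct(204_651, 521_952) == 39.2      # carriers with sufficient data
assert pct(3_627_065, 4_410_068) == 82.2  # their share of power units
assert pct(7_217, 7_890) == 91.5          # their share of next-month crashes
assert pct(54_674, 521_952) == 10.5       # carriers with at least one alert
assert pct(3_645, 7_890) == 46.2          # alert carriers' share of crashes
```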
TABLE 2-5 Carriers Identified in One or More BASIC Alerts, Size of Carrier, and Crash Rates
|Carriers and PUs||# of Carriers Prioritized||% Carriers with at Least One BASIC Prioritized||Total PUs||Total Crashes||Crash Rate (per 100 PUs)||% Increase in Crash Rate|
|5 or Fewer PUs||24,647||12||56,731||4,336||7.64||137|
|More Than 500 PUs||269||49||469,384||17,451||3.72||60|
NOTES: BASIC, Behavior Analysis and Safety Improvement Categories; PUs, power units.
SOURCE: Carrier Safety Measurement System Effectiveness Test by Behavior Analysis and Safety Improvement Categories (January 2014).
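The crash-rate column in Table 2-5 is simply crashes per 100 power units; a quick check against the table's counts:

```python
# Reproduce the crash-rate column of Table 2-5: crashes per 100 power units.
def crash_rate(crashes, power_units):
    return round(100 * crashes / power_units, 2)

assert crash_rate(4_336, 56_731) == 7.64    # carriers with 5 or fewer PUs
assert crash_rate(17_451, 469_384) == 3.72  # carriers with more than 500 PUs
```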
Decrease in Percentage of SMS Violations per Inspection, 2009–2016
There is evidence that the percentage of inspections with violations has decreased over the time SMS has been in use, and FMCSA argues that this is due, at least in part, to the introduction of SMS (see Table 2-6). This evidence is not completely compelling, since other confounding factors may also have been changing over that period. Having said that, no confounding factors come easily to mind other than a general easing of investigator standards across states.
As shown in Table 2-7, FMCSA also looked at the effectiveness of interventions over time by calculating crashes prevented, under the assumption that the observed crash reductions for carriers receiving interventions were due to the interventions and not to other factors (such as management changes, generally safer conditions on the road, or the use of technology).
Intervention Effectiveness over Time
Investigations (which are a more intensive form of intervention than a warning letter) seem to have an impact on violation rate, as shown in Table 2-8. It is encouraging to see that carriers that have been investigated have reduced their violation rate after the year of the inspection, though it is possible that some of this decline is a result of regression to the mean.
TABLE 2-6 Reductions in Violations per Inspection, 2009–2016
|Fiscal Year||SMS Violations||Percentage Reduction|
|Percentage Reduction in Violations, 2009–2016||18.5|
NOTES: Data from Motor Carrier Management Information System, the period from 2009 through June 2016. SMS, Safety Measurement System.
SOURCE: Compliance, Safety, Accountability Effectiveness Measures (June 30, 2016).
TABLE 2-7 Carriers Receiving Warning Letters and Their Reduction in Crashes*
*It should be noted that an alternative reason why there might be a decrease in the crash rate for carriers that have alerts for the Crash Indicator BASIC is regression to the mean.
SOURCE: Compliance, Safety, Accountability Effectiveness Measures (June 30, 2016).
TABLE 2-8 Reduction in Violation Rate Following an Investigation
|Fiscal Year||Number of Carriers||Violation Rate, 1 Year Prior to Investigation||Violation Rate, 1 Year after Investigation||Average Violation Rate in the 1 Year after Investigation (2-year average)|
SOURCE: Compliance, Safety, Accountability Effectiveness Measures (June 30, 2016).
General View of the Approach Taken in SMS
Given the assumption that carriers that violate safety provisions more frequently are also those with higher future crash risk, it is reasonable to use, as metrics for each carrier, the percentage of (weighted) inspections with violations associated with a specific type of safety deficiency. This is, in fact, what the six noncrash BASICs are. The seventh BASIC is weighted crash frequency, which is obviously relevant to future crash frequency. The weights are severity and time weights for the six noncrash BASICs, and time and crash severity weights for the Crash Indicator BASIC.
We find that carriers are generally supportive of a system that reliably discriminates between safe and unsafe carriers and that motivates unsafe carriers to improve their safety practices while leaving safe carriers alone. The goal of SMS was not only to reliably identify unsafe carriers, but also to reliably identify safe carriers and thereby not subject those carriers to interventions. This is important given the intention to make SMS percentile ranks public. While we are aware of no studies that have attempted to measure the size of the economic impact of making SMS ranks public, it is reasonable to expect that the impact would be substantial: poor percentile ranks presumably hamper a carrier’s ability to attract business. FMCSA interventions typically begin with warning letters but can progress to investigations, and at the extreme FMCSA can put an unsafe carrier out of business. Therefore, the accuracy and fairness of inferences based on SMS are of great importance, not only to identify unsafe carriers, but also to ensure the process does not harm safe carriers.
While sensible, this approach does raise some questions, some of which have been discussed in the major critiques of SMS. The issues raised can be viewed as coming from two primary questions: (1) Does SMS do a good job of discriminating between unsafe and safe carriers? and (2) In doing so, is SMS fair to identifiable subgroups of carriers, such as buses, small-sized carriers, or carriers that travel predominantly in given states? To answer those two central questions, we addressed issues such as the following: (1) Does SMS account for state differences in the administration of commercial vehicle inspection? (2) How effective are the data sufficiency standards? (3) Are large and small carriers treated fairly? and (4) Should “nonpreventable” crashes be included in SMS computations? In addressing these and other issues, we at times suggest actions that FMCSA might consider taking to reduce the concerns expressed.
In the following, we review and synthesize the major concerns of FMCSA’s SMS raised in other recent reviews, as well as concerns based on our own analysis. These concerns are summarized in Table 2-9.
Panel Consideration of Concerns Raised about the Functioning of SMS in Practice
TABLE 2-9 Summary of Critiques of the Safety Measurement System
|Issue||Comments in Literature and by Speakers to Panel|
|Most, But Not All, BASICs Are Predictive||*ATRI (a) found the Driver Fitness and Controlled Substances BASICs were not very predictive. *GAO made the same point. *Green and Blower (2011) made the same point.|
|Data Sufficiency Standards||*ATRI (a) argued this is a serious problem that can be partially addressed through the use of wireless roadside (partial) inspection data. *ATRI (b) pointed out that many roadside inspections with zero violations were not reported. *GAO argued that it is difficult to compare metrics that are so highly variable, providing research that showed that if the data sufficiency standards were raised, SMS percentiles would be better at discriminating between safe and unsafe carriers.|
|Absolute vs. Relative Measure||*Independent Review Team supported use of an absolute rather than a relative measure for SMS.|
|Use of Data from Nonpreventable Crashes||*ATRI (c) argued that nonpreventable crashes should not be used for SMS, stating that doing so makes substantial differences and that raters can have high reliability in assessing preventability.|
|Differences in State-Specific Rates of Inspections and Violations||*ATRI (b) showed strong differences in inspection frequency and violation frequency by state. *GAO showed similar findings.|
|Stratification of Types of Carriers||*The stratification of SMS was raised by several of the speakers during meetings of the panel.|
|Better Measures of Exposure||*ATRI (b) showed substantial state differences in crash rates, which suggests that some states are riskier to drive in, which should be taken into account when forming the denominators for Crash Indicator and possibly for Unsafe Driving.|
|Quality of Existing MCMIS Data||*GAO noted delays in reporting crash data and that the quality of vehicle miles traveled and APU data is low due to misresponse and nonresponse. *Independent Review Team was concerned with the degree of incomplete and missing data in MCMIS. *Green and Blower (2011) said that crash data in MCMIS were substantially underreported. *However, many praised FMCSA for its efforts, including State Safety Data Quality assistance (see Chapter 6 for further discussion).|
|Appropriateness of Severity Weights||*ATRI (a) tried to validate severity weights through use of a logistic regression model of crash risk as a function of data on violations, crashes, and carrier characteristics. It then examined which violations were most important in modeling crash risk and looked to see whether those were the violations that had the highest severity weights. The fit of the resulting model was poor. *GAO, to validate severity weights, also used a logistic regression of crash risk as a function of violations, crashes, and carrier characteristics.|
|Currently Uncollected Variables that Might Substantially Improve SMS||*Independent Review Team said that FMCSA should try to identify carrier variables related to carrier management (see Chapter 6 for further discussion).|
|Predictive Strength and Sparsity of Violations: Aggregation of Violations into BASICs||*ATRI (b) showed that certain violations are not very predictive of crash risk.
*GAO pointed out that 593 of 750 violations occurred for less than 1 percent of carriers, and only 13 of the 750 had a clear association with future crash risk.
*Independent Review Team said FMCSA should distinguish between violations that are causal for crashes and those that are indicative of management behaviors that may or may not lead to high crash risk.
|Clean Inspection Reports||*ATRI (a) discovered that a carrier not having a percentile rank due to having sufficient inspections to be scored, but not a sufficient number of violations, had fewer crashes than those without a sufficient number of inspections. Therefore, clean inspections are worth including.
*ATRI (b) pointed out that a large percentage of clean inspections are not reported.
|Selection Effects||*The panel is concerned that biases on the part of the inspectors in selecting vehicles for inspections might therefore bias SMS.|
|Transparency of SMS Algorithm||*Independent Review Team argued for the importance of greater transparency in SMS (see Chapter 3 for further discussion).|
|Making Percentile Ranks Public||*Independent Review Team supported the publication of percentile ranks. This was also true for several of the presenters to the panel, though other presenters argued for the percentile ranks to remain private (see Chapter 3 for further discussion).|
|Issue||Comments in Literature and by Speakers to Panel|
|Comparing Carriers of Different Sizes||*GAO argued that the smallest carriers in each safety event group are most likely to fall into alert status due to the variability of their measures.|
NOTES: In the right-hand column of the table, ATRI (a): Compliance, Safety, Accountability: Analyzing the Relationship of Scores to Crash Risk, M.D. Lueck, October 2012; ATRI (b): Evaluating the Impact of Commercial Motor Vehicle Enforcement Disparities on Carrier Safety Performance, A. Weber and D. Murray, July 2014; ATRI (c): Assessing the Impact of Non-Preventable Crashes on CSA Scores, C. Boris and D. Murray, November 2015; GAO: Federal Motor Carrier Safety: Modifying the Compliance, Safety, Accountability Program Would Improve the Ability to Identify High Risk Carriers, GAO-14-114, S. Fleming, February 2014.
Independent Review Team: Blueprint for Safety Leadership: Aligning Enforcement and Risk. William R. Voss (chair), J.P. Dudley, N.R. Eisner, L.B. Judd, W.O. McCabe, and C.B. Raley, 2014.
ATRI, American Transportation Research Institute; BASICs, Behavior Analysis and Safety Improvement Categories; FMCSA, Federal Motor Carrier Safety Administration; GAO, U.S. Government Accountability Office; MCMIS, Motor Carrier Management Information System; SMS, Safety Measurement System.
SOURCE: Green and Blower (2011).
impact of state effects, data sufficiency limitations, and the use of nonpreventable crashes along with preventable crashes. FMCSA has done an excellent job of issuing responses to these concerns, which is important because the concerns need to be addressed for SMS to maintain trust as a reliable discriminator between safe and unsafe motor carriers. Here, we provide our own views on these issues, discussing the degree to which the above concerns are justified. As part of this, we examine whether SMS is fair, by which we mean that even though motor carriers differ in various respects (in the risk environment in which they operate, the nature and size of their business, and the areas in which they operate), SMS should, to the extent feasible, compare carriers in a way that takes such differences into account. We now proceed to discuss issues that the panel itself or other observers have raised about the performance of SMS, as summarized in Table 2-9.
Most, But Not All, BASICs Are Predictive
Most of the BASIC percentile ranks have been found to correlate strongly with future crash risk. However, one or two BASICs have been shown to have weak or even negative correlations. Specifically, ATRI (2012),
TABLE 2-10 Log-Linear Negative Binomial Regression Models for BASICs
NOTE: BASICs, Behavior Analysis and Safety Improvement Categories; HOS, hours of service.
SOURCE: American Transportation Research Institute (2012), adapted from Table 4-8.
GAO (2014), and Green and Blower (2011) have shown that some of the BASICs, especially Crash Indicator and Unsafe Driving, have very strong correlations with future crash risk, and three more BASICs have moderately strong correlations. However, Driver Fitness has been shown to have a negative correlation with future crash frequency. Given that, some critics have suggested that Driver Fitness be considered for removal or refinement.
ATRI (2012) studied a subset of 471,306 motor carriers from a sample of 772,281 registered interstate and intrastate hazardous material carriers that had evidence of recent activity in the 24 months from April 12, 2010, to April 11, 2012. It focused on the five BASICs then available to the public: (1) Unsafe Driving, (2) HOS Compliance, (3) Vehicle Maintenance, (4) Driver Fitness, and (5) Controlled Substances/Alcohol. To determine whether percentile ranks were related to crash frequency, ATRI (2012) fit a log-linear negative binomial regression model with crash frequency as the dependent variable and the percentile ranks for each BASIC as predictors. It should be noted that the crash data were contemporaneous, so ATRI (2012) was not evaluating SMS in a predictive environment. Its results, provided in Table 2-10, showed a strong correlation between crash frequency and SMS percentile ranks for the Unsafe Driving, Hours of Service, and Vehicle Maintenance BASICs. However, ATRI (2012) found a negative relationship for both the Driver Fitness and Controlled Substances/Alcohol BASICs. That is, in those two cases, higher (worse) percentile ranks were associated with lower crash frequencies. ATRI (2012) raised the concern that this was due to the inclusion of violations that were not associated with safety deficiencies that contributed to crashes. Further,
Green and Blower (2011) looked at scatterplots of crash rates by BASIC percentile ranks. These plots showed strong positive relationships for Unsafe Driving, Fatigued Driving, Vehicle Maintenance, and Controlled Substances/Alcohol. However, the plots for Driver Fitness and for Loading/Cargo showed a negative association between crash rate and percentile rank. (Note that SMS has defined slightly different BASICs over time.)
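To make the form of the log-linear model concrete, the sketch below evaluates a fitted model of the kind ATRI (2012) describes. The coefficients and percentile inputs are invented for illustration only; they are not ATRI's estimates, and the real model also includes a dispersion parameter that does not affect the mean.

```python
import math

# Hypothetical coefficients for a log-linear model of crash frequency,
# in the spirit of ATRI (2012). These values are invented for
# illustration, not ATRI's published estimates.
coef = {
    "intercept": -2.0,
    "unsafe_driving": 0.015,      # effect per percentile point
    "hos_compliance": 0.010,
    "vehicle_maintenance": 0.008,
}

def expected_crashes(percentiles):
    """E[crashes] = exp(b0 + sum_i b_i * percentile_i), the mean of a
    log-linear (negative binomial) regression model."""
    eta = coef["intercept"] + sum(
        coef[name] * value for name, value in percentiles.items()
    )
    return math.exp(eta)

low = expected_crashes({"unsafe_driving": 10, "hos_compliance": 10,
                        "vehicle_maintenance": 10})
high = expected_crashes({"unsafe_driving": 90, "hos_compliance": 90,
                         "vehicle_maintenance": 90})

# Positive coefficients imply worse (higher) percentile ranks are
# associated with more expected crashes.
print(high > low)  # True
```

A negative coefficient, as ATRI found for Driver Fitness, would reverse this relationship: worse percentile ranks would yield fewer expected crashes.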
A related question is whether carriers receiving alerts from SMS have higher crash frequencies than those not receiving alerts. To answer this question, for each BASIC, ATRI (2012) classified carriers into two groups according to whether they received an alert, computed the average crash rate for each group, and took the ratio. Values greater than 1.0 indicate that carriers receiving an alert had a higher average crash rate, and vice versa. The results are given in Table 2-11.
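The ratio just described can be sketched in a few lines; the carrier records below are synthetic, invented purely to illustrate the computation.

```python
# Synthetic carrier records for one BASIC:
# (received_alert, crashes, power_units).
carriers = [
    (True, 4, 50), (True, 2, 20), (True, 6, 80),
    (False, 1, 60), (False, 0, 40), (False, 2, 100),
]

def crash_rate(group):
    """Average crashes per 100 power units across a group of carriers."""
    crashes = sum(c for _, c, _ in group)
    units = sum(u for _, _, u in group)
    return 100.0 * crashes / units

alerted = [c for c in carriers if c[0]]
not_alerted = [c for c in carriers if not c[0]]

# A ratio above 1.0 means carriers with alerts crash more often,
# supporting intervention with the alerted group.
relative_risk = crash_rate(alerted) / crash_rate(not_alerted)
print(round(relative_risk, 2))  # 5.33
```

With these made-up numbers the alerted group crashes at 8.0 per 100 power units versus 1.5 for the others, so the ratio exceeds 1.0, which is the pattern Table 2-11 shows for most BASICs.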
Table 2-11 suggests that, with the exception of Driver Fitness, intervening with carriers with alerts is sensible. ATRI (2012) posited that one problem with Driver Fitness might be that the severity weights are not assigned so that the violations more strongly associated with future crash frequency receive the higher weights.
ATRI (2012) also developed negative binomial regression models to compare carriers with alerts against carriers without alerts. The coefficients for the indicator variable for alert status are given in Table 2-12. This analysis includes standard errors with the regression coefficients, which helped the panel assess the magnitude of the effect. Again we see
TABLE 2-11 Relative Crash Risk for Carriers Above versus Below Alert Threshold in Each BASIC
NOTE: BASIC, Behavior Analysis and Safety Improvement Categories; HOS, hours of service.
SOURCE: American Transportation Research Institute (2012), adapted from Table ES-1.
TABLE 2-12 Log-Linear Negative Binomial Regressions for Univariate Models Comparing Carriers with Alerts against Carriers without Alerts
|BASIC||Parameter Estimate||Standard Error|
SOURCE: American Transportation Research Institute (2012), adapted from Tables 10, 12, 14, 16, and 18.
that Driver Fitness is the only BASIC where the indicator variable for alert status is inversely related to crash risk.
The study carried out by GAO (2014) used data from December 2007 through June 2011, with the first 2 years used to fit models and the last 18 months used for evaluation. GAO (2014) pointed out that for Driver Fitness and for Controlled Substances/Alcohol, the association with crash risk is not very strongly positive.
Green and Blower (2011) evaluated the effectiveness of SMS by comparing the crash rates for the carriers with BASIC percentile ranks that exceeded the SMS thresholds to the carriers with BASICs not exceeding the SMS thresholds. The results, given in Table 2-13, showed that the carriers SMS selected for interventions had higher crash rates than those that SMS did not select, though for Driver Fitness and Improper Loading the evidence was weaker.
We on the panel believe that while the evidence against the Driver Fitness BASIC is worrying, eliminating it based on the currently available information would be premature. As an example, FMCSA has shown that when focusing on for-hire, combination truck carriers, Driver Fitness became a much better predictor. Further, ATRI has shown that by using the number of total alerts as a metric, there was a monotonic, positive relationship with crash risk. This suggests that all of the BASICs have unique contributions to assessments of safe operations. As CMV safety is multidimensional (many factors contribute to making a carrier safe), a BASIC percentile rank in one category that is not strongly correlated with future crash risk may still be predictive when combined with other BASIC percentile ranks. For example, regressing crash rate on BASIC percentile ranks with interaction terms can reveal underlying relationships between the BASICs. We are not necessarily arguing for retention of Driver Fitness
TABLE 2-13 Crash Rates for Carriers Identified by the Safety Measurement System Compared to Those Not Identified
|BASIC Threshold Exceeded||Carriers||Crashes||Power Units||Crash Rate per 100 Power Units||Ratio to Not Identified|
|Controlled Substance and Alcohol||1,013||6,860||104,799||6.55||3.14|
|Improper Loading/Cargo Securement||9,409||16,747||421,670||3.97||1.90|
SOURCE: Green and Blower (2011).
in its current form. However, the consensus of our panel is that the evaluations carried out by FMCSA support the judgment that six of the seven BASICs are positively (sometimes very strongly) associated with future crash frequency, and that the unconditional correlation of Driver Fitness’s percentile ranks with future crash frequency is insufficient to remove it from SMS. We describe a new approach to SMS in Chapter 4 that is more natural to the problem and the dataset used and can naturally address the question of modification of BASICs to enhance their predictive strength.
Data Sufficiency Standards
Data sufficiency standards must trade off the reliability of SMS measures and percentile ranks against the percentage of the carrier population that is given SMS measures. As data sufficiency standards are relaxed, resulting in less reliable SMS measures, it is possible to provide SMS percentile ranks for a larger fraction of the active CMV carriers. (Appendix B provides the data sufficiency standards for each BASIC.) Also, since most carriers operate at most a few vehicles, which are therefore inspected infrequently, it is difficult to compare small carriers against each other because their measures are so variable. GAO has carried out research demonstrating that if FMCSA raised its data sufficiency standards, SMS would better discriminate between carriers with lower and higher future crash risk, though GAO acknowledges that doing so would result in a smaller number of carriers for which SMS could provide percentile ranks.
There is no getting around the point that providing BASIC measures to carriers that are inspected very infrequently will result in highly variable assessments of those carriers, simply because not much is known about the frequency of violations for small carriers. Such high-variance measures can mischaracterize the nature of a carrier: the high variability could result in the carrier being given alerts more or less often than its behavior would warrant. On the other hand, the industry is highly skewed, comprising a very large number of small carriers. If the data sufficiency standards were raised, a high percentage of the industry would be excluded from measurement by SMS and therefore from monitoring by FMCSA. We believe that this issue should be further investigated. Our preferred model, described in Chapter 4, will have some ability to reduce the variance of these measures through smoothing with the measures of a carrier's peers. Ultimately, this is a policy decision for FMCSA to make, but one that can be informed by additional research.
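The kind of smoothing referred to above can be sketched as a simple shrinkage estimator. The prior weight k, the peer rate, and the carrier counts below are illustrative assumptions, not the model of Chapter 4.

```python
def smoothed_rate(violations, inspections, peer_rate, k=10.0):
    """Shrink a carrier's observed violation rate toward its peer
    group's rate. k acts like a prior sample size, so carriers with
    few inspections borrow more strength from their peers."""
    weight = inspections / (inspections + k)
    observed = violations / inspections if inspections else peer_rate
    return weight * observed + (1.0 - weight) * peer_rate

peer = 0.30  # violations per inspection among peers (illustrative)

# A small carrier: 2 violations in 2 inspections (raw rate 1.0).
small = smoothed_rate(2, 2, peer)
# A large carrier: 100 violations in 100 inspections (raw rate 1.0).
large = smoothed_rate(100, 100, peer)

# Both carriers have the same raw rate, but the small carrier's
# estimate is pulled much closer to the peer rate, reflecting how
# little is actually known about it.
print(round(small, 3), round(large, 3))  # 0.417 0.936
```

This is the usual variance-bias trade: the smoothed estimate for the small carrier is less volatile from period to period, at the cost of reacting more slowly to genuine changes in that carrier's behavior.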
Use of Absolute versus Relative Measure
The Independent Review Team (2014, p. 9) recommended that FMCSA: “Continue to identify and implement methods for emphasizing absolute rather than relative individual motor carrier rankings so that it does not undermine industry’s willingness to innovate and share best practices.” The team based its recommendation on the following conclusion:
The relative SMS percentile ranks motor carriers based on their SMS scores relative to their peers. In this system, it is possible for a motor carrier’s rating to rise or fall based on the actions of its peer carriers and may be unrelated to any action by the rated carrier. For the investigators, the relative nature of the BASIC scores makes it difficult for them to discern if changes in percentile ranks are occurring because of: (a) aging of violations, (b) changes in the peer group’s performance with no change in operator performance, (c) real changes in a carrier’s operating performance. For the motor carrier, the Independent Review Team found that the relative scoring actually can discourage the sharing of leading safety practices because any increase in the score of a peer may result in a reduction in the relative rating of the motor carrier that shares it. It is possible the competitor subsequently achieves a better percentile score while the first carrier’s own relative rating decreases without any actual change in safety performance.
As the Independent Review Team pointed out, the use of percentile ranks, rather than the SMS measures, is a relative metric that has the following disadvantage. It is possible for a carrier to lower its SMS score from one time period to the next and still have its percentile rank increase as a result of larger improvements on the part of the remaining carriers in its safety event group. On the other hand, using an absolute standard of performance, as the entire industry gets progressively safer, the standard will at some point become irrelevant. Having a relative metric enables FMCSA to keep pressing for better performance. Also, a relative metric is natural since CSA/SMS operates on a fixed budget. The program can only support a fixed number of interventions of various types, which is consistent with looking for the worst percentiles of carriers for interventions. Since there are advantages to both relative and absolute measures, we believe that FMCSA should strongly consider use of a two-dimensional metric that takes into consideration both the SMS score and the percentile rank, using some objective formula, to decide on which carriers will receive interventions. Further, given that a safety event group could be a subset of the active carriers that are very safe performers, there might be an advantage in seeing how a carrier ranks over all active carriers. Lastly,
given that the only reason for safety event groups is to compare measures with similar variances, it might be beneficial to see how a carrier’s measures compare to the entire population of carriers with SMS measures. This is because, while a relatively small safety event group could presumably have widespread improvement, it would be more difficult for this improvement to occur across the industry.
RECOMMENDATION: Given that there are good reasons for both an absolute and a relative metric on safety performance, Federal Motor Carrier Safety Administration should decide on the carriers that receive Safety Measurement System (SMS) alerts using both the SMS percentile ranks and the SMS measures, and the percentile ranks should be computed both conditionally within safety event groups and over all motor carriers.
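A two-dimensional rule of the kind recommended above could take many forms; the sketch below is one hypothetical version, and both thresholds are invented for illustration rather than drawn from SMS.

```python
def flag_for_intervention(measure, percentile,
                          measure_floor=1.5, percentile_threshold=90.0):
    """Hypothetical two-dimensional intervention rule: flag a carrier
    only when it is bad in absolute terms (SMS measure at or above a
    floor) AND bad relative to peers (percentile rank at or above a
    threshold)."""
    return measure >= measure_floor and percentile >= percentile_threshold

# A carrier whose peers all improved: its percentile rank is high,
# but its absolute measure is below the floor, so it is not flagged.
print(flag_for_intervention(measure=1.0, percentile=95.0))  # False

# A carrier that is bad on both dimensions is flagged.
print(flag_for_intervention(measure=2.4, percentile=93.0))  # True
```

The conjunction protects against the failure mode the Independent Review Team described: a carrier can no longer be flagged purely because its peer group improved, since its absolute measure must also exceed the floor.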
Use of Data from Nonpreventable Crashes
Many in the CMV community consider some crashes not preventable. Examples include colliding with an animal in the roadway, being hit while legally parked, being struck by another driver who ran a red light or a stop sign, being hit by another driver who was under the influence of drugs or alcohol, or a truck-assisted suicide by a pedestrian or driver. ATRI (2015) showed large changes to the Crash Indicator BASIC when the contribution to MCMIS from such crashes was removed. The ATRI study focused some of its analyses on the above five situations and is therefore conservative, since there are, of course, many other types of nonpreventable crashes. The suggestion is that SMS will be more effective at identifying unsafe motor carriers if such crashes are removed from the calculation of the Crash Indicator BASIC.
Put another way, the suggestion is that nonpreventable crashes should not be included in SMS because any carrier placed in that same circumstance would have been involved in a crash, and so including them in the Crash BASIC does not help in discriminating between safe and unsafe carriers. This is an important issue, especially for small carriers, since such events can be extremely damaging, possibly putting some small carriers out of business.
However, some considerations complicate the proposal that such crashes be set aside. First, a large percentage of such crashes might have been prevented by drivers taking a more defensive approach to operating their vehicles. For example, they might have kept a greater distance from a swerving driver, decelerated earlier when approaching a stoplight or a crash scene, parked in a better-lit location, and so on. This is supported by the high correlation between the Crash Indicator
BASIC and future crash risk, making it likely that there is some predictive value from data on most crashes, not just the preventable ones. It might also be the case that clearly nonpreventable crashes will make up such a small percentage of overall crashes that removing them would make little difference in the utility of the Crash Indicator BASIC.
Second, it would be difficult to create an algorithm that would take as input the evidence at the scene of a crash and determine which crashes were and were not preventable. If an algorithm could not be created, a subjective element of the determination would have to be part of the decision rule at the state level. Also, additional data (beyond the FMCSA required data elements) are recorded in different states for crashes that meet the reporting criteria based on the contents of the state’s own standard crash report form or crash reporting software. (National guidelines—the Model Minimum Uniform Crash Criteria [MMUCC] and the American National Standards Institute ANSI D16.1 standard—are not mandatory but provide guidance that states may choose to use in designing their own data requirements for crash reporting.) Even in the most obvious cases (a single-vehicle crash), the causal attribution may not be simple enough that it could be assigned using a software algorithm. Expert investigation, including postcrash investigation, is required in order to be reasonably certain that all of the causal contributing factors are accounted for, and fault is apportioned as accurately as possible. This process is difficult to manage even in a single state. Doing so across all states, with multiple datasets, would require extraordinary, sustained, and costly efforts. In addition, FMCSA has no authority to ensure uniform application of such a guideline. Further, the lack of a uniform dataset standard adopted by all states means that expanding the data available to examine other vehicles and drivers involved in the crash, as well as to potentially assess the crash circumstances to reliably apportion fault, is not feasible or practical at this time.
Third, there is the question of the reliability of such evaluators. In a 2012 FMCSA report (Craft, 2012), researchers using police accident report data coded 1,221 crash records across five severity categories, with 93.2 percent agreement with assessments provided by researchers using data from the Large Truck Crash Causation Study (Blower and Campbell, 2002), which was considered a reasonable surrogate for truth. This study is certainly encouraging for the position that such assessments would be reliable. However, ATRI (2015), based on findings in FMCSA (2012), stated that, “The reliability of PARS [police accident reports] was tested by comparing them to FARS [Fatality Analysis Reporting System] records. There were significant inconsistencies between PAR and FARS data for areas critical to determining culpability; 82 percent of the PARS were
missing driver contributory factors and 47.5 percent of the PARS were missing the first harmful event.” So the reliability remains unclear.
FMCSA has initiated a research project to look into the costs and benefits of setting aside nonpreventable crashes. We believe that such research is of interest. While we are skeptical about setting such data aside, there might very well be schemes in which downweighting crashes judged to be nonpreventable (even if the method for arriving at such a determination is error-prone) could result in an SMS percentile rank for Crash Indicator that is preferable to the current version. One way of doing this would be to downweight the vehicle struck in a collision relative to the striking vehicle. However, we do not believe that additional research along the same lines should be given a high priority, since we do not believe that such a change will make an appreciable difference in percentile ranks. There is the separate question of whether to use such a metric as the dependent variable for objective functions evaluating the other BASICs, and this is also worthy of study.4
Differences in State-Specific Rates of Inspections and Violations
ATRI (2014) and GAO (2014) showed that there are strong differences by state with respect to the frequency of inspections and the frequency of particular violations. With respect to differences in the frequency of inspections, ATRI (2014, p. 11) found: “In 2011, on average, CMV enforcement personnel conducted 12.2 RIs [roadside inspections] per MVMT [million vehicle miles traveled] and issued 22.8 violations per MVMT. . . . Maryland had the highest inspection rate with 27.9 RIs per MVMT, which was 128.7 percent greater than the national average. In comparison, Oklahoma conducted the fewest RIs with 3.7 per MVMT, which was 69.7 percent less than the national average.”
Second, with respect to differences in the frequency of violations, ATRI (2014) found that: “among all driver violations reported in 2010, the share of violations for speeding varied significantly from state to state, representing 31.7 percent of all driver violations in Indiana, 16.9 percent in Ohio and 4.2 percent in Arizona.” Further, “While the national average was 11.97 light violations for every speeding violation, the ratio varied from a low of 1.91 in Indiana to 321.02 in Texas.” Also, the “national
4 Some have proposed that carriers’ internal data be used to assess preventability of crashes through appeals to FMCSA. This idea has several major disadvantages: (1) carriers’ crash records are subject to great variability in quality and detail; (2) FMCSA would have almost no influence on the quality and completeness of such data; (3) it is unlikely that a sufficient number of carriers will have appropriate data; and (4) assembling and managing the data would be extremely complicated and costly. Further, some carriers may be incentivized to bias their classification.
average for ‘windshield wipers inoperative or defective’ . . . violations per 100 relevant RIs was 2.0 . . . Texas issued 12.2 windshield violations per 100 relevant RIs, which was 510.0 percent greater than the national average and ranked first nationally. In comparison, North Dakota issued 0.19 windshield violations per 100 relevant RIs, which was 90.5 percent lower than the national average. . . .” Further, ATRI showed that if these state differences were eliminated, SMS percentile ranks would change appreciably.
ATRI (2014, p. 5) makes it clear that this is not something that FMCSA can unilaterally change.
While FMCSA sets guidelines on the adoption and enforcement of Federal Motor Carrier Safety Regulations (FMCSRs), each state enforcement agency has the discretion to emphasize specific enforcement foci and activities in order to accomplish FMCSA’s overall safety goals, with this privilege extending even further to local jurisdictions. For example, FMCSA acknowledges that different enforcement jurisdictions may utilize differing methods to select or screen a commercial motor vehicle (CMV) for inspection. Likewise, it is the decision of the enforcement officer to issue a citation, violation, or both during a roadside inspection (RI). Finally, states have the discretion to vary their enforcement foci, for instance, taking a close look at driver issues as opposed to vehicle defects, or focusing more attention on certain failures (e.g., brakes) versus behaviors (e.g., speeding).
The report also provides statistics that support the assertion that the driving environment is more challenging in some states than others. The report states that the national average for large truck crashes per MVMT in 2011 was 0.26. Wyoming had 0.52 large truck crashes per MVMT, twice the national average and the highest rate nationally. Conversely, New Mexico had the lowest rate, 0.08 large truck crashes per MVMT, which was 69.2 percent less than the U.S. average.
To summarize, crash frequency varies from state to state, as do the frequency with which inspections take place and the frequency with which various violations are cited. Let us first discuss crash frequency, which essentially underlies one of the seven BASICs. There are many known causal factors for crashes, including travel on two-lane rather than interstate highways, the frequency of ice and snow on the roads, congestion, and poor visibility. As a result, carriers that travel more often in states with a greater frequency of these and other causal factors will likely have higher crash frequencies than other carriers, assuming that their propensity for crashes is otherwise identical. Therefore, the SMS measures for carriers that travel more often in those states will likely be higher as a result. To eliminate this bias, FMCSA
would have to develop a model for exposure that took into consideration not only vehicle miles traveled but also the relative risk of the environment traveled through. Unfortunately, the information needed for input for such a model, which would include the time and location for all trips a carrier makes, is not available.
The situation is similar for violations. It is reasonable to believe that some violations are obvious while others are more borderline. For instance, tire tread depth can be only slightly beyond what the regulations allow, so whether a violation is issued can depend on whether the inspector is focused on such violations, which in turn might depend on the environments encountered on that state’s roads. The current exposure measure for most violations (besides Unsafe Driving) is the number of inspections. One could argue, analogously to crash frequency, that carriers that travel more often in states that issue borderline violations with greater frequency will have a greater propensity for violations, independent of their safety performance. Again, one could correct for this difference in the propensity for issuing violations by developing an exposure measure. However, here again, the inputs necessary to develop such a model, including the time and location for all trips, do not exist. Given that, there will be a bias against carriers whose CMVs travel in states with a greater propensity for issuing violations. If such data were available in the future, the new approach to SMS described in Chapter 4 could accommodate the new information to better understand state effects.
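If trip-level data existed, the kind of state-adjusted exposure measure discussed above could be sketched as follows. The state violation rates, inspection counts, and violation totals are all invented for illustration.

```python
# Hypothetical violations-per-inspection rates by state, reflecting
# state-to-state differences in enforcement (values are invented).
state_violation_rate = {"TX": 0.50, "ND": 0.10}

def observed_vs_expected(inspections_by_state, violations_observed):
    """Compare a carrier's observed violation count with the count
    expected given where its inspections occurred. A ratio near 1.0
    means the carrier is typical once state effects are removed."""
    expected = sum(count * state_violation_rate[state]
                   for state, count in inspections_by_state.items())
    return violations_observed / expected

# Two carriers with identical totals (20 inspections, 8 violations)
# but very different state mixes:
mostly_tx = observed_vs_expected({"TX": 18, "ND": 2}, violations_observed=8)
mostly_nd = observed_vs_expected({"TX": 2, "ND": 18}, violations_observed=8)

# The carrier inspected mostly in the high-citation state looks better
# than expected (< 1.0); the other looks much worse (> 1.0), even
# though their raw violation rates are identical.
print(round(mostly_tx, 2), round(mostly_nd, 2))  # 0.87 2.86
```

The point of the sketch is that a raw violations-per-inspection measure would treat these two carriers identically, whereas conditioning on the state mix separates them sharply; the missing ingredient is precisely the trip-level location data the text notes does not exist.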
Stratification of Types of Carriers
In order to provide for a fair comparison of carriers, SMS would benefit from stratification that formed peer groups of carriers that were undertaking trips of similar risks. This is somewhat taken care of by standardizing by vehicle miles traveled for the Crash Indicator BASIC, but, as discussed above, trips through some states and some roads at some times of the year and at some times of the day are riskier than others. More importantly, some truck and bus tasks are much riskier than others, such as transporting logs. This is related to the discussions of measures of exposure for crashes below, and in the above discussion of state differences in inspections and violations. A similar argument can be made for the noncrash BASICs, which are normalized by the weighted number of inspections.
Currently, SMS makes use of two factors to stratify carriers: (1) the percentage of combination versus straight trucks a carrier operates (and a similar stratification for buses), and (2) safety event groups, which is essentially stratification by size. Combination trucks tend to have substantially higher annual VMT than straight trucks. Since trucks with more VMT are more exposed to crashes, it makes sense to treat carriers that operate primarily combination trucks separately from those that operate primarily straight trucks. However, that argument is somewhat offset by the use of denominators that are similar to VMT. A better argument points to studies that support the view that combination trucks are riskier to drive than straight trucks (National Highway Traffic Safety Administration, 2013, Table 48).
There are currently only a few additional variables on which FMCSA could stratify if desired. The primary ones potentially of use are business/operation data, type of cargo carried, and hazardous materials shipped. The key suggestion that is heard is to stratify SMS into truck and bus carriers, then by type of business within the truck and bus strata. For instance, one might wish to divide buses into school buses, other local transportation, and other carriers, or divide trucks into interstate and local carriers. It might also be interesting to consider separate strata for hazmat and nonhazmat vehicles, and to consider finer delineations in the separation of straight-truck and combination-truck carriers. The general advantage of stratification in SMS would be to form peer groups in which the risk of travel is more comparable for the members of the stratum or peer group. Otherwise, crashes might be occurring simply as a result of a riskier set of trips. Unfortunately, the quality of the responses on type of business is suspect, since the carriers themselves identify their types of business and the types of cargo they carry. These responses are often fairly wide ranging, possibly to keep open any possible business opportunity.
In addition to the quality of the characteristic on which to stratify, two considerations need to be traded off in deciding whether additional stratification is desirable. First, there is no reason to employ additional stratification if the resulting cells are not clearly more homogeneous with respect to risk. Therefore, empirical work needs to be carried out to determine whether the better carriers (for example, those within the thresholds) with those differing characteristics have clearly different frequencies of crashes. (It is useful to look at the better carriers because doing so excludes the carriers that need to improve their operations.) The result would be better discrimination between safe and unsafe carriers. On the other hand, further stratification leaves each carrier with fewer peers, and the fewer peers a carrier has, the more difficult it is to be certain that the carrier's performance is atypical by virtue of being among the highest-ranked by a certain percentage.
Trading off these two considerations is difficult, assuming the desire to retain the same number of safety event groups throughout the stratification. A collection of carriers with very high-risk businesses should be given their own cell even if very few carriers have those characteristics.
Sometimes, such situations are handled well by statistical models of the risks involved. Absent use of such an approach, FMCSA needs to examine whether some additional stratification results in a collection of carriers for intervention that is preferable to the current stratification.
Better Measures of Exposure
The goal of SMS should be to sum the risks of crashing for each trip that all the CMVs make for a carrier; to compare the expected number of crashes, assuming reasonable efforts to operate safely, with the observed number; and to intervene with those carriers that have a large multiplicative or additive positive difference between observed and expected crashes. Unfortunately, as noted several times in this report, this objective is not attainable given the current data that are available.
A measure of exposure sums the risks of all trips and creates a standardization that allows numbers of crashes (or numbers of violations) for carriers to be comparable. A first attempt would be to divide the number of crashes by a carrier's total VMT to arrive at crashes per mile traveled (or violations per mile traveled) as comparable statistics. The problem with using total VMT is that this quantity in MCMIS is self-reported, with substantial nonresponse and likely substantial misresponse. The reported VMT in MCMIS is a poor-quality measure of exposure. FMCSA instead uses APU, averaging over up to three possible responses in a 2-year period for a carrier, multiplied by a utilization factor, which accommodates the fact that some carriers travel more than others but truncates very low and very high values of reported VMT. However, APU is itself of uncertain quality, and it likely goes out of date for some carriers fairly quickly due to growth or diminishment of business, mergers, or other factors. APU also has some nonresponse, though much less than VMT. Given that this factor can clearly be off by a substantial amount and has a direct impact on the Crash Indicator BASIC and the Unsafe Driving BASIC, it is vitally important for FMCSA and CMV associations to work collaboratively on an improvement. SMS cannot be any better than the data that are input into it.
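As a concrete sketch, the exposure computation described above might look as follows. The averaging rule, the utilization bounds, and the function names here are illustrative assumptions, not FMCSA's published formulas.

```python
# Sketch of an APU-style exposure denominator: average the self-reported
# VMT values, truncate the implied per-power-unit mileage, and rescale.
# The bounds lo/hi and the averaging rule are invented for illustration.

def exposure(reported_vmt, power_units, lo=20_000, hi=200_000):
    """Return a VMT-like exposure measure, or None on total nonresponse."""
    usable = [v for v in reported_vmt if v is not None]
    if not usable or power_units == 0:
        return None  # nonresponse: no exposure estimate possible
    per_unit = sum(usable) / len(usable) / power_units
    per_unit = max(lo, min(hi, per_unit))  # truncate extreme utilization
    return power_units * per_unit

def crash_rate(crashes, exposure_vmt):
    """Crashes per million vehicle-miles traveled."""
    return 1e6 * crashes / exposure_vmt

# A carrier with 10 power units, two usable VMT reports, and 3 crashes:
rate = crash_rate(3, exposure([480_000, 520_000, None], power_units=10))
```

The truncation step mirrors the report's description of capping very low and very high reported utilization; a real implementation would use FMCSA's actual bounds and response-handling rules.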
Further, as ATRI (2014) showed, there are substantial state differences in crash rates, which suggests that some states are riskier to drive in than others. This could be due to quality of roads, congestion, terrain, placement of rest stops, types of weather, and other factors that differ by state. Trucks used for long-distance transport tend to operate on the safest roads, such as interstate-quality highways. In comparison, logging operations may use unpaved roads, with variable and uncertain loading, while delivery operations to highly urbanized areas encounter substantial
congestion. Similar safety-relevant operational distinctions between different types of bus operators could be made.
If possible, the most important of these factors, in addition to total VMT, should be taken into account when comparing the numbers of crashes for the Crash Indicator BASIC. With the required information, such factors could be accounted for in a number of ways. For instance, a statistical model of crash risk as a function of these factors could be developed and, given the model, one could either weight miles traveled by the estimated risk of each additional mile or stratify by total exposure risk. If the input data on VMT could be improved, the method for exposure that FMCSA currently uses would be satisfactory until the above information is available. In addition, such improvements would improve the Crash Indicator BASIC not only as an indicator of which carriers receive interventions, but also as a measure of future crash risk.
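If the state- and road-type risk factors discussed above were available, a risk-weighted exposure could be formed along these lines. The baseline rate and relative risks below are invented for illustration, not estimates from any model.

```python
# Hypothetical risk-weighted exposure: weight each mile by a modeled
# relative risk, sum to an expected crash count, and compare with the
# observed count.  All numeric values here are invented.

BASE_RATE = 1.5e-6            # assumed baseline crashes per mile
RELATIVE_RISK = {             # assumed relative risks by road type
    "interstate": 0.6,
    "urban": 1.4,
    "unpaved": 2.5,
}

def expected_crashes(miles_by_road_type):
    """Sum risk-weighted miles to get an expected crash count."""
    return sum(BASE_RATE * RELATIVE_RISK[road] * miles
               for road, miles in miles_by_road_type.items())

observed = 4
expected = expected_crashes({"interstate": 900_000, "urban": 150_000})
excess = observed - expected  # additive difference, as in the text
```

Carriers with a large positive excess (additive or multiplicative) would then be candidates for intervention, which is the comparison the text describes.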
Appropriateness of Severity Weights
ATRI (2012) argued that the violations currently emphasized may not be the best predictors of crash risk and that CMV enforcement strategies may need to shift focus to other violations with stronger relationships to crash risk. ATRI developed a list of 10 violations, referred to as Crash Predictor violations, that were the most strongly associated with future crash risk. ATRI then examined state differences in issuing these violations and found considerable variation. States with high frequencies of these 10 violations had higher crash rates than states with high frequencies of what FMCSA calls red flag violations, the violations FMCSA viewed as most indicative of a lack of safe behavior.
Further, the Independent Review Team (2014) included the following recommendation relevant to SMS: “Recommendation 2.3.1: FMCSA should expand its work with industry and stakeholders to develop SMS enhancements. These enhancements should enable FMCSA to better discern motor carrier management actions that lead to crashes and to allow more timely and appropriate investigation and enforcement actions.”
ATRI (2012) and GAO (2014) tried to validate severity weights by modeling crash frequency as a function of violations, previous crashes, and carrier characteristics. The hope was to find that the violations that were most predictive were generally the violations that had the highest severity weights. Unfortunately, the fit of the resulting models was poor, which is not that surprising. As mentioned before, crashes have many causes, some of which are very particular to the situation, and some of which a carrier has no control over. These factors add noise to such a model and, as a result, the lack of fit of models of crash frequency will likely be substantial.
We believe that this information is of interest but insufficient to change severity weights at this time. The severity weights were derived starting with subject-matter expertise and refined using empirical methods (Volpe, 2010). This research relied on sub-BASIC grouping, which again utilized subject-matter expertise. We do not believe that research into the relationship between severity weights and future crash risk should be a high priority for FMCSA, since the algorithm is not extremely dependent on such weights. Also, such research is similar to the models that were argued earlier in this report to be too difficult to develop. (For further discussion, see the sections below on predictive strength and sparsity of violations, and on violations and severity weights in Chapter 6.) In addition, a feature of the approach described in Chapter 4 has a natural way of adjusting such weights over time.
Predictive Strength and Sparsity of Violations: Aggregation of Violations into BASICs
Two questions raised by the data on violations are (1) whether all 899 violations are useful in discriminating between safe and unsafe carriers, and (2) whether aggregating them into the current six noncrash BASICs makes sense (the Crash Indicator BASIC, being based on crashes rather than violations, is the obvious exception), or whether different, or more or fewer, groups should be considered. ATRI (2014) demonstrated that most violations are not individually very predictive of crash frequency. The GAO (2014) and the Independent Review Team (2014) reports also raised questions about the importance of all of the violations, with GAO determining that 593 of the then 750 violations occurred for less than 1 percent of carriers, and that only 13 of the 750 violations had a clear bivariate association with future crash risk.
FMCSA is well aware that individual violations have modest associations with future crash risk. That is the primary reason for aggregation into BASICs. Further, as pointed out by safety experts currently and formerly from Schneider National in presentations to the panel, some of the violations used in the BASICs are not clearly related to operational safety. For example, the paperwork accompanying a hazardous materials shipment can result in a violation when that paperwork is primarily the shipper’s responsibility. Also, whether a hazmat placard mounted at the correct angle contributes to safety can certainly be questioned. There are also violations for the various detail lights on the frame of a truck being out, and it is unclear how relevant such violations are to safety assessment.
It is important to point out that violations could appear to be uncorrelated with crash risk but be extremely predictive in particular situations. An example might be a CMV with tires with soft tread, which might not be generally predictive but might be very predictive when roads are icy. Confounding factors that affect the relationship between a violation and crash risk are referred to as moderators, factors that have important interaction effects with individual violations.
Also, the assumption on which SMS is based is not that violations are predictive of crash frequency, but rather that violations are indicative of carriers with poor safety operations. In addition, violations that only rarely occur could be very predictive of crashes conditional on less common circumstances. Therefore, the question of whether individual violations should be discarded is quite complex. It is difficult to lay out a decision rule for their inclusion or exclusion.
To add to this, Collin Mooney, executive director of the Commercial Vehicle Safety Alliance, provided examples to the panel whereby almost identical situations could result in different violations with substantially different severity weights, which would have substantially different impacts on SMS measures (see Chapter 6 for examples). Related to this, different software packages help roadside inspectors translate their observations into MCMIS violations, and these packages are not required to satisfy any national standards. There are currently efforts to mandate that these software packages be standardized to some extent so that they map the observations of inspectors to violations in the same way. Clearly, the fact that identical situations can result in very different violation codes with different severity weights, together with the lack of any national standardization of the coding, adds variability to the percentile ranks, which makes it more difficult to identify the carriers operating less safely than others.
Based on these considerations, it is likely that some of the current collection of violations, especially those that are not the responsibility of the driver, should not play a role in SMS. Although such violations may still be useful in identifying carriers that give insufficient attention to safety, it seems more likely that they play no beneficial role in SMS and should be dropped. Accordingly, FMCSA should consider examining the current group of 899 violations and removing any that clearly are not indicative of a carrier operation that prioritizes safe operations. Data quality would benefit, since the collection of detailed information on all possible violations is a costly exercise; once it is determined which set of violations are indicative of safe operations, FMCSA can concentrate on collecting quality information on that subset. In addition, FMCSA should consider setting up the violations in a manner that reduces the possibility of alternate scoring of identical circumstances, which might include setting standards for how the software tools function, but which also may be related to redundancies in the current way the violations are formatted. Finally, the new approach to CSA/SMS we describe in Chapter 4 has an empirical method for analyzing the utility of individual violations.
The second issue raised above concerns the bundling of the violations into groups, which has up to now relied on a strong subject-matter understanding of which violations are indicative of related behaviors by carriers. For instance, it makes sense to have a separate Vehicle Maintenance BASIC because there is a prima facie case that the mechanical condition of trucks affects crash risk (though this does not defend all of the violations that make up the Vehicle Maintenance BASIC). Having said this, a large number of violations could be considered for deletion or for movement to other BASICs, and there is the question of whether to form new BASICs out of the components of existing ones. FMCSA is clearly open to modifying the grouping of violations into BASICs, as regular changes have been made since the inception of SMS in 2010. An advantage of modestly increasing the number of BASICs is that it could provide more targeted information as to what safety issues a carrier needs to address. Also, it would be beneficial if the BASICs identified relatively separate sets of carriers for interventions, as otherwise the use of different BASICs offers little benefit. The panel analyzed this issue by forming 15 two-by-two tables of carriers, with the columns defined by two different noncrash BASICs and the rows defined by not exceeding or exceeding the alert threshold. This analysis is flawed because the stratification by safety event groups complicates things, but it still provides some idea of the redundancy of BASICs. One measure of agreement is the percentage of carriers with alert status for a given BASIC that also have alert status for the BASIC being compared. This percentage is as high as 55 percent for Driver Fitness compared to Vehicle Maintenance, and 34 percent for Hours of Service and Unsafe Driving. These values do not support worries about the redundancy of the BASICs, though this analysis is very preliminary.
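The panel's agreement check can be sketched as follows. The carrier alert sets below are invented for illustration; only the 55 and 34 percent figures quoted in the text come from the actual analysis.

```python
# For each pair of BASICs, compute the percentage of carriers in alert
# status on BASIC A that are also in alert status on BASIC B.
# The alert sets below are invented carrier IDs, for illustration only.

from itertools import combinations

def alert_agreement(alerts_a, alerts_b):
    """Percentage of carriers with an alert on A that also have one on B."""
    if not alerts_a:
        return 0.0
    return 100 * len(alerts_a & alerts_b) / len(alerts_a)

alerts = {
    "Driver Fitness": {1, 2, 3, 4},
    "Vehicle Maintenance": {2, 3, 5},
    "Hours of Service": {1, 6},
}
overlap = {(a, b): alert_agreement(alerts[a], alerts[b])
           for a, b in combinations(alerts, 2)}
```

With the six noncrash BASICs there are 15 such pairs, which is where the 15 two-by-two tables in the text come from.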
Setting up somewhat different BASICs that are better at defining separate areas in which a carrier may need an intervention is a challenge. This is because the great majority of violations are rarely cited, and crashes are also rare, making such empirical research difficult. (The related question as to whether entire BASICs should be retained is addressed elsewhere in this report.) The new modeling approach described in Chapter 4 provides a natural way of examining which violations should be grouped into which BASICs and whether the number of BASICs should be changed. In particular, the later discussion of a multidimensional item response theory (IRT) model is relevant to this question.
Clean Inspection Reports
ATRI (2014) pointed to “a 2012 study that found that only 10.4 percent of roadside inspectors ‘almost always’ completed a RI report when no violations were issued, while 6.8 percent ‘never’ completed a RI report [with no violations, presumably].” If this practice involving “clean” inspections is widespread, it is an important source of bias, since such unreported inspections would, if recorded, reduce the estimated frequency of violations. A clean inspection provides important information about the extent to which a carrier prioritizes safe operations, and it would be helpful to report as many clean inspections as possible. One remedy is to make the reporting of clean inspections mandatory.
Further, carriers that have only clean inspections for the relevant 2-year period are not given a score in SMS. We understand that a carrier with only clean inspections is an extremely safe carrier that is not going to be issued an intervention. But such carriers are peers: their performance is relevant to understanding what is feasible and to the relationship between SMS percentile ranks and future crash frequency, so this information is important to retain. In addition, as ATRI (2014) pointed out, this group is different from the insufficient-data group because its future crash risk is considerably smaller. Therefore, FMCSA should change its data sufficiency standards so that, once a set number of inspections has been carried out, carriers with a sufficient number of only clean inspections are accepted.
The carriers pulled over for inspections are often chosen based on observations such as swerving, heavy braking, or speeding, or because the trucks appear to be in disrepair. CVSA officials have no stated set of considerations on which to base their decision as to whether to pull a truck over for inspection; the decision is viewed as a matter of on-the-job expertise acquired over time. These selection effects are not viewed as a concern, since it is thought that CVSA officials are generally inspecting the right trucks (although it is known that most inspections occur on limited-access roads, so CMVs operating elsewhere could receive some modest “protection” from inspection). There would be a problem if the CVSA officials were using the wrong indications for identifying which trucks to inspect. Then, the SMS measures and percentile ranks for those carriers could be better or worse than those of their peers for reasons other than their safety performance. The best way to learn about such selection factors and their correlation with violation frequency would be to randomly inspect trucks that would not have been selected for inspections. Such a research
proposal has been made in the past but is difficult to get approved. We support such a project, since doing so would provide important information on which carriers are and are not inspected, and which indications are more and less important. Finally, we point out that the new approach to SMS described in Chapter 4 jointly models inspections and violations and therefore can make allowance for such effects.
Comparing Carriers of Different Sizes
It should be pointed out that safety event groups are not defined by the size of the carriers. The variables that define safety event groups are the number of relevant inspections for five BASICs, the number of crashes for the Crash Indicator BASIC, and the number of inspections with an unsafe driving violation for the Unsafe Driving BASIC. However, the difference between the current definition of safety event groups and size is fairly modest, so our analysis proceeded as if the categories were defined by size. Further, we see no advantage to defining safety event groups through use of number of crashes or number of inspections in comparison to defining safety event groups by the current APU.
However, if the stratification of carriers by size is carried out using APU, it runs into an increasingly important complication: carriers are using contract drivers, renting vehicles out to other carriers, and otherwise participating in the “Uberization” of the industry. We have not considered how FMCSA should deal with this emerging problem. The usual justification for the use of safety event groups is that the size of a carrier can imply various aspects of its operation that represent different challenges and opportunities for ensuring safe operations. For instance, very large carriers can afford technology and fatigue management programs that can provide more assistance in developing safe practices. We do not feel that this is a reasonable justification for this peer group stratification. After all, the public has an interest in safe operations regardless of the size of the carrier.
However, as mentioned by GAO (2014), there is a justification for peer grouping by size. While the carriers in a safety event group are intended to be roughly of the same size, there remain substantial size differentials within some safety event groups, especially the group of largest carriers. In those cases, for carriers that have about the same frequency of violations below the threshold, the carrier with the measures with the highest variability will end up above the threshold more often due to the randomness of being selected for inspections and of finding violations. Therefore, formation of safety event groups that are as homogeneous by size as possible (and therefore homogeneous with respect to the variability of the measures) helps to promote fairness.
The problem is that fully homogeneous peer groups are not possible. Some additional size heterogeneity will always remain. To avoid harming the smaller carriers in each safety event group, instead of only publishing the percentile ranks, it might be preferable to include some type of confidence interval with the percentile ranks. In that way, insurance companies, shippers, and the public will be able to observe that a high percentile rank could have been due to random factors. Further, it would also be desirable for FMCSA to take the natural variability of the percentile ranks into consideration in the determination of which carriers receive interventions, rather than just treating the percentile ranks as fixed quantities. (A natural assessment of variability is one of the key advantages of the proposed alternative approach described in Chapter 4.)
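One simple way to attach an interval to a percentile rank, in the spirit of the suggestion above, is to resample the carrier's violation count and recompute its rank each time. This is an illustrative sketch, not FMCSA's methodology; the peer measures and counts are invented.

```python
# Resample a carrier's violation count as Poisson and recompute its
# percentile rank within an invented peer group to get a rough 90%
# interval on the rank.  Illustrative only; not FMCSA's method.

import math
import random

def poisson_draw(rng, lam):
    """Knuth's method; adequate for the small counts involved here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def rank_interval(count, inspections, peer_measures, trials=2000, seed=0):
    """Approximate 90% interval for the percentile rank of a carrier
    whose measure is count / inspections."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(trials):
        m = poisson_draw(rng, count) / inspections
        ranks.append(100 * sum(p <= m for p in peer_measures)
                     / len(peer_measures))
    ranks.sort()
    return ranks[trials // 20], ranks[trials - trials // 20 - 1]

peers = [i / 100 for i in range(1, 101)]   # invented peer measures
low, high = rank_interval(count=6, inspections=12, peer_measures=peers)
```

A small carrier with few inspections gets a wide interval, making explicit to insurers, shippers, and the public that a high published rank may reflect chance.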
We are aware of one other problem raised by safety event groups. We were told by the representative of a carrier that, between two consecutive months, its BASIC measures did not appreciably change but its percentile ranks increased substantially. The company had grown slightly in size and, as a result, had been placed in the next-highest safety event group, which was generally a safer group. To avoid this discontinuity at the boundaries of safety event groups, the groups could instead be defined dynamically, for each carrier, as the carriers closest to it in size. (See Federal Motor Carrier Safety Administration, 2016b.) Such dynamically defined safety event groups would mostly eliminate boundary problems and would be easy to implement.
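A hypothetical sketch of such dynamically defined groups: each carrier's peers are simply the carriers nearest to it in size, so growing slightly shifts the group smoothly instead of jumping a boundary. The function name, the choice of k, and the size values are all invented for illustration.

```python
# Hypothetical dynamic safety event group: take the k carriers nearest
# in size, rather than assigning fixed size-band boundaries.

def dynamic_peer_group(carrier_size, all_sizes, k=5):
    """Return the k sizes closest to carrier_size, excluding one copy of
    the carrier itself if it appears in the pool."""
    pool = list(all_sizes)
    if carrier_size in pool:
        pool.remove(carrier_size)
    return sorted(pool, key=lambda s: abs(s - carrier_size))[:k]

sizes = [3, 5, 8, 12, 20, 33, 54, 90, 150, 250]
group = dynamic_peer_group(12, sizes, k=4)  # nearest-sized peers of 12
```

If the carrier grows from 12 to 14 power units, its peer set changes by at most one or two members rather than switching wholesale to a different group.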
While interesting questions about the optimal number of safety event groups and where the boundaries should be set remain, we do not feel this issue is important to research at this time. However, it should be mentioned that in discussions about further stratification of SMS, there is the opportunity to have larger numbers of peers in safety event groups by combining some of them. The trade-off to evaluate is size heterogeneity versus heterogeneity of the driving risks associated with the type of CMV operation.
Conceptually, SMS is structured reasonably. Using the number of violations found during inspections and the number of crashes, with violations bundled into groups that represent related areas of safe operations; weighting these frequencies by severity and time weights; properly standardizing these counts; stratifying carriers into similarly sized peer groups; and then seeing which carriers are doing worse than the others is a reasonable approach to the identification of unsafe carriers. However, too much of the detail of what is done is ad hoc. Instead, it would be better to make use of an appropriate statistical model, which would help address many of the issues that have been raised in a natural and interpretable way.
CONCLUSION: The Safety Measurement System (SMS) is structured in a reasonable way, and its method of identifying motor carriers for alert status is defensible. However, much of what is now done is ad hoc and based on subject-matter expertise that has not been sufficiently empirically validated. This argues for the Federal Motor Carrier Safety Administration adopting a more statistically principled approach that can incorporate, in a natural way, the expert opinion that is implicit in SMS.