Individual Differences and the "High-Risk" Commercial Driver


Suggested Citation:"Appendix E - Relevant Statistical Concepts." National Academies of Sciences, Engineering, and Medicine. 2004. Individual Differences and the "High-Risk" Commercial Driver. Washington, DC: The National Academies Press. doi: 10.17226/13770.



APPENDIX E RELEVANT STATISTICAL CONCEPTS

This appendix summarizes some basic statistical concepts relevant to individual differences, the association of various factors with risk, and quantifying risk.

E.1 BASIC STATISTICS OF DISTRIBUTIONS AND CORRELATIONS

The concept of high-risk drivers and differential driver risk implies that there is significant variation in the occurrence of crashes or other incidents among drivers in a group and that differential risk may be predicted by various personal factors. Some basic statistical distribution types are shown in Figure 18.

The "normal" distribution (part a) is bell-shaped and symmetrical. Both the mean (arithmetic average) and the median ("middle" value) lie at the center of the distribution, which is also where the largest number of subjects is found. For example, height (within a gender) and IQ scores are two human traits that are generally normally distributed. If commercial drivers' crash risk were normally distributed, most drivers would have risk levels near the center, a few would be much lower, and a few would be much higher.

However, this appears not to be the case. Instead, there are many commercial drivers with very low crash risks and some with elevated risk related to various factors to be discussed in this synthesis. In these situations, the mean is higher than the median because a few high-risk scores raise the overall arithmetic average. This is the distribution that seems to most frequently characterize driver risk within a group of drivers. Part b shows how the distribution would be "positively skewed" due to these high-risk drivers. In this distribution, using risk as an example, there are many low-risk scores but some very high-risk scores.

Finally, part c shows a "negatively skewed" distribution. This distribution would apply if a few drivers had very low relative risk, while the majority was at a relatively high level. In such a distribution, the mean is lower than the median.
This distribution does not seem to frequently describe the variation of driver risk or its associated factors.

The second type of statistical relationship important for understanding and analyzing driver risk is the correlational relationship. Correlation coefficients range from -1.0 (a perfect inverse linear relation) through zero (no linear relation) to +1.0 (a perfect direct linear relation). Risk factors or predictive measures (e.g., the score on a personality or performance test) are often stated in terms of their degree of correlation with a safety criterion measure (e.g., crashes or incidents). The scatter plots in Figure 19 illustrate (a) a moderate +0.5 correlation and (b) a high +0.9 correlation. Correlations of psychological variables are often in the moderate range, such as the +0.5 correlation shown.

E.2 DIFFERENTIAL RISK AND RANDOM EVENTS

Crashes, violations, and incidents are examples of discrete events that can be described in relation to a Poisson distribution. This distribution applies to a situation where events occur by random chance. If observed variation does not follow this distribution, there are factors other than chance impacting the outcome.

A research example illustrates how chance variation could be distinguished from variation not explainable by chance alone. The FMCSA Driver Fatigue and Alertness Study (DFAS; Wylie et al. 1996) used instrumented vehicles equipped with in-cab videos that permitted driver drowsiness to be rated over a week of normal operational driving. Eighty drivers had an average of 6.7 drowsiness episodes each. There were pronounced individual differences in the number of observed drowsy episodes: 29 of the DFAS drivers (36%) were never judged drowsy while, at the other extreme, 11 of the drivers (14%) were responsible for 54% of all the drowsiness episodes observed in the study.

Figure 20 shows the distribution of drowsiness episodes among the 80 drivers that one would expect if only chance variation were operating.
It illustrates a Poisson statistical distribution, which is used to evaluate variation in the occurrence of discrete events (e.g., crashes) for a group of subjects. The actual distribution differs very significantly from the "chance only" Poisson distribution (chi-square test; p < 0.00001, meaning that there is less than one chance in 100,000 that the result could have occurred by chance alone). Figure 21 shows the chance and actual distributions. Most notably, many of the drivers had more drowsiness episodes than could likely have occurred by chance. It is clear that factors other than chance affected the number of drowsy episodes. Comparing the two distributions shows that one group of drivers had more drowsy episodes than would be expected by chance, while another group had fewer.

Figure 18. Three hypothetical risk distributions: (a) "normal" distribution; (b) positively skewed distribution; (c) negatively skewed distribution.
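The relationship between mean and median in the three distribution shapes can be checked with a few lines of Python. This is a minimal sketch using made-up risk scores; the lists `symmetric`, `positively_skewed`, and `negatively_skewed` and the helper `describe` are illustrative assumptions, not data or methods from the report.

```python
from statistics import mean, median

# Hypothetical risk scores, invented purely for illustration.
symmetric = [1, 2, 3, 3, 4, 4, 4, 5, 5, 6, 7]
positively_skewed = [0, 0, 0, 0, 1, 1, 1, 2, 2, 9, 17]
# Mirror image of the positively skewed scores.
negatively_skewed = [17 - score for score in positively_skewed]


def describe(scores):
    """Return the (mean, median) pair for a list of scores."""
    return mean(scores), median(scores)


for label, scores in [
    ("symmetric", symmetric),
    ("positively skewed", positively_skewed),
    ("negatively skewed", negatively_skewed),
]:
    avg, mid = describe(scores)
    print(f"{label}: mean = {avg}, median = {mid}")
```

In the positively skewed list, the two largest scores pull the mean above the median, which is the pattern the text describes for driver risk.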

Figure 19. Hypothetical correlational relationships: (a) r = +0.5; (b) r = +0.9. (Scatter plots; both axes run from 0 to 6.)
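For readers who want to see how a correlation coefficient like those in Figure 19 is obtained, the following sketch computes the Pearson product-moment correlation from paired scores. The function name `pearson_r` and the data in `test_scores` and `incident_counts` are hypothetical and chosen only to show the calculation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired measurements (illustrative only).
test_scores = [1, 2, 3, 4, 5, 6]
incident_counts = [2, 1, 4, 3, 6, 5]

print(round(pearson_r(test_scores, incident_counts), 2))
```

A value near +1.0 or -1.0 means the points fall close to a straight line, while a value near zero means there is little linear relationship.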

Figure 20. Predicted number of drowsy episodes for the sampled group based on the Poisson distribution (assuming that only chance is operating). (Bar chart: number of drivers, 0 to 45, by number of drowsy episodes in bins 0, 1-5, 6-10, 11-15, 16-20, 21-25, 25+.)

Figure 21. Comparison of the actual number of drowsy episodes with the predicted number of episodes for the sample group. (Bar chart with the same axes and bins; series: Predicted, Actual.)
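A "chance only" prediction of the kind plotted in Figure 20 can be sketched from the two numbers given in the text: 80 drivers and an average of 6.7 drowsiness episodes per driver. The code below is an illustration under stated assumptions, not the study's own calculation: it assumes every driver shares the same Poisson rate of 6.7, it reads the "25+" bin as "more than 25", and the names `poisson_pmf`, `BINS`, and `expected_counts` are hypothetical. It is not claimed to reproduce the figures exactly.

```python
from math import exp, factorial

N_DRIVERS = 80        # from the text
MEAN_EPISODES = 6.7   # average episodes per driver, from the text

# (label, lowest count, highest count) for each closed bin in Figure 20.
BINS = [
    ("0", 0, 0),
    ("1-5", 1, 5),
    ("6-10", 6, 10),
    ("11-15", 11, 15),
    ("16-20", 16, 20),
    ("21-25", 21, 25),
]


def poisson_pmf(k, lam):
    """Probability of exactly k events when the Poisson mean is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def expected_counts(n_drivers, lam):
    """Expected number of drivers per bin if only chance were operating."""
    counts = {}
    covered = 0.0
    for label, low, high in BINS:
        prob = sum(poisson_pmf(k, lam) for k in range(low, high + 1))
        counts[label] = n_drivers * prob
        covered += prob
    # Assumption: the open-ended "25+" bin holds everything above 25.
    counts["25+"] = n_drivers * max(0.0, 1.0 - covered)
    return counts


for label, count in expected_counts(N_DRIVERS, MEAN_EPISODES).items():
    print(f"{label:>5}: {count:6.2f} drivers expected")
```

Under these assumptions, fewer than one driver would be expected to have zero episodes, whereas the study observed 29 such drivers. That gap is one concrete sign that something other than chance is at work.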

E.3 DEMONSTRATING FACTORS AFFECTING RISK

To determine whether correlating factors should be viewed as having true relationships with risk, and not as being the result of a chance occurrence, one must understand how researchers investigate and analyze the data. The determination of crash risk factors can be made using statistical models that predict the likelihood of crash involvement based on various factors (e.g., hours driving, drowsiness, safety belt usage, age). The outcomes of these models are numbers measuring the relative risk of crash involvement. Each of these numbers has an associated probability that indicates how likely it is that the estimated risk arose by chance alone. In most scientific studies, p-values of less than 0.05 are required to discount chance as a principal factor underlying the results.

A simple example, generated by a commonly used statistical software package (Statistical Analysis Software [SAS]) using NHTSA General Estimates System (GES) data (on the overall population of crashes), is shown in Table 12. This model was developed to predict the likelihood of being involved in a severe crash based on whether or not the driver was cited as being drowsy at the time of the incident and on safety belt usage. A severe crash was defined as a crash that resulted in a fatality or incapacitating injury. A non-severe crash was defined as a crash that resulted in no injury or injuries that were not incapacitating.

In this example, drowsiness and safety belt non-use were found to be significant indicators of severe crash involvement. The negative coefficient for "uses safety belt" indicates an inverse relationship; that is, use was associated with low crash severity and non-use with high severity. The probability that these variables appear to affect crash severity by chance alone is less than 0.0001 (very slim), as indicated by the chi-square probability value that was generated.
Writing these variables in the form of a predictive regression equation, the model would look like

$$\text{Probability (Severe Injury)} = -2.82 + 1.68\,\text{Drowsy} - 1.15\,\text{Safety Belt}$$

Because the odds ratios that follow come from logistic regression, the right-hand side of this equation is on the log-odds scale. The coefficients of this model can be used to approximate the relative risk or odds ratio of being involved in a severe injury. These approximations based on logistic regression odds calculations would be:

• Drowsy
  • Odds Ratio: 5.36
  • 95% Confidence Interval: 5.24 to 5.48
• Safety Belt
  • Odds Ratio: 0.32
  • 95% Confidence Interval: 0.31 to 0.32

What do these relative risks indicate? These values indicate that drivers identified as drowsy have more than 5 times the odds of being involved in a severe crash (versus a low-severity or non-injury crash). The corresponding confidence interval indicates, with 95% confidence, that the odds ratio for a drowsy driver lies somewhere between 5.2 and 5.5.

These regression-type models have been used to investigate the likelihood of crashes given driver age, gender, driving time and day, alcohol use, fatigue, and medical related conditions. Regression models can encompass both objective and subjective variables, as long as they can be quantified. Objective predictive measures include driver physical characteristics, average hours of sleep nightly, number of crashes or violations, and age. Subjective predictive measures include driving preferences, self-assessments of fatigue, and self-assessments of driving stress. In addition to personal driver factors associated with risk, there are also many non-driver (vehicle and roadway) and situational (weather, traffic) factors that can be predictive of crash involvement.

TABLE 12 Variables from simple regression model predicting likelihood of severe crash

Driver Characteristic    Coefficient    Probability (chi-square)
Identified as Drowsy     1.679          p < 0.0001
Uses Safety Belt         -1.153         p < 0.0001
Intercept                -2.82
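The odds ratios quoted in this section can be recovered by exponentiating the Table 12 coefficients, as the sketch below shows. It assumes a standard logistic model in which "Drowsy" and "Safety Belt" are coded 1 for yes and 0 for no; that coding is an assumption, since the report does not state it. The function names `odds_ratio` and `severe_crash_probability` are hypothetical. The confidence intervals are not recomputed because the standard errors are not given.

```python
from math import exp

# Coefficients from Table 12.
INTERCEPT = -2.82
COEF_DROWSY = 1.679
COEF_SAFETY_BELT = -1.153


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return exp(coefficient)


def severe_crash_probability(drowsy, safety_belt):
    """Modeled probability that a crash is severe.

    Assumes 0/1 coding of both predictors (not stated in the report).
    """
    log_odds = INTERCEPT + COEF_DROWSY * drowsy + COEF_SAFETY_BELT * safety_belt
    return 1.0 / (1.0 + exp(-log_odds))


print("Odds ratio, drowsy:     ", round(odds_ratio(COEF_DROWSY), 2))
print("Odds ratio, safety belt:", round(odds_ratio(COEF_SAFETY_BELT), 2))
print("P(severe | drowsy, no belt):   ",
      round(severe_crash_probability(drowsy=1, safety_belt=0), 3))
print("P(severe | not drowsy, belted):",
      round(severe_crash_probability(drowsy=0, safety_belt=1), 3))
```

Because the underlying data describe crashes that have already occurred, these probabilities are best read as the chance that a crash is severe given that it happened, not as the chance of crashing at all.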
