APPENDIX E
RELEVANT STATISTICAL CONCEPTS
This appendix summarizes some basic statistical concepts relevant to individual differences, the association of various
factors with risk, and quantifying risk.
E.1 BASIC STATISTICS OF DISTRIBUTIONS AND CORRELATIONS
The concept of high-risk drivers and differential driver risk implies that there is significant variation in the occurrence of
crashes or other incidents among drivers in a group and that differential risk may be predicted by various personal factors. Some
basic statistical distribution types are shown in Figure 18. The "normal" distribution (part a) is bell-shaped and symmetrical. Both
the mean (arithmetic average) and median ("middle") values are located at the center of the distribution, which is also where
the largest number of subjects is found. For example, height (within a gender) and IQ scores are two human traits that are
generally normally distributed. If commercial drivers' crash risk were normally distributed, most drivers would have risk levels
near the center, a few would be much lower, and a few would be much higher.
However, this appears not to be the case. Instead, there are many commercial drivers with very low crash risks and some
with elevated risk related to various factors to be discussed in this synthesis. In these situations, the mean is higher than the
median since a few high-risk scores raise the overall arithmetic average. This is the distribution that seems to most frequently
characterize driver risk within a group of drivers. Part b shows how the distribution would be "positively skewed" due to
these high-risk drivers. In this distribution, using risk as an example, there are many low-risk scores but some very high-risk
scores.
Finally, part c shows a "negatively skewed" distribution. This distribution would apply if a few drivers had very low
relative risk, while the majority was at a relatively high level. In such a distribution, the mean is lower than the median. This
distribution does not seem to frequently describe the variation of driver risk or its associated factors.
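The relationship between the mean and the median in skewed distributions can be checked with a few lines of Python. This is a minimal sketch: the crash counts are made-up illustrative values, not data from any study, and the second sample is simply the mirror image of the first.

```python
from statistics import mean, median

# Hypothetical crash counts for ten drivers (illustrative values only).
# Most drivers have none; one driver has many, giving a positive skew.
positively_skewed = [0, 0, 0, 0, 0, 0, 1, 1, 2, 9]

# Mirror image of the sample above: most values high, one very low.
negatively_skewed = [9 - x for x in positively_skewed]

def describe(sample):
    """Return (mean, median) for a list of numbers."""
    return mean(sample), median(sample)

pos_mean, pos_median = describe(positively_skewed)
neg_mean, neg_median = describe(negatively_skewed)

# Positive skew: a few high scores pull the mean above the median.
print(f"positive skew: mean={pos_mean:.1f}, median={pos_median}")
# Negative skew: a few low scores pull the mean below the median.
print(f"negative skew: mean={neg_mean:.1f}, median={neg_median}")
```

In the first sample the single high-count driver raises the average while leaving the median at the bottom of the scale, which is the pattern described for driver risk.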
The second type of statistical relationship important for understanding and analyzing driver risk is the correlational
relationship. Correlation coefficients range from -1.0 (a linear inverse relation) through zero (no relation) to +1.0 (a linear
direct relation). Risk factors or predictive measures (e.g., the score on a personality or performance test) are often stated in
terms of their degree of correlation with a safety criterion measure (e.g., crashes or incidents). The scatter plots in Figure 19
illustrate (a) a moderate +0.5 correlation and (b) a high +0.9 correlation. Correlations of psychological variables are often in
the moderate range, such as the +0.5 correlation shown.
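As a rough illustration of how a correlation coefficient is obtained, the sketch below computes the Pearson product-moment correlation from its definition. The function name and the sample scores are hypothetical and chosen only to show the -1.0 to +1.0 range.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical predictor scores (e.g., a test score) for six drivers.
test_scores = [1, 2, 3, 4, 5, 6]

direct = [2 * s + 1 for s in test_scores]   # exact linear direct relation
inverse = [-s for s in test_scores]          # exact linear inverse relation
noisy = [2, 1, 4, 3, 6, 5]                   # direct relation with scatter

r_direct = pearson_r(test_scores, direct)
r_inverse = pearson_r(test_scores, inverse)
r_noisy = pearson_r(test_scores, noisy)

print(f"direct:  r = {r_direct:+.2f}")
print(f"inverse: r = {r_inverse:+.2f}")
print(f"noisy:   r = {r_noisy:+.2f}")
```

The more scatter there is around the underlying linear trend, the further the coefficient falls from +1.0 or -1.0 toward zero.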
E.2 DIFFERENTIAL RISK AND RANDOM EVENTS
Crashes, violations, and incidents are examples of discrete events that can be described in relation to a Poisson distribution.
This distribution applies to a situation where events occur by random chance. If observed variation does not follow this
distribution, there are factors other than chance impacting the outcome.
A research example illustrates how chance variation could be distinguished from variation not explainable by chance
alone. The FMCSA Driver Fatigue and Alertness Study (DFAS; Wylie et al. 1996) used instrumented vehicles equipped
with in-cab videos that permitted driver drowsiness to be rated over a week of normal operational driving. Eighty drivers had
an average of 6.7 drowsiness episodes each. There were pronounced individual differences in the number of observed drowsy
episodes; 29 of the DFAS drivers (36%) were never judged drowsy while, at the other extreme, 11 of the drivers (14%) were
responsible for 54% of all the drowsiness episodes observed in the study. Figure 20 shows the distribution of drowsiness episodes
among the 80 drivers that one would expect if only chance variation were operating. It illustrates a Poisson statistical distribution
used to evaluate a situation where there is variation in the occurrence of discrete events (e.g., crashes) for a group of subjects. The
actual distribution differs very significantly from the "chance only" Poisson distribution (chi-square test; p < 0.00001, meaning
that there is less than one chance in 100,000 that the result could have occurred by chance alone). Figure 21 shows the chance
and actual distributions. Most notably, many of the drivers had more drowsiness episodes than could likely have occurred by
chance. Clearly, factors other than chance affected the number of drowsy episodes. Comparing the two distributions
shows that one group of drivers had more drowsy episodes than would be expected by chance, while another
group had fewer than would be expected by chance.
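The "chance only" expectation can be sketched directly from the figures quoted in the text (80 drivers, a mean of 6.7 episodes each, and 29 drivers never judged drowsy). The code below is an illustrative reconstruction, not the study's own analysis; the bin boundaries follow the axis labels of Figures 20 and 21, with the open-ended last bin assumed to mean 26 or more.

```python
from math import exp, factorial

N_DRIVERS = 80        # drivers in the DFAS sample (from the text)
MEAN_EPISODES = 6.7   # average drowsiness episodes per driver (from the text)
OBSERVED_ZERO = 29    # drivers never judged drowsy (from the text)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count is lam."""
    return exp(-lam) * lam ** k / factorial(k)

def expected_drivers(low, high, lam=MEAN_EPISODES, n=N_DRIVERS):
    """Expected number of drivers with low..high episodes, inclusive."""
    return n * sum(poisson_pmf(k, lam) for k in range(low, high + 1))

# Closed bins as labeled on the figure axes.
bins = [(0, 0), (1, 5), (6, 10), (11, 15), (16, 20), (21, 25)]
expected = {f"{lo}-{hi}" if lo != hi else str(lo): expected_drivers(lo, hi)
            for lo, hi in bins}
# Open-ended tail (assumed to be 26 or more): whatever is left over.
expected["26+"] = N_DRIVERS - sum(expected.values())

for label, count in expected.items():
    print(f"{label:>6} episodes: {count:6.2f} drivers expected by chance")

expected_zero = expected["0"]
print(f"Observed drivers with zero episodes: {OBSERVED_ZERO}")
```

Under a Poisson model with this mean, fewer than one driver in 80 would be expected to show no drowsiness at all, whereas 29 were observed, which is the kind of discrepancy the chi-square test in the text formalizes.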

Figure 18. Three hypothetical risk distributions: (a) "normal" distribution, (b) positively skewed distribution, (c) negatively skewed distribution.

Figure 19. Hypothetical correlational relationships: (a) r = +0.5, (b) r = +0.9. Both scatter plots use axes running from 0 to 6.

Figure 20. Predicted number of drowsy episodes for the sampled group based on the Poisson distribution (assuming that only chance is operating). Vertical axis: number of drivers (0 to 45). Horizontal axis: number of drowsy episodes (0, 1-5, 6-10, 11-15, 16-20, 21-25, 25+).

Figure 21. Comparison of the actual number of drowsy episodes with the predicted number of episodes for the sample group. Vertical axis: number of drivers (0 to 45). Horizontal axis: number of drowsy episodes (0, 1-5, 6-10, 11-15, 16-20, 21-25, 25+). Series: predicted and actual.

TABLE 12 Variables from simple regression model predicting likelihood of severe crash

Driver Characteristic     Coefficient     Probability (chi-square)
Identified as Drowsy       1.679          p < 0.0001
Uses Safety Belt          -1.153          p < 0.0001
Intercept                 -2.82
E.3 DEMONSTRATING FACTORS AFFECTING RISK
To determine whether factors correlated with risk reflect true relationships rather than chance occurrences, one must
understand how researchers investigate and analyze the data. Crash risk factors can be identified using statistical models
that predict the likelihood of crash involvement based on various factors (e.g., hours driving, drowsiness, safety belt usage,
age). The outcomes of these models are numbers measuring the relative risk of crash involvement. Each of these numbers has
an associated probability (p-value) that indicates how likely it is that the result arose by chance alone. In most scientific
studies, p-values of less than 0.05 are required to discount chance as a principal factor underlying the results.
A simple example, generated by a commonly used statistical software package (Statistical Analysis Software [SAS]) using
NHTSA General Estimates System (GES) data (on the overall population of crashes), is shown in Table 12. This model was
developed to predict the likelihood of being involved in a severe crash based on whether or not the driver was cited as being
drowsy at the time of the incident and on safety belt usage. A severe crash was defined as a crash that resulted in a fatality
or incapacitating injury. A non-severe crash was defined as a crash that resulted in no injury or in injuries that were not
incapacitating. In this example, drowsiness and safety belt non-use were found to be significant indicators of severe crash
involvement. The negative coefficient for "uses safety belt" indicates an inverse relationship; that is, use was associated with
low crash severity and non-use with high severity. The probability that these variables appear to affect crash severity by
chance alone is less than 0.0001 (very slim), as indicated by the chi-square probability values that were generated.
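Writing these variables in the form of a predictive regression equation, the model would look like the following. Because the odds ratios reported for this model come from logistic regression, the left-hand side is expressed on the log-odds scale:

\[
\ln\frac{P(\text{Severe Injury})}{1 - P(\text{Severe Injury})} = -2.82 + 1.68\,(\text{Drowsy}) - 1.15\,(\text{Safety Belt})
\]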
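To make the Table 12 coefficients concrete, the following sketch converts them into predicted probabilities of a severe crash. It rests on two assumptions not stated explicitly in the text: that the model is a logistic regression (suggested by the odds ratios reported for it) and that each predictor is coded 1 for "yes" and 0 for "no". The function and constant names are hypothetical.

```python
from math import exp

# Rounded coefficients from Table 12.
INTERCEPT = -2.82
COEF_DROWSY = 1.68
COEF_BELT = -1.15

def severe_crash_probability(drowsy, belted):
    """Predicted probability of a severe crash.

    drowsy, belted: 1 if the condition applies, 0 otherwise
    (0/1 coding is an assumption made for this illustration).
    """
    log_odds = INTERCEPT + COEF_DROWSY * drowsy + COEF_BELT * belted
    # Inverse logit converts log-odds to a probability between 0 and 1.
    return 1.0 / (1.0 + exp(-log_odds))

for drowsy in (0, 1):
    for belted in (0, 1):
        p = severe_crash_probability(drowsy, belted)
        print(f"drowsy={drowsy}, belted={belted}: P(severe) = {p:.3f}")
```

Under these assumptions, the drowsy and unbelted combination yields the highest predicted probability and the alert and belted combination the lowest, matching the signs of the coefficients.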
The coefficients of this model can be used to approximate the relative risk or odds ratio of being involved in a severe crash.
These approximations based on logistic regression odds calculations would be:
· Drowsy
    · Odds Ratio: 5.36
    · 95% Confidence Interval: 5.24 to 5.48
· Safety Belt
    · Odds Ratio: 0.32
    · 95% Confidence Interval: 0.31 to 0.32
What do these values indicate? Drivers identified as drowsy had more than 5 times the odds of being involved in a severe crash
(versus a low-severity or non-injury crash). The corresponding confidence interval indicates, with 95% confidence, that the odds
ratio for a drowsy driver lies somewhere between about 5.2 and 5.5. Likewise, the odds ratio of 0.32 indicates that drivers
using a safety belt had roughly one-third the odds of a severe crash.
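The reported odds ratios can be reproduced from the Table 12 coefficients, since in a logistic regression the odds ratio for a predictor is the exponential of its coefficient. The short sketch below shows the arithmetic; the dictionary name is chosen for illustration only.

```python
from math import exp

# Coefficients as listed in Table 12.
COEFFICIENTS = {
    "Identified as Drowsy": 1.679,
    "Uses Safety Belt": -1.153,
}

# Odds ratio = e raised to the coefficient.
odds_ratios = {name: exp(b) for name, b in COEFFICIENTS.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.2f}")
```

A ratio above 1 (drowsiness) marks a factor associated with higher odds of a severe crash, while a ratio below 1 (safety belt use) marks a protective factor.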
These regression-type models have been used to investigate the likelihood of crashes given driver age, gender, driving time
and day, alcohol use, fatigue, and medical-related conditions. Regression models can encompass both objective and
subjective variables, as long as they can be quantified. Objective predictive measures include driver physical characteristics,
average hours of sleep nightly, number of crashes or violations, and age. Subjective predictive measures include driving
preferences, self-assessments of fatigue, and self-assessments of driving stress. In addition to personal driver factors associated
with risk, there are also many non-driver (vehicle and roadway) and situational (weather, traffic) factors that can be predictive of
crash involvement.