National Academies Press: OpenBook

Analysis of Existing Data: Prospective Views on Methodological Paradigms (2012)

Chapter: Chapter 2 - Research Approach

Suggested Citation: "Chapter 2 - Research Approach." National Academies of Sciences, Engineering, and Medicine. 2012. Analysis of Existing Data: Prospective Views on Methodological Paradigms. Washington, DC: The National Academies Press. doi: 10.17226/22837.


Chapter 2
Research Approach

Overview

The analysis of the data provided by VTTI and UMTRI was guided by the following five research questions:

1. What is the nature of the relationship between events (e.g., crashes, near crashes, incidents) and pre-event maneuvers? What are the contributing driver factors, environmental factors, and other factors? There are many findings and implications to share from analyses concerning this question.

2. What hierarchical structure (statistically speaking), if any, exists in the manner in which these relationships need to be explored? Two specific hierarchical models are reported, both using VTTI data: one was applied to event modeling and the second to driver-based models. A series of comparisons is presented between hierarchical models, estimated using Bayesian methods, and frequentist models, estimated using typical maximum likelihood principles.

3. What kind of elucidative evidence emerges from the analysis of roadway departure crashes in terms of Questions 1 and 2? Is the illustrative hierarchy of relationships generalizable to other nonintersection crash types such as leading vehicle crashes? Elucidative evidence refers to evidence of the likely effect of individual predictor variables in modeling event occurrence (including crashes). Surrogates are a special type of variable that have been discussed as a general replacement for crash data; the description and interpretation of Penn State surrogate analyses are contained in the responses to this general question. Exposure requires a predictor variable reflecting time or distance of travel; exposure-based analyses of both data sets are described in Chapter 3. Responses to this question thus summarize the extent to which the modeling results provide guidance on variables to be given priority in future analysis studies.

4. In terms of elucidative evidence, what types of behavioral correlates emerge? For example, are attitudinal measurements indicative of revealed behavior in terms of headway maintenance and speed reductions? Several behavioral correlates (also referred to in several SHRP 2 safety symposia as crash predisposition measures) have emerged as factors of interest. The responses to this question describe the work in this area.

5. If elucidative evidence does in fact emerge in terms of attitudinal correlates and how their interactions vary by context, is it plausible to parse out the marginal effects of various context variables on crash risk by suitable research design? This question bears directly on the importance of context in the analysis of naturalistic data. This section summarizes the findings and discusses their implications for SHRP 2 scheduled projects (specifically S04 and S08).

Analysis of VTTI Data

Two parallel tracks were pursued in the analysis of the 100-car study data. The first approach modeled the occurrence of each event in detail and focused on understanding the interaction of the many factors that led to event occurrence. This approach fit the data provided by VTTI well, as it allowed the team to compare events at three levels (summary definitions provided by Dingus, Klauer, et al. 2006):

• Crash event: any contact with an object, either moving or fixed, at any speed, in which kinetic energy is measurably transferred or dissipated;
• Near crash: a circumstance that requires a rapid, evasive maneuver by the subject vehicle, or any other vehicle, to avoid a crash; the maneuver causes the vehicle to approach the limits of its capabilities (e.g., vehicle braking greater than 0.5 g or steering input resulting in lateral acceleration greater than 0.4 g); and
• Crash-relevant incident (in this report referred to as a critical incident): a circumstance that requires a crash avoidance response on the part of the subject.

Each of these events was identified by VTTI staff as part of the 100-car study, and the three event types were provided to Penn State in response to the team's data request. Penn State developed a structured analysis framework for these event-based data; the model specified driver attributes, the context in which the event occurred (including roadway and environmental variables), and attributes describing details about the event itself, particularly in the few seconds before and during the event. Examples of event-level variables include whether the driver was observed to be distracted just before the event and whether the vehicle crossed over the lane or road edge. One may think of these models as exploring the details of factors associated with the events. Various model formulations were used to find variables associated with crashes and near crashes, and the team investigated attributes of vehicle motion associated with such events (e.g., vehicle over lane or road edge) that could serve as surrogate measures for crashes. If these event-related measures were shown to be positively associated with crash or near-crash events, they were considered potential surrogates. The team tested the specific measures available in the data set and attempted to supplement the available vehicle kinematic data by downloading information from the NHTSA website. Unfortunately, kinematic data were available for only a small number of crashes; near crashes and critical incidents were not represented, so this approach was abandoned.

One weakness of event analysis is that it precludes the study of drivers who experience none of the three measured events (i.e., the safest drivers). To include these drivers, the second analysis track conducted by Penn State with the VTTI data was a series of models of the number of events per driver. Consistent with much of the modeling in the safety field, these analyses used a set of count regression formulations (e.g., Poisson, negative binomial [NB], and zero-inflated Poisson [ZIP]) that produced estimates of the probability of a driver with particular attributes having 0, 1, 2, ..., n events during the year of the 100-car study. These models allowed comparisons to be made across all drivers.

Analysis of UMTRI Data

The UMTRI data consisted of a set of drivers who experienced a series of alerts from onboard systems about potential crashes. Because there were no crashes during the study, the dependent variables used in the analyses were derived from a system designed to detect excessive speed entering a curve (i.e., curve speed warning [CSW]) and an alert triggered when the subject vehicle deviated from the lane or road edge (i.e., lane departure warning [LDW]). After an initial screening of the data, the team decided to focus on the CSW alerts because they provided alert duration data and thus contained more details about the driver response to the alert. Further, the curve speed event was more consistent with the road departure event covered in the VTTI analyses, and it was thought there might be some benefit from the similarity.

Two approaches were taken in the analysis of the UMTRI data. The first was to use a series of piecewise linear models to characterize the relationship between vehicle kinematics and CSW alert frequency and duration. The interest was in finding which kinematic variables were most correlated with the triggering of the alert. This information was used to gain insight about potential surrogates, under the assumption that the kinematic variables most associated with alert occurrence would be good candidate crash surrogates to consider in subsequent research. A positive association between a kinematic variable and an alert could indicate a kinematic variable that might also be associated with (or potentially cause) road departure crash occurrence. While the team acknowledges the nature of this conceptual leap, it was believed that the exploratory nature of the SHRP 2 S01 projects would support this type of analysis. Time-series models of the kinematic data were also attempted, but they did not yield particularly meaningful results and are not discussed in this report.

The second approach taken with the UMTRI data was to use a cohort-based formulation to estimate the probability of a particular number of alerts being triggered for an individual driver (e.g., characterized by gender, years of driving experience, and mileage driven in particular contexts). This formulation is based on actual miles driven under specific environmental and roadway conditions as measured by the CSW–LDW system. Because of the structure of the UMTRI data, the team was able to analyze alert frequency at a very detailed level of exposure.

The team believes the successful estimation of the models predicting the number of alerts using homogeneous trip segments is one of the most important outcomes of the UMTRI modeling effort. This formulation takes advantage of the unique trip-by-trip information in the naturalistic study, along with GIS-related factors coded by UMTRI (such as road type and environmental conditions), to derive a measure of alert frequency in each trip segment. The issue of interest is the ability to capitalize not only on the naturalistic driver behavior data but also on detailed GIS roadway data. Since there is a plan to collect detailed roadway data as part of the scheduled SHRP 2 Safety Project S04, the Penn State team believes this formulation merits consideration for future studies. Even though the models are estimated with alerts, there is a direct parallel to the modeling of crashes or other events of interest. In addition, researchers can flexibly define homogeneous trip segments to match their research needs. The Penn State team discussed this approach during several SHRP 2–sponsored research symposia. The estimated models using the cohort formulation verify the efficacy of this approach; the findings contribute to answering Research Question 3.

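The functional form of the UMTRI cohort models is not given at this point in the report, so the sketch below is only an assumption-laden illustration of the exposure idea: the alert count in each homogeneous trip segment is treated as Poisson, with a mean equal to a per-mile alert rate (log-linear in segment attributes) multiplied by the miles driven in that segment. The `Segment` fields and the `BETA` coefficient values are hypothetical placeholders, not UMTRI variables or estimates.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """A homogeneous trip segment; field names are hypothetical."""
    miles: float    # exposure: miles driven in this segment
    freeway: bool   # hypothetical roadway attribute
    night: bool     # hypothetical environmental attribute


# Hypothetical log-linear coefficients for alerts per mile (not estimates from the report).
BETA = {"intercept": math.log(0.02), "freeway": -0.5, "night": 0.3}


def alert_rate_per_mile(seg: Segment) -> float:
    """Alerts per mile implied by the segment's attributes (log-linear model)."""
    eta = BETA["intercept"] + BETA["freeway"] * seg.freeway + BETA["night"] * seg.night
    return math.exp(eta)


def expected_alerts(seg: Segment) -> float:
    """Poisson mean for the segment: rate per mile times miles driven (exposure)."""
    return alert_rate_per_mile(seg) * seg.miles


def poisson_pmf(k: int, mu: float) -> float:
    """P(N = k) for a Poisson count with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def driver_alert_distribution(segments: list[Segment], max_k: int = 5):
    """Independent Poisson segment counts sum to a Poisson count with the summed mean."""
    mu = sum(expected_alerts(s) for s in segments)
    return mu, [poisson_pmf(k, mu) for k in range(max_k + 1)]


if __name__ == "__main__":
    trips = [Segment(50.0, freeway=True, night=False), Segment(10.0, freeway=False, night=True)]
    mu, probs = driver_alert_distribution(trips)
    print(f"expected alerts = {mu:.3f}; P(0 alerts) = {probs[0]:.3f}")
```

Treating miles as a multiplier on the rate (equivalently, log miles as an offset) is what allows segments of very different length to be compared on one scale; if segment counts turned out to be overdispersed, an NB variant would replace the Poisson step.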
Analysis Plan for VTTI Data

Figure 2.1 is an overview of the analysis conducted with the VTTI data, including separate analysis streams for driver-based and event-based models. This differentiation in modeling approach was identified in the proposal for the current study and reflects the authors' view of the most sensible way to approach exploration of the data set. The driver-based models estimate the number of events expected of all drivers in the data set, including drivers with zero observed events. The VTTI data set lacks details concerning all the contexts in which exposure to crash risk occurred, including the many miles driven with no events. While these data are available in concept within the original 100-car naturalistic data set, they were not provided to the Penn State team.

Figure 2.1. Overview of modeling design for VTTI data.

Conversely, the event-based models estimate the probability of having an event in a given context for drivers with events; these models do not include the best drivers in the data set (that is, those with no events of interest). As discussed in the background section of this report, this omission need not occur in all naturalistic data sets. Penn State's proposal anticipated that data would be available for event-based analyses that included nonevent observations (so-called control epochs in the VTTI data), but because these epochs contained no context information, they were useless for modeling. Consistent with the desire to explore multiple analysis paradigms, both driver- and event-based analyses are included in this report.

The driver-based models, both frequentist (those applying classical maximum likelihood principles with asymptotic normality assumptions) and Bayesian, are fundamentally count regressions that estimate the probability that a driver with given attributes has 0, 1, 2, 3, ..., n events during the 1-year duration of the Virginia Tech study. Hierarchical models are estimated using Bayesian methods. Count models under consideration during the study include Poisson, NB, ZIP, zero-inflated NB (ZINB), and other models.

Event-based models include (by definition) crashes, near crashes, and critical incidents. As with driver-based models, a range of model forms was considered, including probit, logit (binary, ordered, and multivariate), and hierarchical versions of these using Bayesian formulations.

Figure 2.1 illustrates that the driver-based models use driver attributes in the data set along with driving behavior (classified as crash, near crash, and critical incident). The event-based models use context and event variables as predictors along with driver attributes in a search for valid surrogates for crashes. The last box in the figure calls for the undertaking of relative risk and exposure-based designs. These studies could not be undertaken with measured exposure using the VTTI data because such data were not available in the processed data set that Penn State received. Instead, the team used as exposure the subject-estimated annual mileage obtained during driver interviews. Thus, the exposure-based risk analysis shown in Figure 2.1 represents the driver-based VTTI modeling, which included self-reported annual miles driven for each primary driver. Exposure-based models and relative risk analyses using measured travel in different contexts were developed using UMTRI data and are described in that section of the report.

Table 2.1 shows the summary statistics for the driver-based model. Only the statistically significant covariates included in the final models are presented in the table. Driver attributes are presented for all drivers and by gender.

Table 2.1. Summary Statistics for Variables Used in VTTI Driver-Based Models

Driver Group | Variable | Mean | SD | Min | Max
---|---|---|---|---|---
All drivers | Number of events | 2.37 | 5.06 | 0 | 28
 | Gender (male) | 0.60 | 0.49 | 0 | 1
 | Drivers with BS degree or above | 0.63 | 0.49 | 0 | 1
 | Scaled Dula Dangerous Driving Index (DDDI) aggressive driving (AD) score | 6.23 | 1.16 | 4.0 | 9.1
 | Scaled DDDI risky driving (RD) score | 10.38 | 1.29 | 7.2 | 14.9
 | Driving experience (years) | 18.73 | 14.41 | 1.5 | 52
 | Past violations | 1.35 | 1.31 | 0 | 5
 | Total mileage (miles) | 11,369 | 5,726 | 12 | 23,980
Males | Number of events | 1.72 | 5.03 | 0 | 28
 | BS degree or above | 0.47 | 0.50 | 0 | 1
 | Scaled DDDI AD score | 3.87 | 3.30 | 0 | 9.1
 | Scaled DDDI RD score | 6.22 | 5.16 | 0 | 13.1
 | Driving experience (years) | 12.36 | 14.73 | 0 | 52
 | Past violations | 0.67 | 1.09 | 0 | 5
 | Total mileage (miles) | 7,445 | 7,461 | 0 | 23,980
Females | Number of events | 0.65 | 1.63 | 0 | 10
 | BS degree or above | 0.16 | 0.37 | 0 | 1
 | Scaled DDDI AD score | 2.36 | 2.99 | 0 | 8.1
 | Scaled DDDI RD score | 4.16 | 5.24 | 0 | 14.9
 | Driving experience (years) | 6.37 | 12.25 | 0 | 51
 | Past violations | 0.67 | 1.20 | 0 | 5
 | Total mileage (miles) | 3,924 | 6,021 | 0 | 21,564

Note: SD = standard deviation; Min = minimum; Max = maximum.
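The overdispersion that motivates the NB and zero-inflated models can be screened roughly from the event-count rows of Table 2.1: under a Poisson model the variance equals the mean, so a variance-to-mean ratio well above 1 points to overdispersion. The sketch below computes that ratio from the reported means and standard deviations. It is a descriptive check only, assuming the tabulated SDs are sample SDs of the per-driver counts; it is not the formal test on the NB overdispersion parameter used for model selection.

```python
# Mean and standard deviation of "Number of events" per driver, from Table 2.1.
EVENT_STATS = {
    "All drivers": (2.37, 5.06),
    "Males": (1.72, 5.03),
    "Females": (0.65, 1.63),
}


def dispersion_ratio(mean: float, sd: float) -> float:
    """Variance-to-mean ratio; equals 1 for a Poisson-distributed count."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return sd ** 2 / mean


if __name__ == "__main__":
    for group, (mean, sd) in EVENT_STATS.items():
        print(f"{group}: variance/mean = {dispersion_ratio(mean, sd):.1f}")
```

For all drivers the ratio is about 10.8 (5.06² / 2.37), far from the value of 1 a Poisson model would imply.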

Characteristics of Dependent Variables

To discuss the effects of driver attributes on the number of events during a time period, simple relationships must be formulated between them, such as the number of events per driver as some function $f$ of his or her attributes:

$$\text{number of crashes per person during a period of time} = f(\text{driver attributes})$$

Although the dependent variable on the left-hand side of this simple equation is not difficult to obtain, this subsample of run-off-road-related events includes only 17 crashes, which presents a problem for model significance and also fails to use the information from the other 180 events (30 near crashes and 150 critical incidents). Therefore, two distinct dependent variables are considered. The first combines crashes and near crashes; the second combines crashes, near crashes, and critical incidents. Figure 2.2 presents the frequency distribution of crashes and near crashes for the 1-year study, and Figure 2.3 presents the frequency distribution of all events (that is, crashes, near crashes, and critical incidents). In preliminary modeling, models were fit for both dependent variables. The data appear overdispersed, with a large number of zero-event drivers and a few drivers with high counts of events.

Figure 2.2. Frequency distribution of crashes and near crashes. (Histogram; x-axis: crashes and near crashes per driver; y-axis: frequency.)

Figure 2.3. Frequency distribution of all events (crashes, near crashes, and critical incidents). (Histogram; x-axis: events per driver; y-axis: frequency.)

Characteristics of Predictor Variables

The first step in checking the predictor variable data was to scrutinize the correlations between all available variables, both to avoid multicollinearity and to get a rough sketch of the overall data, as shown in Table 2.2. The high correlation between driver age and driving experience (0.76) was expected and is illustrated in Figure 2.4. Dropping age rather than driving experience may improve model results, since driving experience usually reflects driving skill more directly than age does (Shinar 2007).

Table 2.2. Correlation Between All Primary VTTI Variables

 | EVENTS | PACCIDENT | AGE | GENDER | EDU | DRIYEAR | CRASH | CYEAR | PVIOLATION
---|---|---|---|---|---|---|---|---|---
EVENTS | 1.000 | 0.072 | -0.070 | 0.050 | 0.056 | -0.039 | 0.712 | 0.096 | -0.019
PACCIDENT | 0.072 | 1.000 | 0.400 | 0.696 | 0.400 | 0.228 | 0.087 | 0.289 | -0.024
AGE | -0.070 | 0.400 | 1.000 | 0.676 | 0.326 | 0.758 | -0.094 | 0.085 | -0.073
GENDER | 0.050 | 0.696 | 0.676 | 1.000 | 0.572 | 0.421 | 0.047 | 0.201 | -0.030
EDU | 0.056 | 0.400 | 0.326 | 0.572 | 1.000 | 0.182 | 0.073 | 0.080 | -0.042
DRIYEAR | -0.039 | 0.228 | 0.758 | 0.421 | 0.182 | 1.000 | -0.067 | 0.086 | -0.069
CRASH | 0.712 | 0.087 | -0.094 | 0.047 | 0.073 | -0.067 | 1.000 | 0.089 | -0.108
CYEAR | 0.096 | 0.289 | 0.085 | 0.201 | 0.080 | 0.086 | 0.089 | 1.000 | 0.089
PVIOLATION | -0.019 | -0.024 | -0.073 | -0.030 | -0.042 | -0.069 | -0.108 | 0.089 | 1.000

Note: EVENTS = crash, near crash, or critical incident; PACCIDENT = past accident; EDU = educational level; DRIYEAR = number of years driving; CRASH = number of crashes experienced by the subject during the study; CYEAR = vehicle age (years); PVIOLATION = past violation.

Figure 2.4. Plot of driving experience against driver age. (x-axis: age; y-axis: driving experience.)

Education level should not necessarily be treated as a continuous variable, as nonlinear relationships may exist. Hence, education levels were initially tested as categorical, as shown in Figure 2.5. Education levels 4, 6, 7, and 8 together constituted less than 20% of the total. Education levels were therefore combined to reduce the number of categories to three: some college attended, bachelor's degree, and professional (master's, PhD, or other) degree. Descriptive statistics for the grouped education variable are shown in Table 2.3.

Crash Predisposition Measures

The Dula Dangerous Driving Index (DDDI) was used to measure drivers' self-reported likelihood of dangerous driving. Each DDDI scale (DDDI total, aggressive driving [AD], negative emotional [NE] driving, and risky driving [RD]) underwent tests of internal reliability and showed evidence of construct validity as part of initial scale development and testing (Dula and Ballard 2003). Participants responded to the items using the following 5-point Likert scale: A = never, B = rarely, C = sometimes, D = often, and E = always. To quantify the DDDI, numerical values were assigned to each response (1 through 5 for A through E, respectively). The higher a driver's score, the more dangerous that person's driving behavior was considered to be. To further account for inconsistency in driver responses, the DDDI, AD, RD, and NE scores were rescaled by deflating each driver's scores by that driver's specific mean.

Two additional indices measuring driver risk predisposition were included in the analysis (see Dingus et al. 2006 for additional information). The Driver Stress Inventory uses a 10-point Likert scale to obtain information about drivers' general attitudes toward driving on a variety of roadways and in traffic congestion. The Life Stress Inventory contains information about the types of stress that the subject may have experienced in the past year (e.g., ill relative, marital or relationship problems, work performance); evidence suggests that these types of stresses can predispose an individual to an elevated crash risk. These indices were used in both event- and driver-based models to assess correlations with event occurrence.

Figure 2.5. Histogram of education levels. (x-axis: education level, 1 to 8; y-axis: frequency.)

Table 2.3. Descriptive Statistics of Grouped Education Levels for 100 Cases

Variable | Mean | SD | Min | Max
---|---|---|---|---
Some college | 0.4 | 0.492 | 0 | 1
BS degree | 0.4 | 0.492 | 0 | 1
Postgraduate college | 0.2 | 0.402 | 0 | 1

Driver-Based Analysis

The naturalistic driving environment is a complex web of interactions between various measurable factors that represent both physical infrastructure and human behavior and attributes. Modeling of subjects offers the potential to capture these interactions and to characterize heterogeneity in the driving environment. To answer Research Question 1 (What is the nature of the relationship between events [e.g., crashes, near crashes, incidents] and pre-event maneuvers? What are the contributing driver factors, environmental factors, and other factors?), it is crucial to determine what kinds of drivers tend to have higher counts of events that potentially increase crash probability; this focus on driver characteristics leads to the idea of driver-based models. These models are used to examine the relationships between driver attributes and risky driving events by using driver characteristics such as gender, driving experience, and other socioeconomic variables.

A significant amount of research has been conducted on the application of Poisson and NB distributions (Jovanis and Chang 1986; Miaou 1994; Shankar et al. 1995; Poch and Mannering 1996; Milton and Mannering 1996; Lord et al. 2005, 2007) to predict crash frequencies. The Poisson model is appropriate only if the mean and variance of crash frequencies are approximately equal, whereas the NB model can be applied if the data are overdispersed (i.e., the variance of the data is significantly greater than the mean). Let $\lambda_i$ represent the expected number of events for driver $i$ during a period of time as a function of $\beta$, a set of estimated parameters, and $x_i$, a set of crash contributing factors (Jovanis and Chang 1986).

The Poisson regression finds maximum likelihood estimates of the $\beta$ parameters:

$$\ln(\lambda_i) = \beta x_i + \varepsilon_i \tag{1}$$

In traffic safety, crash counts are often overdispersed, so the NB distribution is considered as a representation of the distribution of crash counts. In Equation 1, $\exp(\varepsilon_i)$ is a gamma-distributed variable with mean 1 and variance $\alpha$. If the number of crashes is conditioned on $\exp(\varepsilon_i)$, the resulting probability distribution for a given count $y_i$ is

$$P(y_i) = \frac{\Gamma(\theta + y_i)}{\Gamma(\theta)\, y_i!} \left(\frac{\theta}{\theta + \lambda_i}\right)^{\theta} \left(\frac{\lambda_i}{\theta + \lambda_i}\right)^{y_i} \tag{2}$$

where $\theta = \alpha^{-1}$, $\Gamma$ represents the gamma function, and $\alpha$ is an overdispersion parameter. When $\alpha > 0$, the distribution is overdispersed about the mean. The NB distribution can capture overdispersion that occurs as a result of unobserved heterogeneity in crash data.

In the context of crashes, the likelihood of having a large proportion of zero frequencies is high, implying the importance of zero-inflated models. Since their formal introduction by Lambert (1992), the use of these models has grown, and they can be found in numerous fields. Crash frequencies can be modeled as belonging to two states (Shankar et al. 1997). One state occurs when the entity of interest is inherently safe (a theoretical zero-crash state). In the second state, crash frequencies follow some known distribution. ZIP and ZINB models can handle this dual-state phenomenon (Miaou 1994; Shankar et al. 1997). An overabundance of zeros in a crash count distribution may reflect true lifetime proportions or may arise as a result of partial observability, which poses methodological challenges. Shankar (2004) pointed out that if partial observability and overdispersion are suspected, NB variants of the ZIP model are plausible. A zero-count state, occurring with probability $p_i$, and the Poisson process, occurring with probability $1 - p_i$, together contribute to the apparent excess of zeros. Thus,

$$P(y_i = 0) = p_i + (1 - p_i)\, e^{-\lambda_i} \tag{3}$$

and

$$P(y_i = k) = (1 - p_i)\, \frac{e^{-\lambda_i} \lambda_i^{k}}{k!}, \quad k = 1, 2, \ldots \tag{4}$$

where $k$ is the number of crashes and $\lambda_i$ is the Poisson mean. Combining Equations 3 and 4 provides the ZIP model of crash frequency. In the present case, the team used a logit model to estimate the proportion of observations with a zero frequency and used count regression for the other frequencies.

The Vuong statistic (Vuong 1989) is often used to judge whether a zero-inflated model (ZIP or ZINB) fits the data better than its non-inflated counterpart. Shankar et al. (1997) proposed a decision guideline for model selection among Poisson, NB, ZIP, and ZINB models using the Vuong statistic and the t-statistic for $\alpha$, as shown in Table 2.4.

Table 2.4. Decision Guideline for Model Selection

Vuong Statistic | t-Statistic for Overdispersion Parameter α < 1.96 | t-Statistic for Overdispersion Parameter α > 1.96
---|---|---
< 1.96 | ZIP or NB | NB
> 1.96 | ZIP | ZINB

Gender differences in crash experience and etiology are well established in the safety literature. Hierarchical structures such as the one illustrated in Figure 2.6 can be used to explore these differences in the VTTI data. An advantage of hierarchical models is that they can capture driver differences over time and space, depending on how the data are clustered.

Figure 2.6. Hierarchy for driver-based model. (All Drivers at the top level, divided into Males and Females; individual drivers, Driver 1 through Driver n, within each gender group; and each driver's events, Event 1, Event 2, ..., at the lowest level.)
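For readers implementing these count models, the sketch below evaluates the Poisson, NB (Equation 2), and ZIP (Equations 3 and 4) probabilities for a single driver and encodes the Table 2.4 decision guideline. It is a minimal sketch under stated assumptions: $\lambda_i$, $\alpha$, and $p_i$ are taken as already estimated (the regression in Equation 1 and the logit model for $p_i$ are not shown), and a statistic exactly equal to 1.96 is treated as not significant, a case the table leaves unspecified.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability of y events given mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def nb_pmf(y: int, lam: float, alpha: float) -> float:
    """Negative binomial probability (Equation 2) with theta = 1 / alpha; requires lam > 0."""
    theta = 1.0 / alpha
    # Work on the log scale so the gamma-function ratio does not overflow.
    log_coef = math.lgamma(theta + y) - math.lgamma(theta) - math.lgamma(y + 1)
    log_p = (log_coef
             + theta * math.log(theta / (theta + lam))
             + y * math.log(lam / (theta + lam)))
    return math.exp(log_p)


def zip_pmf(y: int, lam: float, p: float) -> float:
    """Zero-inflated Poisson (Equations 3 and 4); p is the zero-state probability."""
    if y == 0:
        return p + (1.0 - p) * math.exp(-lam)
    return (1.0 - p) * poisson_pmf(y, lam)


def select_model(vuong: float, t_alpha: float, crit: float = 1.96) -> str:
    """Table 2.4 guideline; a value equal to crit counts as not significant (assumption)."""
    if vuong > crit:
        return "ZINB" if t_alpha > crit else "ZIP"
    return "NB" if t_alpha > crit else "ZIP or NB"


if __name__ == "__main__":
    lam = 1.5
    for y in range(4):
        print(y, round(poisson_pmf(y, lam), 4), round(nb_pmf(y, lam, 0.5), 4),
              round(zip_pmf(y, lam, 0.3), 4))
    print(select_model(vuong=2.4, t_alpha=3.1))
```

As $\alpha \to 0$ the NB probabilities approach the Poisson ones, and setting $p_i = 0$ reduces the ZIP probabilities to the Poisson ones, which gives a quick sanity check on any implementation.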

22 or risky events per driver during a period of time. Count models such as Poisson, NB, ZIP, ZINB, Bayesian multilevel Poisson, and Bayesian multilevel NB models are well suited to handle VTTI data. Event-Based Analysis A series of frequentist models was estimated with a wide range of predictor variables. In order to search for consis- tency in modeled predictor effects, three sets of models were compared. The first set of models is binary logit, in which the base alternative was a critical incident and the other alterna- tive was a crash or near crash; positive parameters in this type of model reflect an increase in the likelihood of moving from a critical incident to a crash or near crash. The second set of models is multinomial logit, in which there are three catego- ries: the baseline is again a critical incident, one category is a near-crash event, and the third is a crash event; positive parameters reflect an increase in the likelihood of the cate- gory of outcome compared with a critical incident. A third set of models using an ordered logit formulation was estimated. Estimation and comparison of all three models was chosen as a basic approach because of the limited experience modeling naturalistic data. The team believes that greater confidence can be accumulated about the utility of the naturalistic driving anal- ysis paradigms if consistent results are obtained across the meth- ods. The three logit models used in this study are very commonly used in transportation analysis (see also Washington et al. 2003). One can think of these models as reflecting a conditional analysis: a study of factors contributing to crashes and near crashes compared with those contributing to critical incidents, given that an event has occurred. This has a rough parallel in most models of injury severity in crashes: the model is an esti- mate of injury severity given that a crash has occurred. 
In both cases, the models do not provide an estimate of the probability of an event (VTTI data) or the probability of an injury (crash severity analysis) because appropriate exposure data are lacking.

Bayesian models offer considerable additional flexibility in event-based analyses and help answer Research Question 2 (What hierarchical structure [statistically speaking], if any, exists in the manner in which these relationships need to be explored?). Figure 2.7 presents the hierarchy for the Bayesian event-based models. Events and their attributes are included at the first level, and driver attributes are represented at the second level. In the Bayesian formulation, outcomes are modeled as Bernoulli trials, in which a crash or near crash is considered a success:

$$y_{ij} \sim \text{Bernoulli}(p_{ij}) \qquad (10)$$

where $p_{ij}$ is the probability of success for event $i$ of driver $j$, defined by Equation 11.

investigation of individuals (driver-level parameters) themselves as random effects. In the driver-based model, the drivers are the units of analysis, but they are grouped by gender to explore gender-specific differences between drivers. Frequentist models, such as the NB, use predictors (such as a male dummy variable or gender-based interaction terms) that estimate the difference in probabilities of having Y events between males and females; the hierarchical model in Figure 2.6 instead estimates the effect of a predictor on male crash probability and on female crash probability individually. The effect of the attribute on male event probabilities is thus estimated separately from the effect on female probabilities, which makes the model more readily interpretable. For the driver-based model, the levels are defined by gender, creating two groups. The response variable is the number of events for each driver participating in the study.
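Equations 10 and 11 together define the likelihood of the Bayesian event-based model. The sketch below only evaluates that Bernoulli log-likelihood for given parameter values; in the report the parameters carry N(0, 1000) priors and are estimated by Bayesian methods, which this sketch does not attempt. The data layout (tuples of outcome, event-level covariates, and driver-level covariates) and the example records are assumptions made for illustration.

```python
import math

def event_log_likelihood(events, alpha, beta, gamma):
    """Bernoulli log-likelihood of Eq. 10 with the logit link of Eq. 11.

    events -- iterable of (y, x, z): y = 1 for a crash or near crash, else 0;
              x = event-level covariates X_ijk; z = covariates Z_jl of that event's driver
    alpha, beta, gamma -- intercept, event-level and driver-level coefficients
    """
    ll = 0.0
    for y, x, z in events:
        eta = alpha + sum(b * v for b, v in zip(beta, x)) + sum(g * v for g, v in zip(gamma, z))
        p = 1.0 / (1.0 + math.exp(-eta))  # inverse logit gives p_ij
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll


# Hypothetical data: two drivers (z = [1.0] or [0.0]), two events each
events = [
    (1, [0.0], [1.0]),
    (0, [1.0], [1.0]),
    (0, [0.0], [0.0]),
    (1, [1.0], [0.0]),
]
# With all coefficients at zero, every p_ij is 0.5, so the result is 4 * log(0.5)
print(event_log_likelihood(events, alpha=0.0, beta=[0.0], gamma=[0.0]))
```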
The number of events is modeled as a Poisson distribution (see Aguero-Valverde and Jovanis 2008 for a similar formulation):

$$y_{ij} \sim \text{Poisson}(\theta_{ij}) \qquad (5)$$

where $y_{ij}$ is the observed number of events for driver $i$ of gender $j$, and $\theta_{ij}$ is the expected Poisson rate. The Poisson rate is modeled as a function of the covariates and follows a lognormal distribution, as shown in Equation 6:

$$\log(\theta_{ij}) = \beta_{0j} + \sum_{k=1}^{K} \beta_{jk} X_{ijk} + v_{ij} \qquad (6)$$

where $\beta_{0j}$ = intercept for gender $j$, $\beta_{jk}$ = coefficient for covariate $k$ and gender $j$, $X_{ijk}$ = value of covariate $k$ for driver $i$ of gender $j$, and $v_{ij}$ = random effect at Level 1. The random effects $v_{ij}$ capture the extra-Poisson heterogeneity among drivers. At the second stage, the coefficients ($\beta_{jk}$), including the intercepts, are given vague (noninformative) normal priors:

$$\beta_{jk} \sim N(0, 1000) \qquad (7)$$

The prior distribution for the Level 1 random effects is

$$v_{ij} \sim N(0, \tau_v^{-1}) \qquad (8)$$

where $\tau_v$ is the inverse of the variance, also known as the precision. The precision has a gamma prior:

$$\tau_v \sim \text{Gamma}(0.001, 0.001) \qquad (9)$$

with a mean of 1 and a variance of 1,000.

The driver-based model seeks to explore the relationship between driver attributes and the expected number of crashes
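A quick simulation shows why the random effect $v_{ij}$ in Equation 6 captures extra-Poisson heterogeneity: counts drawn from the Poisson-lognormal model have a variance larger than their mean. The parameter values below (intercept 1.0, one covariate fixed at zero, precision $\tau_v = 4$) are hypothetical, and only one gender group is simulated; this is a sketch of the data-generating process in Equations 5, 6, and 8, not of the Bayesian estimation. As a side check on Equation 9, a Gamma(0.001, 0.001) prior in the shape-rate parameterization has mean 0.001/0.001 = 1 and variance 0.001/0.001² = 1,000.

```python
import numpy as np

def simulate_driver_counts(beta0, beta, X, tau_v, rng):
    """Draw event counts y_ij from the Poisson-lognormal model (Eqs. 5, 6, and 8).

    beta0 -- intercept beta_0j for one gender group
    beta  -- coefficients beta_jk for that group, shape (K,)
    X     -- covariates X_ijk of the drivers in that group, shape (n, K)
    tau_v -- precision of the driver random effects v_ij
    """
    n = X.shape[0]
    v = rng.normal(0.0, 1.0 / np.sqrt(tau_v), size=n)  # v_ij ~ N(0, 1 / tau_v)
    theta = np.exp(beta0 + X @ beta + v)                # log(theta_ij) = beta_0j + sum_k beta_jk X_ijk + v_ij
    return rng.poisson(theta)                           # y_ij ~ Poisson(theta_ij)


rng = np.random.default_rng(2012)
X = np.zeros((5000, 1))  # hypothetical: one covariate, fixed at zero
y = simulate_driver_counts(1.0, np.array([0.3]), X, tau_v=4.0, rng=rng)
print(y.mean(), y.var())  # the variance exceeds the mean (overdispersion)
```

With these values the theoretical mean is about 3.1 and the variance about 5.8; a plain Poisson model would force the two to be equal.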

team used the occurrence of an alert (either an LDW or CSW) as the dependent variable. This decision immediately created the challenge of learning about this measure, as the literature analyzing it is very limited. As a result, a building-block approach was taken with the alert data. Using the UMTRI technical report as an initial guide, the team first explored general attributes of the data before settling on an analysis plan. As the team worked with the data, it became clear that the CSW alerts, because of their relation to road curvature, had a closer association with lateral and longitudinal vehicle kinematics. The LDW alerts were more closely associated with vehicle position within a lane. The team thus chose to focus on CSW alert analyses, given their more likely application to road departure crashes, at least those occurring on curves. Rather than select specific model forms, which could have confounded the identification of promising variables, the team chose a piecewise linear modeling approach and explored several formulations. The goal of these analyses was to identify factors associated with the triggering of alerts, in the hope of identifying these variables as good prospects for future study in SHRP 2. It was also expected that a cohort-based formulation might have particular advantages for future data analysis. This twofold objective, the identification of promising variables and the testing of alternative analysis structures (particularly the cohort formulation), motivated the research plan.

Kinematic Models

The Penn State team conducted a careful, structured analysis of the kinematic data received from UMTRI by using the steps outlined in Figure 2.8. Initially, the data received from UMTRI needed to be organized in such a way that event counts could be defined.
Alert durations and alert counts, both aggregated

$$\operatorname{logit}(p_{ij}) = \log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \alpha + \sum_{k=1}^{K} \beta_k X_{ijk} + \sum_{l=1}^{L} \gamma_l Z_{jl} \qquad (11)$$

where $\alpha$ = intercept, $\beta_k$ = coefficient for Level 1 (event-level) covariate $k$, $X_{ijk}$ = covariate $k$ for event $i$ of driver $j$, $\gamma_l$ = coefficient for Level 2 (driver-level) covariate $l$, and $Z_{jl}$ = covariate $l$ for driver $j$. Vague (noninformative) normal priors, $N(0, 1000)$, are used for all of the coefficients, including the intercept.

Analysis Plan for UMTRI Data

The UMTRI data provided the opportunity to work with a rich set of vehicle kinematic variables, but the limitation was that there were no crashes in the data set. As a result, the Penn State

Figure 2.7. Hierarchy used for Bayesian event-based models with VTTI data. [Diagram: Drivers 1, 2, and 3 at the upper level, each with its own events at the lower level.]

Figure 2.8. Flow chart showing analysis process for UMTRI data. [Steps A through H: data manipulation; cross-tabulation (descriptive statistics); CSW vs. LDW; aggregated vs. disaggregated data (e.g., divided by roadway classification); individual driver analysis vs. multiple drivers; initial relationships between kinematics (primarily speed) and time; initial models (linear regression); detailed modeling process (see Figure 2.10 for more detail).]

to decelerate as needed to safely navigate the curve (dash-dot line with crosses). One possible driver adaptation to the CSW is to approach the curve without decelerating, wait for the system to provide an alert, and then decelerate more rapidly (dashed line with diamonds). With the CSW engaged, drivers may approach curves at a constant speed until the alert is triggered, but the prealert speed may be lower than that observed during the RDCW-disabled period (dotted line with squares). Only empirical testing will determine which of these behaviors is actually observed.

To determine an initial relationship between kinematics and time, individual drivers were randomly sampled, and longitudinal speed was taken as the first kinematic variable (Steps D and E of Figure 2.8). The relationship between speed and time was deemed useful for modeling. Additional relationships were developed between speed and other kinematic variables in the initial models (Step F). Other models were subsequently developed to examine the relationships in more detail, mostly at the aggregate level (Steps G and H).

To learn more about the data set, the team evaluated candidate dependent measures, including alert frequency, alert duration, and alerts per trip for all drivers. Alert counts were then divided by alert ID, from which average durations were computed. Alert counts, classified by alert ID, were then obtained for different exogenous factors (e.g., headlamp status, turn-signal status, and windshield wiper status) provided in the RDCW alert data set. Examples of the organized data can be seen in Tables 2.5 through 2.8. The total number of alerts, 2,605, was used as the sample size for analysis. However, these counts provided only a limited view of the relationship between the exogenous factors and alert frequency and duration, and they did not take into account the kinematic variables included in the data set.
Different time periods of observation in the data set were identified by using a combination of headlamp status and solar zenith angle (i.e., the angle between the sun and the local vertical, which was used to identify light or dark driving conditions). This variable was transformed into binary form and included in all models.

and by driver, needed to be obtained (Step A). Several cross-tabulations were performed to determine the number of alerts occurring under certain circumstances (Step B). The choice of kinematic analysis depended on which type of alert contained data usable for modeling. All LDW alerts in the data set essentially had no duration (they were listed as instantaneous). Kinematics do not play a crucial role in the occurrence of LDW alerts, which are triggered solely when lateral displacement from the centerline of the travel lane exceeds a threshold. For LDW alerts, there would therefore likely be a much weaker relationship between the various vehicle kinematics and longitudinal speed than was seen during preliminary analysis of the CSW alert data. Kinematics were deemed a desirable attribute of potential crash surrogates, a focus of S01 activities, so it was decided that CSW alerts would be used for initial analysis (Step C). LDW alerts were included as part of cohort-based model development.

The overall approach to the CSW models involved a consideration of the relationship between longitudinal speed and combinations of exogenous and kinematic factors. However, the primary focus was on the effect of changes in vehicle kinematics on vehicle longitudinal speed. The research team focused on CSW alerts and vehicle kinematics to learn more about one type of road departure event: those occurring on curves. Since kinematic data were not available in the VTTI data set, the team hoped to learn more about this issue through the UMTRI data analyses.
The proposal stated that the goal was exploratory and explicitly said the team would not “compare models” across data sets. The team understood the data sets to be different and has sought to take advantage of those differences. Early in the analysis, through careful review of the UMTRI report and communication with UMTRI researchers, it became clear that driver adaptation to CSW needed to be considered. One can think about driver behavior on curves by constructing a diagram such as Figure 2.9. A theoretical baseline is constant deceleration while approaching and moving through the curve (solid line with triangles). Another possible driver behavior is to approach a curve while slowing and then

Figure 2.9. Conceptual model of driver speed adaptation to CSW alerts.

Table 2.5. CSW Alert Counts and Duration Summary (by ID)

| Number and Duration of CSW Alerts | Cautionary | Imminent | All CSW Alerts |
|---|---|---|---|
| No. of alerts (%) | 1,867 (71.67) | 738 (28.33) | 2,605 (100) |
| Average duration (s) | 2.242 | 4.632 | 2.919 |
| SD of duration (s) | 7.923 | 15.494 | n/a |

Table 2.6. CSW Alert Counts by Wiper Status

| Wiper Status | Alert 5 (Cautionary) | Alert 6 (Imminent) | Total |
|---|---|---|---|
| 0 (Off) | 1,730 | 683 | 2,413 |
| 1 (Low) | 43 | 15 | 58 |
| 2 (High) | 6 | 4 | 10 |
| 4 (Intermittent) | 88 | 36 | 124 |
| Total | 1,867 | 738 | 2,605 |

Table 2.7. CSW Alert Counts by Headlamp Status

| Headlamp Status | Alert 5 (Cautionary) | Alert 6 (Imminent) | Total |
|---|---|---|---|
| 0 (Off) | 1,234 | 491 | 1,725 |
| 2 (Low) | 609 | 236 | 845 |
| 3 (High) | 24 | 11 | 35 |
| Total | 1,867 | 738 | 2,605 |

Weather status was estimated by using wiper status. Table 2.9 shows the counts of alerts (regardless of ID) classified by daylight status, wiper status, and headlamp status.

The Penn State team sought to explore changes in driver behavior through curves with and without the CSW alert system activated. To determine whether driver longitudinal and lateral speed behavior differed between the first week (no alerts provided to the driver) and Weeks 2 to 4 (alerts provided), longitudinal and lateral speed were plotted against time for one randomly selected alert from each of 41 sampled drivers. Sampling individual drivers with randomly selected alerts provided some evidence of differences between Week 1 and Weeks 2 to 4, so speed changes for all drivers were modeled to determine differences between the two time periods. One may think of these analyses as part of Steps E, F, and G in Figure 2.8. Because driver adaptation was possibly occurring, this consideration was included in all the analyses conducted.

Table 2.8. CSW Alert Counts by System State (Disabled or Enabled)

(a) With Duration Summary

| Number and Duration of CSW Alerts | Disabled | Enabled | All CSW Alerts |
|---|---|---|---|
| No. of alerts (%) | 694 (26.64) | 1,911 (73.36) | 2,605 (100) |
| Average duration (s) | 3.057 | 2.869 | 2.919 |
| SD of duration (s) | 12.495 | 10.759 | n/a |

(b) By Headlamp Status

| Headlamp Status | Disabled | Enabled | Total |
|---|---|---|---|
| 0 (Off) | 462 | 1,263 | 1,725 |
| 2 (Low) | 220 | 625 | 845 |
| 3 (High) | 12 | 23 | 35 |
| Total | 694 | 1,911 | 2,605 |

(c) By Wiper Status

| Wiper Status | Disabled | Enabled | Total |
|---|---|---|---|
| 0 (Off) | 638 | 1,775 | 2,413 |
| 1 (Low) | 11 | 47 | 58 |
| 2 (High) | 3 | 7 | 10 |
| 4 (Intermittent) | 42 | 82 | 124 |
| Total | 694 | 1,911 | 2,605 |

Table 2.9. Count of Alerts by Daylight, Wiper, and Headlamp Status

| Wiper Status | Light, Headlamps On | Light, Headlamps Off | Dark, Headlamps On | Dark, Headlamps Off | Total |
|---|---|---|---|---|---|
| 0 (Off) | 280 | 1,630 | 496 | 7 | 2,413 |
| 1 (Low) | 21 | 22 | 13 | 2 | 58 |
| 2 (High) | 5 | 3 | 2 | 0 | 10 |
| 4 (Intermittent) | 48 | 61 | 15 | 0 | 124 |
| Total | 354 | 1,716 | 526 | 9 | 2,605 |
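The "All CSW Alerts" column of Table 2.5 follows from the two alert types: each share is a count divided by 2,605, and the overall average duration is the count-weighted mean of the type-specific averages. The short check below reproduces those figures; applied to Table 2.8(a) (694 disabled alerts averaging 3.057 s and 1,911 enabled alerts averaging 2.869 s), it also gives 2.919 s. The function name is illustrative.

```python
def pooled_summary(counts, mean_durations):
    """Shares (%) and count-weighted mean duration across alert groups."""
    total = sum(counts.values())
    shares = {k: 100.0 * n / total for k, n in counts.items()}
    pooled = sum(counts[k] * mean_durations[k] for k in counts) / total
    return total, shares, pooled


counts = {"cautionary": 1867, "imminent": 738}
means = {"cautionary": 2.242, "imminent": 4.632}
total, shares, pooled = pooled_summary(counts, means)
print(total, round(shares["cautionary"], 2), round(pooled, 3))  # 2605 71.67 2.919
```

The pooled standard deviation cannot be obtained from this weighting alone, which is consistent with the tables leaving that cell empty.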

responded. Recall also that kinematic data were not obtained from VTTI; the time-based regimes were an attempt to learn something about kinematics during an event of interest and to model it. In all these models, variables were input at their collected rate of 10 Hz. Because multiple observations were made on the same event, closely spaced in time, the data contain serial correlation. The Penn State team recognized this as an analysis issue but sought to learn more about the nature of the interrelationships between the kinematic variables. A thorough review of the literature revealed few references that could improve the analysis plan. The team was initially more interested in the associations between variables during alert events and less concerned about the variables’ statistical significance. For this reason, modeling activities were continued, but the team was mindful of the need to return to the correlation issue in the future.

Similar arguments can be made about the endogeneity present in the data. Many of the kinematic variables are the result of driver perception and feedback while approaching and negotiating curves. As such, they are part of the same physical and psychological process undertaken by the driver during the driving task; they are not independent predictor variables. Issues of endogeneity and serial correlation were explored by testing time-series models, which proved unsuccessful. Cohort-based approaches were used to fully integrate exposure with event occurrence.

Cohort-Based Approaches

Research Questions 1 to 3 focus on the identification of surrogates and the evaluation of behavioral and contextual correlates of surrogates. Research Question 5 (If elucidative evidence does emerge in terms of attitudinal correlates and how their interactions vary by context, is it plausible to parse out the marginal effects of various context variables on crash risk by suitable research design?)
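The serial correlation described above can be gauged with the sample lag-1 autocorrelation of a 10-Hz trace. The sketch below is a diagnostic only, not one of the time-series models the team tested; the speed profile in the usage example is hypothetical (a steady deceleration sampled every 0.1 s). Values near 1 indicate that successive observations carry little independent information, so ordinary regression standard errors would be too small.

```python
import numpy as np

def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a regularly sampled series (e.g., 10 Hz speed)."""
    x = np.asarray(series, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))


t = np.arange(50) * 0.1       # 5 s of data at 10 Hz
speed = 25.0 - 1.5 * t        # hypothetical steady deceleration (m/s)
print(round(lag1_autocorrelation(speed), 2))  # 0.94
```

By contrast, independent noise of the same length would give a value close to 0.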
focuses on the definition of relative risk and exposure-based risk, especially vis-à-vis context. Such assessments can be made using a cohort-based approach. The cohort design can be used to formulate an exposure-based model relating potential risk factors to several possible outcomes. The cohort design is well suited to accounting for measures of exposure such as time at risk or distance traveled under specific driving conditions. These measures can be readily obtained from naturalistic studies if the data are suitably structured after collection.

Data Structure

The proposed cohort analyses begin, in general, with a driver as the unit of analysis; the driver is followed over multiple

The overall approach to the modeling considered the relationship between longitudinal speed and combinations of exogenous and kinematic factors. The motivation was to seek kinematic variables that may be particularly good surrogates (e.g., by identifying variables that may influence longitudinal speed entering curves). To explore a variable’s utility as a surrogate, it was necessary to understand how it was related to other vehicle kinematics. Early data analysis quickly led to a focus on longitudinal speed entering curves as a promising variable, and much of the piecewise linear modeling focused on this variable. The modeling process is shown in Figure 2.10. Regimes refer to the separate pieces into which the longitudinal speed profile was divided for modeling. Single-regime models assumed that one equation could be used to estimate the longitudinal speed for each tenth of a second from 5 s before the alert occurred through the completion of the alert. Two-regime models broke this time into two pieces, or regimes, and three-regime models into three separate pieces. The first step was to determine basic relationships between longitudinal speed, time, certain important exogenous factors, and several kinematic variables.
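The regime idea can be sketched as piecewise ordinary least squares: the speed profile is split at one or more breakpoints and a separate line in time is fitted to each piece. This is a simplification under stated assumptions: time is measured relative to alert onset, the breakpoint at 0 s and the speed profile in the usage example are hypothetical, and the exogenous and kinematic terms of the report's models are omitted.

```python
import numpy as np

def fit_regimes(t, speed, breakpoints=()):
    """Fit a separate straight line (intercept + slope in time) to each regime.

    t           -- time in seconds relative to alert onset (e.g., -5.0 to end of alert)
    speed       -- longitudinal speed at each 0.1-s step
    breakpoints -- times that split the window; () gives a single-regime model
    Returns a list of (intercept, slope) pairs and the total residual sum of squares.
    """
    t = np.asarray(t, dtype=float)
    speed = np.asarray(speed, dtype=float)
    edges = [-np.inf, *breakpoints, np.inf]
    fits, rss = [], 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (t >= lo) & (t < hi)
        X = np.column_stack([np.ones(mask.sum()), t[mask]])
        coef, *_ = np.linalg.lstsq(X, speed[mask], rcond=None)
        rss += float(np.sum((speed[mask] - X @ coef) ** 2))
        fits.append((float(coef[0]), float(coef[1])))
    return fits, rss


t = np.round(np.arange(-5.0, 3.0, 0.1), 1)        # 10 Hz, from 5 s before the alert
speed = np.where(t < 0, 25.0, 25.0 - 2.0 * t)     # hypothetical: steady, then braking
one_fit, one_rss = fit_regimes(t, speed)
two_fit, two_rss = fit_regimes(t, speed, breakpoints=(0.0,))
print(one_rss > two_rss, two_fit[1])              # True, roughly (25.0, -2.0)
```

A three-regime model only needs a second breakpoint, for example `breakpoints=(-1.0, 0.0)`; the reduction in residual sum of squares is one way to see why a single equation struggled when kinematics changed over the window.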
It was deemed necessary to use interaction terms for the kinematic variables to see how they affected each other. The first models developed were single-regime linear regression models using both main-effect kinematic variables and first-order interaction terms among the variables. After studying the results of these models, the team determined that kinematic characteristics changed too much over time, so the time period for each model was divided into two, and eventually three, regimes. The goal here was to better understand the relationship between the triggering of the alert for excessive curve entry speed and the detailed vehicle dynamics. Recall that the alert is being used as a substitute for a crash event. The team wanted to better understand how the kinematic measures interacted before the alert and during the alert as the driver

Figure 2.10. Flow chart depicting modeling of longitudinal speed entering curve in UMTRI data. [Step 1: pure linear 1-regime models; Step 2: pure linear 2-regime models; Step 3: 3-regime models; Step 4: time-series analysis; Step 5: cohort-based approaches.]
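The single-regime specification with main effects and first-order interactions can be illustrated by expanding one observation into its regression terms. The variable names below (lateral acceleration, longitudinal acceleration, yaw rate) are hypothetical placeholders for the UMTRI kinematic measures, and the helper is a sketch rather than the team's code.

```python
from itertools import combinations

def add_pairwise_interactions(row):
    """Return main effects plus all first-order (pairwise product) interaction terms.

    row -- dict mapping kinematic variable name -> value for one 0.1-s observation
    """
    terms = dict(row)                        # main effects
    for a, b in combinations(sorted(row), 2):
        terms[f"{a}:{b}"] = row[a] * row[b]  # first-order interaction a x b
    return terms


obs = {"lat_accel": 0.8, "long_accel": -1.2, "yaw_rate": 0.05}  # hypothetical values
print(sorted(add_pairwise_interactions(obs)))
```

With K kinematic variables this adds K(K - 1)/2 interaction terms, so the number of regressors grows quickly as variables are added.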

Table 2.11, a second sample table, shows how the individual outcomes can be grouped, if needed, for each cohort. Each unique combination of driver and context variables is now listed with the cumulative time or distance (a measure of exposure to risk). Notice that each cohort includes the sum of individual trip segments and their outcomes. Each driver’s outcomes are aggregated and matched to context. The sum of the “1” values in the Outcome column of Table 2.10 is the number of events of interest for that cohort. The length and time variables from Table 2.10 are also summed to derive the total time and total distance for each driver in each context. Note that the trips without an event of interest (i.e., outcome zero) are summed and included in the corresponding total distance and time for each cohort. A dummy variable designation is employed for the context variables and driver attributes.

This structure allows for the computation of exposure-based risk (addressing Research Question 3). At the analyst’s choice, the cohort data can remain in the individual trip form of Table 2.10 with essentially a 0/1 outcome (implying the use of categorical dependent-variable models), or the data can be aggregated as suggested in Table 2.11, and a count regression approach can be used to estimate the number of events in each cohort.

There is also flexibility in the definition of the events of interest. In the present study, alerts were used as the dependent variable because they were available in the UMTRI data set. In the

trips throughout the course of the study. Each driver is associated with specific attributes that are constant, such as age, gender, driver attitudinal measures, and vehicle type and characteristics. Other variables can change throughout the course of the study and within each trip (e.g., roadway type, roadway characteristics, environmental factors, driver distraction, driver impairment, and driving speed).
A subset of these variables can be used to define a cohort, that is, a trip segment that is homogeneous with respect to the variables of interest. Travel time and/or distance may thus be accumulated during the study for individual drivers in each defined context (i.e., a homogeneous trip segment). Travel undertaken in each homogeneous trip segment would then be aggregated to determine total exposure and total number of events within a cohort. A cohort thus represents a set of drivers, by type, who experience travel over defined homogeneous trip segments characterized by time or distance of travel. The number of events of interest (e.g., crashes or other events) occurring for a cohort is thus accumulated across identical drivers, retaining the number of events and/or the time between events for each driver. This concept is illustrated in Table 2.10, a sample table that contains the initial cohort data in which a particular outcome (i.e., an event or nonevent) occurs after some period of time or length of travel. The context and driver attributes are selected by the researcher depending on the issues to be explored.

Table 2.10. Initial Cohort-Based Data Structure for UMTRI CSW Alerts

| Outcome (0/1) | Length | Time | Context (all context variables needed) | Driver Attributes (as many as needed) |
|---|---|---|---|---|

Table 2.11. Summed Event Outcomes by Context and Driver Attributes with Exposure Measures

| No. of Outcomes (count) | Total Length (vehicle mile) | Total Time (vehicle hour) | Context (all context variables needed) | Driver Attributes (as many as needed) |
|---|---|---|---|---|
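Moving from the trip-segment layout of Table 2.10 to the cohort layout of Table 2.11 is a group-and-sum operation. The sketch below assumes each segment record is a dictionary with an outcome flag, length in miles, and time in hours, plus whatever context and driver fields define a cohort; the field names and records in the usage example are hypothetical. Segments with outcome 0 still contribute exposure, as the text requires.

```python
from collections import defaultdict

def aggregate_cohorts(segments, keys):
    """Collapse trip-segment records (Table 2.10 layout) into cohorts (Table 2.11 layout).

    segments -- dicts with 'outcome' (0/1), 'length' (mi), 'time' (h), plus cohort fields
    keys     -- names of the context and driver-attribute fields that define a cohort
    """
    cohorts = defaultdict(lambda: {"count": 0, "total_length": 0.0, "total_time": 0.0})
    for seg in segments:
        c = cohorts[tuple(seg[k] for k in keys)]
        c["count"] += seg["outcome"]        # events of interest
        c["total_length"] += seg["length"]  # exposure accrues even when outcome is 0
        c["total_time"] += seg["time"]
    return dict(cohorts)


segments = [  # hypothetical segments; dummy-coded context and driver fields
    {"outcome": 1, "length": 2.0, "time": 0.05, "urban": 1, "night": 0, "male": 1},
    {"outcome": 0, "length": 3.5, "time": 0.08, "urban": 1, "night": 0, "male": 1},
    {"outcome": 0, "length": 1.0, "time": 0.02, "urban": 0, "night": 1, "male": 1},
]
for cohort, totals in aggregate_cohorts(segments, ["urban", "night", "male"]).items():
    print(cohort, totals)
```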

Cohort-Based Count Regression and Event Analysis

The initial analysis involved the use of traditional NB count regressions to show how both context- and driver-related variables affect the likelihood of alert occurrence. The first set of sample models included all drivers, but the data were disaggregated by the roadway functional classification used for the homogeneous trip segment data set. Two functional classes were used to illustrate these models: Functional Class 1, limited access (limited-access freeway); and Functional Class 3, nonlimited access (minor surface). The response variable was either the number of LDW alerts or the number of CSW alerts (not the total number of alerts). The eight models estimated are summarized in Table 2.14. The initial predictors considered included the following:

• Context variables: ramp (for nonlimited access), urban/rural, day/night, wet/dry (based on windshield wiper use), and RDCW disabled/enabled status; and
• Driver variables: gender, education, years of driving experience, last year’s mileage driven, use of glasses or contacts, and smoker/nonsmoker.

It may not be appropriate or useful to include kinematic variables in a model of this type because the averages of kinematic values over homogeneous trip segments may not represent what is actually happening during the traversal of the entire segment. In addition, aggregating average values for each kinematic variable may be problematic because the averages may be affected by factors that are not included in the data set but could be used to redefine homogeneous trip segments. For example, suppose additional variables were included in the data set, such as curve and tangent presence, presence of an intersection, traffic volume, and grade. These could be used to redefine homogeneous trip segments. Once the segments are redefined, average speeds would more accurately reflect travel speeds on each segment.
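The exposure-based count regressions can be sketched as a log-linear model with the logarithm of exposure as an offset, so that coefficients act on the event rate per vehicle hour (or mile). For brevity the sketch fits a Poisson model by Newton-Raphson; this is a swapped-in simplification of the NB models in the report, which share the same mean structure but add an overdispersion parameter. The four rows use the CSW counts and segment times of urban Cohorts 3, 4, 7, and 8 from Table 2.12, with hypothetical dummies for night and for Functional Class 2; they are for illustration only and do not reproduce any model in Table 2.14.

```python
import numpy as np

def fit_poisson_offset(X, y, exposure, n_iter=50, tol=1e-10):
    """Poisson regression, log link, log(exposure) offset, fitted by Newton-Raphson.

    X -- design matrix whose first column is the constant; y -- event counts;
    exposure -- time (or distance) at risk for each row
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = np.log(np.asarray(exposure, dtype=float))
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.sum() / np.exp(offset).sum())  # start from the pooled rate
    for _ in range(n_iter):
        mu = np.exp(offset + X @ beta)                 # expected counts
        score = X.T @ (y - mu)                         # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])                 # Fisher information X'WX
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Columns: constant, night (0/1), FC2 (0/1); rows = Cohorts 3, 4, 7, 8 (Table 2.12)
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
csw = [146, 49, 56, 23]
hours = [4778.16, 1466.784, 1494.564, 572.268]
beta = fit_poisson_offset(X, csw, hours)
print(np.exp(beta[1:]))  # rate ratios for night and for FC2
```

Exponentiated coefficients are rate ratios; with exposure in the offset, they play the same role as the relative risks computed directly from cohort totals.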
The UMTRI event-based models used a binary logit structure, including single- and multilevel specifications, similar to those used in the VTTI event-based models.

Summary

This chapter describes the model structures applied to the VTTI and UMTRI data sets in order to identify prospective views of methodological paradigms. For each data set, the Penn State team described why it developed the specific model paradigms and how the paradigms related to the proposed research questions. The next chapter presents a summary of the results of the empirical investigation.

larger data set available in SHRP 2 Safety Project S07, crashes could certainly be used, or even crashes of a specific type such as roadway departure or intersection-related crashes.

For the UMTRI data, the Penn State team demonstrates categorical-outcome models using logistic regression and survival models, along with count regression models using data formulated as shown in Table 2.11. This data set allows the estimation of a count regression model of the probability of having Y events during the study period; the model estimates the mean of the underlying probability distribution (such as Poisson or NB). Once this basic structure is obtained, several additional analyses may be undertaken beyond the basic count regression:

1. The week of the study can be included from the UMTRI data to test driver adaptation with the RDCW system installed. This information is not required with general naturalistic driving data, but it provides an opportunity to test for learning. As with context and driver attributes, there would be dummy variables to describe each week of the study, with Week 1 as the baseline.
2. A case-control formulation is possible from the basic data (Table 2.10); each row in the data set is either a case (Y = 1) or a control (Y = 0). While there may be large variability in the data, such a model can be formulated and estimated using different random samples of controls.
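The case-control formulation in item 2 can be sketched as sampling from the trip-segment data: every case is kept and controls are drawn at random, with repeated draws (different seeds) giving the different control samples mentioned in the text. The 4:1 control-to-case ratio, the record layout, and the function name are assumptions for illustration; the report does not specify them here.

```python
import random

def case_control_sample(rows, controls_per_case, seed):
    """Keep every case (Y = 1) and a random sample of controls (Y = 0).

    rows -- trip-segment records with an 'outcome' field (Table 2.10 layout)
    controls_per_case -- hypothetical sampling ratio
    seed -- change it to draw a different random set of controls
    """
    rng = random.Random(seed)
    cases = [r for r in rows if r["outcome"] == 1]
    controls = [r for r in rows if r["outcome"] == 0]
    k = min(len(controls), controls_per_case * len(cases))
    return cases + rng.sample(controls, k)


rows = [{"id": i, "outcome": int(i < 3)} for i in range(53)]    # 3 cases, 50 controls
samples = [case_control_sample(rows, 4, seed) for seed in range(5)]  # five control draws
print([len(s) for s in samples])  # [15, 15, 15, 15, 15]
```

Each sample would then be fitted with a logistic regression, and the spread of estimates across draws indicates how sensitive the results are to the particular controls chosen.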
Table 2.12 shows the structure used to define cohorts in the UMTRI data on the basis of rural or urban setting, roadway functional classification, ramp presence, and lighting conditions. The model would provide an estimate (through parameter values) of the effect of each of these factors on the outcome measure (e.g., CSW or LDW alerts).

Risk and Relative Risk

Risk was calculated for each cohort as the number of total alerts, CSW alerts, and LDW alerts (each analyzed separately) divided by the total exposure (time or distance) to the specific environment (see Table 2.13 for example calculations of risk and relative risk using RDCW system data). In particular, each cohort was compared with the baseline cohort (Cohort 3) to determine the relative risk. Relative risk (RR) is the ratio of the probability of the event occurring in the exposed group to that in a nonexposed group:

$$RR = \frac{P_{\text{exposed}}}{P_{\text{non-exposed}}}$$

These basic calculations aimed to satisfy the proposal commitment to estimate exposure-based risk and relative risk.
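The risk and relative-risk entries in Table 2.13 can be reproduced from the counts and exposures in Table 2.12. The sketch below computes risk as events per unit of exposure and RR as the ratio of a cohort's risk to that of baseline Cohort 3; because risk here is a rate per vehicle hour (or mile), the RR is strictly a rate ratio. The function name is illustrative.

```python
def risk_and_rr(events, exposure, base_events, base_exposure):
    """Risk = events / exposure; RR = cohort risk / baseline-cohort risk."""
    risk = events / exposure
    return risk, risk / (base_events / base_exposure)


# CSW alerts, Cohort 4 vs. baseline Cohort 3 (Table 2.12)
risk_t, rr_t = risk_and_rr(49, 1466.784, 146, 4778.16)    # time exposure (vehicle hours)
risk_d, rr_d = risk_and_rr(49, 9693.391, 146, 31379.245)  # distance exposure (miles)
print(round(risk_t, 4), round(rr_t, 3), round(risk_d, 4), round(rr_d, 3))
# 0.0334 1.093 0.0051 1.086  (the Cohort 4 CSW entries of Table 2.13)
```

Cohorts with zero exposure-weighted events give a risk and RR of 0, as in the all-zero rows of Table 2.13; a cohort with zero exposure (the NA rows) would need to be excluded before dividing.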

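Table 2.12. Cohort Structure

| Setting | Functional Classification | Ramp | Day/Night | Cohort No. | Total Alerts | CSW | LDW | Segment Time (vehicle hour) | Segment Distance (mi) |
|---|---|---|---|---|---|---|---|---|---|
| Urban | FC1, limited access | yes | day | 1 | NA | NA | NA | NA | NA |
| | | | night | 2 | NA | NA | NA | NA | NA |
| | | no | day | 3 | 2,399 | 146 | 2,253 | 4,778.16 | 31,379.245 |
| | | | night | 4 | 945 | 49 | 896 | 1,466.784 | 9,693.391 |
| | FC2, limited access | yes | day | 5 | NA | NA | NA | NA | NA |
| | | | night | 6 | NA | NA | NA | NA | NA |
| | | no | day | 7 | 1,074 | 56 | 1,018 | 1,494.564 | 9,444.842 |
| | | | night | 8 | 526 | 23 | 503 | 572.268 | 3,827.514 |
| | FC3, limited access | yes | day | 9 | NA | NA | NA | NA | NA |
| | | | night | 10 | NA | NA | NA | NA | NA |
| | | no | day | 11 | 28 | 3 | 25 | 42.94 | 228.528 |
| | | | night | 12 | 12 | 2 | 10 | 18.264 | 116.138 |
| | FC1, nonlimited access | yes | day | 13 | 248 | 134 | 114 | 153.08 | 760.026 |
| | | | night | 14 | 100 | 36 | 64 | 45.076 | 264.462 |
| | | no | day | 15 | 0 | 0 | 0 | 0.821 | 1.446 |
| | | | night | 16 | 0 | 0 | 0 | 1.84 | 1.537 |
| | FC2, nonlimited access | yes | day | 17 | 292 | 178 | 114 | 119.327 | 548.112 |
| | | | night | 18 | 121 | 58 | 63 | 38.313 | 198.587 |
| | | no | day | 19 | 324 | 124 | 200 | 975.078 | 3,346.723 |
| | | | night | 20 | 93 | 24 | 69 | 215.723 | 818.44 |
| | FC3, nonlimited access | yes | day | 21 | 448 | 353 | 95 | 203.914 | 801.668 |
| | | | night | 22 | 117 | 77 | 40 | 60.693 | 264.53 |
| | | no | day | 23 | 902 | 184 | 718 | 3,750.3 | 10,563.31 |
| | | | night | 24 | 388 | 52 | 336 | 994.524 | 3,264.723 |
| | FC4, nonlimited access | yes | day | 25 | 234 | 204 | 30 | 103.732 | 320.584 |
| | | | night | 26 | 74 | 61 | 13 | 34.36 | 123.676 |
| | | no | day | 27 | 1,653 | 228 | 1,425 | 5,972.7 | 17,149.845 |
| | | | night | 28 | 640 | 65 | 575 | 1,569.57 | 4,797.187 |
| | FC5, nonlimited access | yes | day | 29 | 5 | 4 | 1 | 3.19 | 7.571 |
| | | | night | 30 | 0 | 0 | 0 | 0.516 | 2.028 |
| | | no | day | 31 | 234 | 198 | 36 | 3,583.62 | 5,779.455 |
| | | | night | 32 | 77 | 59 | 18 | 1,086.198 | 1,684.258 |
| | No functional class | no | day | 65 | NA | NA | NA | NA | NA |
| | | | night | 66 | NA | NA | NA | NA | NA |
| | | | day | 67 | 400 | 0 | 400 | 5,694.9 | 21,623.717 |
| | | | night | 68 | 135 | 0 | 135 | 1,230.654 | 3,559.623 |

(continued on next page)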

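Table 2.12. Cohort Structure (continued)

| Setting | Functional Classification | Ramp | Day/Night | Cohort No. | Total Alerts | CSW | LDW | Segment Time (vehicle hour) | Segment Distance (mi) |
|---|---|---|---|---|---|---|---|---|---|
| Rural | FC1, limited access | yes | day | 33 | NA | NA | NA | NA | NA |
| | | | night | 34 | NA | NA | NA | NA | NA |
| | | no | day | 35 | 163 | 9 | 154 | 680.61 | 5,053.614 |
| | | | night | 36 | 87 | 2 | 85 | 136.287 | 1,025.625 |
| | FC2, limited access | yes | day | 37 | NA | NA | NA | NA | NA |
| | | | night | 38 | NA | NA | NA | NA | NA |
| | | no | day | 39 | 169 | 6 | 163 | 433.368 | 3,079.085 |
| | | | night | 40 | 63 | 0 | 63 | 101.293 | 755.244 |
| | FC3, limited access | yes | day | 41 | NA | NA | NA | NA | NA |
| | | | night | 42 | NA | NA | NA | NA | NA |
| | | no | day | 43 | 0 | 0 | 0 | 0.141 | 0.773 |
| | | | night | 44 | NA | NA | NA | NA | NA |
| | FC1, nonlimited access | yes | day | 45 | 11 | 8 | 3 | 3.223 | 18.804 |
| | | | night | 46 | 0 | 0 | 0 | 0.003 | 0.026 |
| | | no | day | 47 | NA | NA | NA | NA | NA |
| | | | night | 48 | NA | NA | NA | NA | NA |
| | FC2, nonlimited access | yes | day | 49 | 7 | 6 | 1 | 2.653 | 12.312 |
| | | | night | 50 | 0 | 0 | 0 | 0.674 | 3.746 |
| | | no | day | 51 | 50 | 0 | 50 | 244.535 | 1,471.637 |
| | | | night | 52 | 6 | 0 | 6 | 13.984 | 89.434 |
| | FC3, nonlimited access | yes | day | 53 | 23 | 21 | 2 | 6.844 | 20.91 |
| | | | night | 54 | 2 | 1 | 1 | 0.554 | 2.293 |
| | | no | day | 55 | 215 | 44 | 171 | 644.496 | 3,331.434 |
| | | | night | 56 | 41 | 6 | 35 | 85.865 | 433.793 |
| | FC4, nonlimited access | yes | day | 57 | 25 | 22 | 3 | 7.433 | 26.621 |
| | | | night | 58 | 9 | 7 | 2 | 2.717 | 9.983 |
| | | no | day | 59 | 407 | 90 | 317 | 872.292 | 4,041.153 |
| | | | night | 60 | 165 | 13 | 152 | 250.577 | 1,106.478 |
| | FC5, nonlimited access | yes | day | 61 | 0 | 0 | 0 | 0.246 | 0.804 |
| | | | night | 62 | NA | NA | NA | NA | NA |
| | | no | day | 63 | 52 | 46 | 6 | 300.024 | 929.395 |
| | | | night | 64 | 5 | 5 | 0 | 67.803 | 213.677 |
| | No functional class | no | day | 69 | NA | NA | NA | NA | NA |
| | | | night | 70 | NA | NA | NA | NA | NA |
| | | | day | 71 | 63 | 0 | 63 | 355.584 | 1,672.599 |
| | | | night | 72 | 2 | 0 | 2 | 25.996 | 36.334 |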

Table 2.13. Risk and Relative Risk for Each Cohort

| Cohort | Risk (time), CSW | Risk (time), LDW | RR (time), CSW | RR (time), LDW | Risk (distance), CSW | Risk (distance), LDW | RR (distance), CSW | RR (distance), LDW |
|---|---|---|---|---|---|---|---|---|
| 3 | 0.0306 | 0.4715 | 1.000 | 1.000 | 0.0047 | 0.0718 | 1.000 | 1.000 |
| 4 | 0.0334 | 0.6109 | 1.093 | 1.296 | 0.0051 | 0.0924 | 1.086 | 1.287 |
| 7 | 0.0375 | 0.6811 | 1.226 | 1.445 | 0.0059 | 0.1078 | 1.274 | 1.501 |
| 8 | 0.0402 | 0.8790 | 1.315 | 1.864 | 0.0060 | 0.1314 | 1.292 | 1.830 |
| 11 | 0.0699 | 0.5822 | 2.287 | 1.235 | 0.0131 | 0.1094 | 2.821 | 1.524 |
| 12 | 0.1095 | 0.5475 | 3.584 | 1.161 | 0.0172 | 0.0861 | 3.701 | 1.199 |
| 13 | 0.8754 | 0.7447 | 28.648 | 1.579 | 0.1763 | 0.1500 | 37.894 | 2.089 |
| 14 | 0.7986 | 1.4198 | 26.137 | 3.011 | 0.1361 | 0.2420 | 29.257 | 3.371 |
| 15 | 0.0000 | 0.0000 | 0.000 | 0.000 | 0.0000 | 0.0000 | 0.000 | 0.000 |
| 16 | 0.0000 | 0.0000 | 0.000 | 0.000 | 0.0000 | 0.0000 | 0.000 | 0.000 |
| 17 | 1.4917 | 0.9554 | 48.819 | 2.026 | 0.3248 | 0.2080 | 69.798 | 2.897 |
| 18 | 1.5138 | 1.6443 | 49.544 | 3.487 | 0.2921 | 0.3172 | 62.772 | 4.418 |
| 19 | 0.1272 | 0.2051 | 4.162 | 0.435 | 0.0371 | 0.0598 | 7.963 | 0.832 |
| 20 | 0.1113 | 0.3199 | 3.641 | 0.678 | 0.0293 | 0.0843 | 6.303 | 1.174 |
| 21 | 1.7311 | 0.4659 | 56.655 | 0.988 | 0.4403 | 0.1185 | 94.639 | 1.650 |
| 22 | 1.2687 | 0.6591 | 41.520 | 1.398 | 0.2911 | 0.1512 | 62.561 | 2.106 |
| 23 | 0.0491 | 0.1915 | 1.606 | 0.406 | 0.0174 | 0.0680 | 3.744 | 0.947 |
| 24 | 0.0523 | 0.3379 | 1.711 | 0.717 | 0.0159 | 0.1029 | 3.423 | 1.433 |
| 25 | 1.9666 | 0.2892 | 64.361 | 0.613 | 0.6363 | 0.0936 | 136.766 | 1.303 |
| 26 | 1.7753 | 0.3784 | 58.102 | 0.802 | 0.4932 | 0.1051 | 106.007 | 1.464 |
| 27 | 0.0382 | 0.2386 | 1.249 | 0.506 | 0.0133 | 0.0831 | 2.857 | 1.157 |
| 28 | 0.0414 | 0.3663 | 1.355 | 0.777 | 0.0135 | 0.1199 | 2.912 | 1.669 |
| 29 | 1.2540 | 0.3135 | 41.041 | 0.665 | 0.5283 | 0.1321 | 113.546 | 1.840 |
| 30 | 0.0000 | 0.0000 | 0.000 | 0.000 | 0.0000 | 0.0000 | 0.000 | 0.000 |
| 31 | 0.0553 | 0.0100 | 1.808 | 0.021 | 0.0343 | 0.0062 | 7.363 | 0.087 |
| 32 | 0.0543 | 0.0166 | 1.778 | 0.035 | 0.0350 | 0.0107 | 7.529 | 0.149 |
| 35 | 0.0132 | 0.2263 | 0.433 | 0.480 | 0.0018 | 0.0305 | 0.383 | 0.424 |
| 36 | 0.0147 | 0.6237 | 0.480 | 1.323 | 0.0020 | 0.0829 | 0.419 | 1.154 |
| 39 | 0.0138 | 0.3761 | 0.453 | 0.798 | 0.0019 | 0.0529 | 0.419 | 0.737 |
| 40 | 0.0000 | 0.6220 | 0.000 | 1.319 | 0.0000 | 0.0834 | 0.000 | 1.162 |
| 43 | 0.0000 | 0.0000 | 0.000 | 0.000 | 0.0000 | 0.0000 | 0.000 | 0.000 |
| 45 | 2.4823 | 0.9309 | 81.240 | 1.974 | 0.4254 | 0.1595 | 91.438 | 2.222 |
| 46 | 0.0000 | 0.0000 | 0.000 | 0.000 | 0.0000 | 0.0000 | 0.000 | 0.000 |
| 49 | 2.2616 | 0.3769 | 74.016 | 0.799 | 0.4873 | 0.0812 | 104.737 | 1.131 |
| 50 | 0.0000 | 0.0000 | 0.000 | 0.000 | 0.0000 | 0.0000 | 0.000 | 0.000 |
| 51 | 0.0000 | 0.2045 | 0.000 | 0.434 | 0.0000 | 0.0340 | 0.000 | 0.473 |
| 52 | 0.0000 | 0.4290 | 0.000 | 0.910 | 0.0000 | 0.0671 | 0.000 | 0.934 |
| 53 | 3.0686 | 0.2922 | 100.425 | 0.620 | 1.0043 | 0.0956 | 215.847 | 1.332 |

(continued on next page)

32 Table 2.14. Division of Single-Level Model Types by Exposure Measure, Functional Class, and Alert Type Distance as exposure Functional Class 1—limited access CSW 1 LDW 2 Functional Class 3—nonlimited access CSW 3 LDW 4 Time as exposure Functional Class 1—limited access CSW 5 LDW 6 Functional Class 3—nonlimited access CSW 7 LDW 8 Cohort Risk (time) RR (time) Risk (distance) RR (distance) CSW LDW CSW LDW CSW LDW CSW LDW 54 1.8044 1.8044 59.052 3.827 0.4361 0.4361 93.724 6.074 55 0.0683 0.2653 2.234 0.563 0.0132 0.0513 2.839 0.715 56 0.0699 0.4076 2.287 0.864 0.0138 0.0807 2.973 1.124 57 2.9598 0.4036 96.867 0.856 0.8264 0.1127 177.618 1.570 58 2.5767 0.7362 84.329 1.561 0.7012 0.2003 150.705 2.790 59 0.1032 0.3634 3.377 0.771 0.0223 0.0784 4.787 1.093 60 0.0519 0.6066 1.698 1.286 0.0117 0.1374 2.525 1.913 61 0.0000 0.0000 0.000 0.000 0.0000 0.0000 0.000 0.000 63 0.1533 0.0200 5.018 0.042 0.0495 0.0065 10.638 0.090 64 0.0737 0.0000 2.413 0.000 0.0234 0.0000 5.029 0.000 67 0.0000 0.0702 0.000 0.149 0.0000 0.0185 0.000 0.258 68 0.0000 0.1097 0.000 0.233 0.0000 0.0379 0.000 0.528 71 0.0000 0.1772 0.000 0.376 0.0000 0.0377 0.000 0.525 72 0.0000 0.0769 0.000 0.163 0.0000 0.0550 0.000 0.767 Table 2.13. Risk and Relative Risk for Each Cohort (continued)
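Table 2.14 is a full crossing of three two-level factors (exposure measure, functional class, and alert type), so its eight single-level model types can be generated rather than listed by hand. The Python sketch below assumes the model numbers follow the table's order (distance before time, Functional Class 1 before Functional Class 3, CSW before LDW); the factor labels and the `MODEL_NUMBER` / `MODEL_SPEC` names are illustrative, not taken from the report.

```python
from itertools import product

# Factor levels in the order Table 2.14 lists them (labels are illustrative).
EXPOSURES = ("distance", "time")
FUNCTIONAL_CLASSES = ("FC1, limited access", "FC3, nonlimited access")
ALERT_TYPES = ("CSW", "LDW")

# product() varies the last factor fastest, so numbering from 1 reproduces
# the table: distance before time, FC1 before FC3, CSW before LDW.
MODEL_NUMBER = {
    combo: number
    for number, combo in enumerate(
        product(EXPOSURES, FUNCTIONAL_CLASSES, ALERT_TYPES), start=1
    )
}

# Reverse lookup: model type number -> (exposure, functional class, alert type).
MODEL_SPEC = {number: combo for combo, number in MODEL_NUMBER.items()}

for number in sorted(MODEL_SPEC):
    exposure, functional_class, alert = MODEL_SPEC[number]
    print(f"Model {number}: {exposure} exposure, {functional_class}, {alert}")
```

Deriving the numbering from the factor order keeps the two lookups consistent with each other; if a factor level were added, however, the generated numbers would no longer match Table 2.14.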

TRB’s second Strategic Highway Research Program (SHRP 2) Report S2-S01B-RW-1: Analysis of Existing Data: Prospective Views on Methodological Paradigms investigates structured modeling paradigms for the analysis of naturalistic driving data.

This report is available only in electronic format.
