**Suggested Citation:** "Appendix B - Risk Measures and Statistical Methods." National Academies of Sciences, Engineering, and Medicine. 2019.

*Guidelines for Traversability of Roadside Slopes*. Washington, DC: The National Academies Press. doi: 10.17226/25539.


APPENDIX B

Risk Measures and Statistical Methods

This appendix gives a more detailed technical description of the risk measures and statistical methods used in the crash data analysis to identify vehicle types that are more likely to roll over on slopes. It is organized into the following subsections:

- B.1 Risk Measures and Statistical Inferences
- B.2 Dealing with Crash Data Limitations
- B.3 Dealing with Sampling Errors in GES Data

The mathematical notation used in this appendix is kept simple on purpose. Specifically, the notation does not emphasize the distinction between random variables, statistical estimators, observed values (realizations), and sample estimates, as the more rigorous statistical literature does. The meaning should nevertheless be clear from the context in which a particular variable, measurement, or estimate is discussed.

B.1 RISK MEASURES AND STATISTICAL INFERENCES

A desirable setting of crash data for evaluating the effect of vehicle type on slope rollover risk is shown in Table B.1, in which passenger vehicles are assumed to be grouped into $V$ types. For a specific time period of interest, the numbers of slope-related rollover crashes and slope-related non-rollover crashes for Type $v$ vehicles are represented by $r_v$ and $n_v$, respectively, where $v = 1, 2, \ldots, V$. The total number of slope-related crashes for Type $v$ vehicles, which is the sum of the two, is represented by $s_v$ $(= r_v + n_v)$. The total number of slope-related rollover crashes, summed over all passenger vehicle types, is expressed as $R$ $(= \sum_{j=1}^{V} r_j)$, while the total number of slope-related non-rollover crashes is represented by $N$ $(= \sum_{j=1}^{V} n_j)$. Finally, the total number of slope-related crashes, including both rollover and non-rollover crashes, is denoted as $S$ $(= R + N)$.

Based on this data setting, the composition ratio (CR) and relative risk (RR) discussed in the main text are defined next.

Table B.1. Number of slope-related crashes by vehicle type and rollover status.

| Passenger Vehicle Type | Number of Slope-Related Rollover Crashes | Number of Slope-Related Non-Rollover Crashes | Total Number of Slope-Related Crashes |
| --- | --- | --- | --- |
| 1 | $r_1$ | $n_1$ | $s_1 = r_1 + n_1$ |
| 2 | $r_2$ | $n_2$ | $s_2 = r_2 + n_2$ |
| ... | ... | ... | ... |
| $v$ | $r_v$ | $n_v$ | $s_v = r_v + n_v$ |
| ... | ... | ... | ... |
| $V$ | $r_V$ | $n_V$ | $s_V = r_V + n_V$ |
| Total | $R = \sum_{j=1}^{V} r_j$ | $N = \sum_{j=1}^{V} n_j$ | $S = R + N = \sum_{j=1}^{V} r_j + \sum_{j=1}^{V} n_j$ |

The vehicle CR is derived by dividing the proportion of a specific vehicle type among passenger vehicles involved in slope-related rollover crashes by the proportion of the same vehicle type among passenger vehicles involved in all slope-related crashes, which include both rollover and non-rollover crashes. It can be expressed mathematically as

$$
\mathrm{CR}(v) = \frac{r_v / R}{(r_v + n_v) / (R + N)} \tag{B.1}
$$

A CR value of 1.0 indicates equal representation, values between 0 and 1 indicate underrepresentation, and values greater than 1 indicate overrepresentation. Putting statistical uncertainty aside, it can be shown that a vehicle belonging to a vehicle type with a higher CR value is more likely to roll over on slopes than a vehicle belonging to a vehicle type with a lower CR value.

Because the variables involved in computing the proportions in the CR measure are correlated in a rather complex manner, the statistical uncertainty, which is due to the inherent randomness of the occurrence of crashes and to the sampling errors associated with selecting a sample, is hard to estimate. For example, the rollover frequency $r_v$ is present explicitly in Equation (B.1) and implicitly as part of the variable $R$. Thus, $r_v$ appears at every level of the numerator and denominator in Equation (B.1), which makes all the variables in the equation statistically correlated. This makes it extremely difficult to estimate the statistical uncertainty of an estimated CR with acceptable precision. As a result, the statistical uncertainty of CR measures is not reported in practice.

In this study, RR can be used to compare the slope-related rollover risk of one vehicle type to that of another (e.g., compact utility to 4-door sedan), between groups of vehicle types (e.g., utility vehicles to pickup trucks), or of one particular vehicle type to that of all the rest of the passenger vehicle types (e.g., 2-door sedan to all other passenger vehicle types). The following equation compares the rollover risk of two vehicle types, $u$ and $v$:

$$
\mathrm{RR}(u, v) = \frac{r_u / (r_u + n_u)}{r_v / (r_v + n_v)} \tag{B.2}
$$
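As a minimal sketch of Equations (B.1) and (B.2), the two measures can be computed directly from a table laid out like Table B.1. The function names and the crash counts below are hypothetical and chosen only for illustration; they are not data from the study.

```python
def composition_ratio(r_v, n_v, R, N):
    """CR(v) as in Equation (B.1): share of rollovers over share of all slope-related crashes."""
    return (r_v / R) / ((r_v + n_v) / (R + N))


def relative_risk(r_u, n_u, r_v, n_v):
    """RR(u, v) as in Equation (B.2): rollover proportion of type u over that of type v."""
    return (r_u / (r_u + n_u)) / (r_v / (r_v + n_v))


# Hypothetical counts (illustration only): type -> (rollover, non-rollover)
counts = {"A": (120, 880), "B": (60, 1440), "C": (20, 480)}
R = sum(r for r, _ in counts.values())  # total rollover crashes
N = sum(n for _, n in counts.values())  # total non-rollover crashes

cr = {k: composition_ratio(r, n, R, N) for k, (r, n) in counts.items()}
rr_ab = relative_risk(*counts["A"], *counts["B"])

print(cr)
print(rr_ab)
```

With these made-up counts, type A is overrepresented (CR above 1) and the ratio of the CR values of types A and B equals RR(A, B).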

Note that it can be shown that $\mathrm{RR}(u, v) = \mathrm{CR}(u) / \mathrm{CR}(v)$. Descriptively, we say that a Type $u$ vehicle is $\mathrm{RR}(u, v)$ times as likely as a Type $v$ vehicle to roll over on slopes.

The interpretation of RR values between a study group (Type $u$ vehicles) and a comparison group (Type $v$ vehicles) is as follows:

- An RR = 1 means there is no difference in risk between the two groups.
- An RR < 1 means the event (rollover on slopes) is less likely to occur in the study group than in the comparison group.
- An RR > 1 means the event (rollover on slopes) is more likely to occur in the study group than in the comparison group.

Under the assumption that the inherent randomness of the occurrence of crashes is Poisson distributed, the standard error (se) of the natural logarithm of the relative risk can be approximately estimated as follows (1, 2):

$$
\mathrm{SE} = \mathrm{se}\big(\log \mathrm{RR}(u, v)\big) = \sqrt{\frac{1}{r_u} - \frac{1}{r_u + n_u} + \frac{1}{r_v} - \frac{1}{r_v + n_v}} \tag{B.3}
$$

When the number of non-rollover crashes is much larger than the number of rollover crashes ($n_u \gg r_u$ and $n_v \gg r_v$), a conservative estimate of SE, which gives a slightly larger SE than the true SE, is

$$
\mathrm{SE} = \mathrm{se}\big(\log \mathrm{RR}(u, v)\big) = \sqrt{\frac{1}{r_u} + \frac{1}{r_v}} \tag{B.4}
$$

The lower and upper limits of the 95% confidence interval for RR can be approximated as follows (3):

$$
L = \mathrm{RR}(u, v) \times \exp(-1.96\,\mathrm{SE}) \tag{B.5a}
$$

and

$$
U = \mathrm{RR}(u, v) \times \exp(1.96\,\mathrm{SE}) \tag{B.5b}
$$

Similarly, we can calculate the RR of one particular vehicle type relative to the rest of the passenger vehicle types as follows:

$$
\mathrm{RR}(v, {\sim}v) = \frac{r_v / (r_v + n_v)}{(R - r_v) / (R + N - r_v - n_v)} \tag{B.6}
$$

where the symbol "${\sim}v$" stands for "not $v$" or "all the rest of the passenger vehicle types." Descriptively, we say that a Type $v$ vehicle is $\mathrm{RR}(v, {\sim}v)$ times as likely as a non-Type $v$ passenger vehicle to roll over on slopes. Take RR(compact utility, non-compact utility passenger vehicles) = 2.4 as an example. This RR value indicates that a compact utility vehicle is 2.4 times as likely as a vehicle from the rest of the passenger vehicle types to roll over on slopes.

Under the same assumption that the inherent randomness of crashes is Poisson distributed, the standard error of the natural logarithm of the relative risk in Equation (B.6) can be approximately calculated as:

$$
\mathrm{SE} = \mathrm{se}\big(\log \mathrm{RR}(v, {\sim}v)\big) = \sqrt{\frac{1}{r_v} - \frac{1}{r_v + n_v} + \frac{1}{R - r_v} - \frac{1}{R + N - r_v - n_v}} \tag{B.7}
$$
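The standard error and confidence limits in Equations (B.3) to (B.7) can be sketched as follows. This is an illustrative implementation under the Poisson assumption stated above; the function names and the counts are hypothetical and are not taken from the study.

```python
import math


def se_log_rr(r_u, n_u, r_v, n_v, conservative=False):
    """Approximate SE of log RR(u, v): Equation (B.3), or (B.4) when conservative=True."""
    variance = 1 / r_u + 1 / r_v
    if not conservative:
        # The subtracted terms are dropped in the conservative form (B.4).
        variance -= 1 / (r_u + n_u) + 1 / (r_v + n_v)
    return math.sqrt(variance)


def rr_confidence_interval(rr, se, z=1.96):
    """Lower and upper 95% limits as in Equations (B.5a) and (B.5b)."""
    return rr * math.exp(-z * se), rr * math.exp(z * se)


def one_vs_rest(r_v, n_v, R, N):
    """RR(v, ~v) and its SE as in Equations (B.6) and (B.7)."""
    r_rest, n_rest = R - r_v, N - n_v
    rr = (r_v / (r_v + n_v)) / (r_rest / (r_rest + n_rest))
    return rr, se_log_rr(r_v, n_v, r_rest, n_rest)


# Hypothetical counts (illustration only)
r_u, n_u, r_v, n_v = 120, 880, 60, 1440
rr = (r_u / (r_u + n_u)) / (r_v / (r_v + n_v))
se = se_log_rr(r_u, n_u, r_v, n_v)
se_cons = se_log_rr(r_u, n_u, r_v, n_v, conservative=True)
lower, upper = rr_confidence_interval(rr, se)
rr_rest, se_rest = one_vs_rest(r_u, n_u, R=200, N=2800)

print(rr, se, se_cons, (lower, upper))
print(rr_rest, se_rest)
```

Because the interval is built on the log scale, the limits are not symmetric around RR; their product equals RR squared.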

The same formulas as in Equations (B.5a) and (B.5b) can be used to calculate the lower and upper limits of the 95% confidence interval. Moreover, when the number of non-rollover crashes is much larger than the number of rollover crashes ($n_v \gg r_v$ and $N - n_v \gg R - r_v$), a conservative estimate of SE is

$$
\mathrm{SE} = \mathrm{se}\big(\log \mathrm{RR}(v, {\sim}v)\big) = \sqrt{\frac{1}{r_v} + \frac{1}{R - r_v}} \tag{B.8}
$$

When $R - r_v \gg r_v$, i.e., when the number of slope-related rollover crashes for the non-Type $v$ vehicle types is much larger than that of the Type $v$ vehicle, the equation can be further reduced to

$$
\mathrm{SE} = \mathrm{se}\big(\log \mathrm{RR}(v, {\sim}v)\big) = \sqrt{\frac{1}{r_v}} \tag{B.9}
$$

This is a reasonable approximation for all vehicle types presented in Table B.1, where we have $R = 11{,}703$ and the maximum $r_v$ is for the 4-door sedan with 2,574 rollover crashes. If we take a confidence interval with both lower and upper limits within 10% of the RR estimate as an acceptable statistical uncertainty, then following Equations (B.5a), (B.5b), and (B.9), we need to satisfy

$$
\exp(-1.96\,\mathrm{SE}) = \exp\!\left(-1.96\sqrt{\frac{1}{r_v}}\right) \ge 0.9 \tag{B.10a}
$$

and

$$
\exp(1.96\,\mathrm{SE}) = \exp\!\left(1.96\sqrt{\frac{1}{r_v}}\right) \le 1.1 \tag{B.10b}
$$

The upper limit is the binding constraint: it requires $r_v \ge (1.96 / \ln 1.1)^2 \approx 423$. Thus, the number of slope-related rollovers, $r_v$, needs to be about 425 to satisfy both inequality constraints above. As a rule of thumb, regardless of how vehicles are classified, we need at least 400 slope-related rollover crashes for the RR of a vehicle type relative to all the rest of the vehicle types to be estimated within the acceptable precision level indicated above.

B.2 DEALING WITH CRASH DATA LIMITATIONS

The desirable setting of crash data shown in Table B.1 is not readily available from the existing crash databases for several reasons, including the lack of roadside geometric data, the complexity of run-off-road crash events, and the limitations of the sequence-of-events data coded for each crash. For example, none of the databases provides variables that allow slope-related rollover crashes to be identified directly; as a consequence, they need to be inferred from the existing variables in the database. Thus, as in previous studies, an important task of the data analysis is to identify variables in the existing crash databases that can be used to determine whether encroaching on slopes was a likely pre-event that contributed to the rollover.

Moreover, these variables should allow the crash frequencies in Table B.1 to be estimated in some way, so that the risk measures and associated statistical inferences presented in the last section can be derived and so that the comparison of rollover probability between vehicle types based on the derived risk measures is pertinent to the study and statistically valid.
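Returning briefly to the precision requirement in Equations (B.10a) and (B.10b) of Section B.1, the rule of thumb on the number of rollover crashes can be checked numerically. The sketch below assumes the simplified standard error of Equation (B.9); the function name and its parameters are hypothetical.

```python
import math


def min_rollovers(tolerance=0.10, z=1.96):
    """Smallest r_v for which both CI limits stay within `tolerance` of RR, with SE = sqrt(1 / r_v)."""
    # Upper limit (B.10b): exp(z * SE) <= 1 + tolerance
    # Lower limit (B.10a): exp(-z * SE) >= 1 - tolerance
    se_max = min(math.log(1 + tolerance), -math.log(1 - tolerance))
    return math.ceil((z / se_max) ** 2)


r_min = min_rollovers()
print(r_min)
```

Under these assumptions the result is 423, in line with the "about 425" figure and the "at least 400" rule of thumb quoted in Section B.1.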

From the existing variables in the FARS and GES databases, parts of the crash frequencies in Table B.1 can be estimated; they are shown in Table B.2. Specifically, the number of slope-related rollover crashes by vehicle type, represented by $r_v$, can be inferred; the number of slope-related non-rollover crashes, $n_v$, cannot be estimated; and the total number of slope-related crashes, $s_v$, can only be estimated up to a constant, i.e., $s_v = c\,m_v$, where $m_v$ is a surrogate measure estimable from the database and $c$ is an unknown constant. Note that the odds ratio was mentioned in the main text as another popular measure of risk in scientific applications. In order to use this measure, the slope-related non-rollover crashes, $n_v$, need to be estimable, or at least estimable up to a constant, which is not the case with the crash data under consideration.

Table B.2. Number of slope-related rollover crashes and a surrogate measure for slope-related crashes.

| Passenger Vehicle Type | Number of Slope-Related Rollover Crashes | Number of Slope-Related Non-Rollover Crashes | Total Number of Slope-Related Crashes |
| --- | --- | --- | --- |
| 1 | $r_1$ | $n_1 = c\,m_1 - r_1$ | $s_1 = c\,m_1$ |
| 2 | $r_2$ | $n_2 = c\,m_2 - r_2$ | $s_2 = c\,m_2$ |
| ... | ... | ... | ... |
| $v$ | $r_v$ | $n_v = c\,m_v - r_v$ | $s_v = c\,m_v$ |
| ... | ... | ... | ... |
| $V$ | $r_V$ | $n_V = c\,m_V - r_V$ | $s_V = c\,m_V$ |
| Total | $R = \sum_{j=1}^{V} r_j$ | $N = c\sum_{j=1}^{V} m_j - \sum_{j=1}^{V} r_j = cM - R$ | $S = c\sum_{j=1}^{V} m_j = cM$ |

Data screening criteria to select a subset of crashes, as well as the procedures to estimate the number of slope-related crashes and the distribution of slope-related rollover crashes, have been presented in the main text. The estimation of risk measures and the associated statistical inferences under the new data setting in Table B.2 are provided next.

Under the new data setting, the vehicle CR is reformulated as follows:

$$
\mathrm{CR}(v) = \frac{r_v / R}{c\,m_v / (cM)} = \frac{r_v / R}{m_v / M} \tag{B.11}
$$

where $r_v$ is the number of slope-related rollover crashes for Type $v$ vehicles, $R$ is the total number of slope-related rollover crashes summed over all passenger vehicle types, $m_v$ is the number of fixed-object crashes for Type $v$ vehicles, and $M$ is the total number of fixed-object crashes across all passenger vehicle types. This particular measure was used in a previous study by Viner (2).
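To illustrate why the unknown constant $c$ does not matter for Equation (B.11), the following sketch computes the CR from rollover counts and fixed-object crash counts and checks that rescaling by an arbitrary $c$ leaves it unchanged. The function name and all counts are hypothetical.

```python
def composition_ratio_surrogate(r_v, m_v, R, M, c=1.0):
    """CR(v) as in Equation (B.11); c multiplies both m_v and M and therefore cancels."""
    return (r_v / R) / ((c * m_v) / (c * M))


# Hypothetical counts (illustration only)
rollovers = {"A": 120, "B": 60, "C": 20}        # r_v
fixed_object = {"A": 500, "B": 750, "C": 250}   # m_v, the surrogate measure
R = sum(rollovers.values())
M = sum(fixed_object.values())

cr = {k: composition_ratio_surrogate(rollovers[k], fixed_object[k], R, M) for k in rollovers}
cr_scaled = {k: composition_ratio_surrogate(rollovers[k], fixed_object[k], R, M, c=3.5)
             for k in rollovers}

print(cr)
```

Any positive value of `c` gives the same result up to floating-point rounding, which is the reason the surrogate measure is sufficient for the CR.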

Similarly, under the new data setting, the equation comparing the rollover risk of two vehicle types is reformulated as:

$$
\mathrm{RR}(u, v) = \frac{r_u / (c\,m_u)}{r_v / (c\,m_v)} = \frac{r_u / m_u}{r_v / m_v} \tag{B.12}
$$

where $r_u$ and $m_u$ are the numbers of slope-related rollover and fixed-object crashes for Type $u$ vehicles, respectively, and $r_v$ and $m_v$ are the numbers of slope-related rollover and fixed-object crashes for Type $v$ vehicles, respectively.

Using the same assumption and notation described in the last section, the standard error of the natural logarithm of the relative risk can be approximately estimated as follows:

$$
\mathrm{SE} = \mathrm{se}\big(\log \mathrm{RR}(u, v)\big) = \sqrt{\frac{1}{r_u} - \frac{1}{c\,m_u} + \frac{1}{r_v} - \frac{1}{c\,m_v}} \tag{B.13}
$$

Again, using the same assumption as in the last section that the number of non-rollover crashes is much larger than the number of rollover crashes ($c\,m_u \gg r_u$ and $c\,m_v \gg r_v$), a conservative estimate of SE reduces to Equation (B.4), and the lower and upper limits of the 95% confidence interval for RR are computed in the same way as in Equations (B.5a) and (B.5b).

As in Equation (B.6), the RR of one particular vehicle type relative to all the rest of the passenger vehicle types is reformulated under the new data setting as:

$$
\mathrm{RR}(v, {\sim}v) = \frac{r_v / (c\,m_v)}{(R - r_v) / \big(c\,(M - m_v)\big)} = \frac{r_v / m_v}{(R - r_v) / (M - m_v)} \tag{B.14}
$$

In addition, the standard error of the natural logarithm of the relative risk is approximately estimated as:

$$
\mathrm{SE} = \mathrm{se}\big(\log \mathrm{RR}(v, {\sim}v)\big) = \sqrt{\frac{1}{r_v} - \frac{1}{c\,m_v} + \frac{1}{R - r_v} - \frac{1}{c\,(M - m_v)}} \tag{B.15}
$$

Assuming that the number of non-rollover crashes is much larger than the number of rollover crashes, the conservative estimate of SE presented in Equation (B.8) is still applicable to Equation (B.14).
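The reformulated measures in Equations (B.12) to (B.15) can be sketched as below. Since $c$ is unknown in practice, the sketch defaults to the conservative standard errors of Equations (B.4) and (B.8), which do not involve $c$; supplying a value for `c` is shown only to illustrate Equation (B.13). Function names and counts are hypothetical.

```python
import math


def relative_risk_surrogate(r_u, m_u, r_v, m_v):
    """RR(u, v) as in Equation (B.12); the unknown constant c cancels."""
    return (r_u / m_u) / (r_v / m_v)


def se_log_rr_surrogate(r_u, m_u, r_v, m_v, c=None):
    """SE of log RR(u, v): Equation (B.13) if c is given, else the conservative form (B.4)."""
    variance = 1 / r_u + 1 / r_v
    if c is not None:
        variance -= 1 / (c * m_u) + 1 / (c * m_v)
    return math.sqrt(variance)


def one_vs_rest_surrogate(r_v, m_v, R, M):
    """RR(v, ~v) as in Equation (B.14) with the conservative SE of Equation (B.8)."""
    rr = (r_v / m_v) / ((R - r_v) / (M - m_v))
    se = math.sqrt(1 / r_v + 1 / (R - r_v))
    return rr, se


# Hypothetical counts (illustration only)
rr_uv = relative_risk_surrogate(120, 500, 60, 750)
se_cons = se_log_rr_surrogate(120, 500, 60, 750)
se_with_c = se_log_rr_surrogate(120, 500, 60, 750, c=2.0)  # c assumed known, for illustration
rr_rest, se_rest = one_vs_rest_surrogate(120, 500, R=200, M=1500)

print(rr_uv, se_cons, se_with_c)
print(rr_rest, se_rest)
```

The conservative standard error is never smaller than the one that subtracts the $c$ terms, so using it when $c$ is unknown widens the confidence interval slightly.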

B.3 DEALING WITH SAMPLING ERRORS IN GES DATA

Crashes selected for inclusion in GES are a probability sample of crashes, not a census of all crashes as is the case with FARS. By design, it is a biased sample, in which crashes are selected with unequal probabilities. The sample is stratified by injury severity, type of vehicles involved, and tow status of the vehicles involved, and some crash subpopulations are sampled more intensively than others on purpose, to reduce data collection cost and to achieve certain levels of precision when the data are used to estimate parameters of policy and legislative interest. Moreover, sampled crashes are not statistically independent because they are sampled from geographical clusters, called primary sampling units (PSUs). As discussed in the main text, GES divides the U.S. into 1,195 PSUs, of which 60 are sampled. The resulting sample can be very different from the overall crash population from which it is drawn and about which inferences are to be made.

Conventional statistical methods, which are based on the assumption of simple random sampling (SRS) or equal selection probability sampling, are not appropriate for analyzing survey-based data like GES data. Also, less sophisticated statistical methods usually assume that the observations from individual sample cases are independent, which tends to understate the statistical uncertainty of estimates produced from such data. Specialized survey-based statistical methods are needed to produce unbiased estimates and to account for the statistical uncertainty of the estimates resulting from the stratified and clustered nature of the sampling design. A large number of textbooks on the statistical aspects of sampling design and related data analysis are available; *Sampling: Design and Analysis* (Lohr 2010) is an excellent example.

When the relative risk measure was discussed in the last section, the inherent randomness of the occurrence of crashes was the only source of statistical uncertainty considered in the development of the inferential procedures. With the use of a sampling-based crash database, such as GES, additional sources of statistical uncertainty due to the sampling design need to be taken into account. What follows is a description of how the additional statistical uncertainty due to sampling is quantified in the inferential procedures adopted by this study.

The sampling design adopted by GES is typically referred to as a stratified multi-stage cluster-sampling design. For such a design, the jackknife procedure is commonly applied to calculate the statistical uncertainty of an estimate. In this analysis, the delete-one-PSU-at-a-time jackknife procedure was employed (see Chapter 9 in Lohr 2010). First, let $\hat{\theta}$ be the estimator of interest, specifically, the standard error of the natural logarithm of the relative risk (denoted by SE in previous sections). Also, let $\hat{\theta}_{(hj)}$ be the estimator of the same form as $\hat{\theta}$ when PSU $j$ of stratum $h$ is omitted. To calculate $\hat{\theta}_{(hj)}$, a new weight variable is first defined as:

$$
w_{i(hj)} =
\begin{cases}
w_i & \text{if sample case } i \text{ is not in stratum } h \\
0 & \text{if sample case } i \text{ is in PSU } j \text{ of stratum } h \\
\dfrac{n_h}{n_h - 1}\, w_i & \text{if sample case } i \text{ is in stratum } h \text{ but not in PSU } j
\end{cases}
\tag{B.16}
$$
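A minimal sketch of the delete-one-PSU jackknife is given below, with $w_i$ taken as the sampling weight of case $i$ and $n_h$ as the number of PSUs in stratum $h$. It builds the replicate weights of Equation (B.16), accumulates the standard stratified jackknife variance (the form given in Lohr 2010, Chapter 9), and combines it with a model-based SE by adding variances. The record layout (`stratum`, `psu`, `weight`, `y`), the function names, and the toy data are all assumptions made for illustration; they are not GES variable names. Each stratum is assumed to contain at least two PSUs.

```python
import math
from collections import defaultdict


def replicate_weights(cases, h, j):
    """Weights w_i(hj) of Equation (B.16) when PSU j of stratum h is deleted."""
    n_h = len({c["psu"] for c in cases if c["stratum"] == h})
    weights = []
    for c in cases:
        if c["stratum"] != h:
            weights.append(c["weight"])                    # other strata: unchanged
        elif c["psu"] == j:
            weights.append(0.0)                            # deleted PSU
        else:
            weights.append(c["weight"] * n_h / (n_h - 1))  # rest of stratum h: inflated
    return weights


def jackknife_variance(cases, estimator):
    """Sum over strata h of (n_h - 1) / n_h times the sum over PSUs j of (theta_(hj) - theta)^2."""
    theta = estimator(cases, [c["weight"] for c in cases])
    psus = defaultdict(set)
    for c in cases:
        psus[c["stratum"]].add(c["psu"])
    variance = 0.0
    for h, psu_ids in psus.items():
        n_h = len(psu_ids)
        for j in psu_ids:
            theta_hj = estimator(cases, replicate_weights(cases, h, j))
            variance += (n_h - 1) / n_h * (theta_hj - theta) ** 2
    return variance


def combined_se(se, v_jk):
    """Combine a model-based SE with the jackknife variance, assuming independent, additive sources."""
    return math.sqrt(se ** 2 + v_jk)


def weighted_total(cases, weights):
    """Toy estimator: weighted total of y (chosen because it is easy to verify by hand)."""
    return sum(w * c["y"] for c, w in zip(cases, weights))


# Toy sample (illustration only): 2 strata with 2 and 3 PSUs
cases = [
    {"stratum": 1, "psu": "a", "weight": 2, "y": 1},
    {"stratum": 1, "psu": "a", "weight": 4, "y": 2},
    {"stratum": 1, "psu": "b", "weight": 7, "y": 2},
    {"stratum": 2, "psu": "c", "weight": 3, "y": 1},
    {"stratum": 2, "psu": "d", "weight": 3, "y": 2},
    {"stratum": 2, "psu": "e", "weight": 3, "y": 3},
]
v_jk = jackknife_variance(cases, weighted_total)
print(v_jk)
```

In an application like the one described in this section, `estimator` would instead be a function of the weighted crash frequencies; the weighted total is used here only so that the jackknife arithmetic can be checked by hand.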

In Equation (B.16), $w_i$ is the sampling weight of sample case $i$ and $n_h$ is the number of PSUs in stratum $h$. The new weight $w_{i(hj)}$ is then used to calculate $\hat{\theta}_{(hj)}$, and the variance of the estimator $\hat{\theta}$ is estimated as follows:

$$
\hat{V}_{JK}(\hat{\theta}) = \sum_{h=1}^{H} \frac{n_h - 1}{n_h} \sum_{j=1}^{n_h} \left(\hat{\theta}_{(hj)} - \hat{\theta}\right)^2 \tag{B.17}
$$

where $H$ is the total number of strata. The standard error of the estimator is the square root of $\hat{V}_{JK}(\hat{\theta})$. Assuming that the two sources of statistical uncertainty, i.e., the inherent randomness of crashes and the sampling design, are independent and additive, a combined standard error for the natural logarithm of the relative risk (denoted by SE*) is calculated as:

$$
\mathrm{SE}^{*} = \sqrt{\mathrm{SE}^2 + \hat{V}_{JK}(\hat{\theta})} \tag{B.18}
$$

where SE is estimated using Equation (B.4) or (B.8), with the crash frequencies in the equation replaced by weighted crash frequencies, which are unbiased estimates of the frequencies when the sampling weights provided in the GES database are used. The lower and upper limits of the 95% confidence interval for RR are estimated with Equations (B.5a) and (B.5b), with SE replaced by SE*.

In this study, it turns out that the statistical uncertainty, SE*, is completely dominated by the sampling errors due to the clustered nature of the sampling design in GES, i.e., by the size of $\hat{V}_{JK}(\hat{\theta})$. Specifically, for the eight vehicle body types on which the analysis focuses, the size of $\hat{V}_{JK}(\hat{\theta})$ is about 13 to 22 times that of SE.

REFERENCES

1. Katz, D., J. Baptista, S. Azen, and M. Pike. 1978. Obtaining Confidence Intervals for the Risk Ratio in Cohort Studies. Biometrics, Vol. 34, 469–474.
2. Selvin, S. 2011. Statistical Tools for Epidemiologic Research. Oxford University Press, New York.
3. Woodward, M. 1999. Epidemiology: Study Design and Data Analysis. Chapman & Hall/CRC, London.