
Suggested Citation: "Chapter 9 - Transferability to Other Data Sources." National Academies of Sciences, Engineering, and Medicine. 2013. A Multivariate Analysis of Crash and Naturalistic Driving Data in Relation to Highway Factors. Washington, DC: The National Academies Press. doi: 10.17226/22849.



Chapter 9
Transferability to Other Data Sources

The following sections describe the use of the geospatial methods described previously to data mine field operational test (FOT) data in support of work to identify potential crash surrogate measures. In the previous sections, the driving data were collected during the University of Michigan Transportation Research Institute (UMTRI) FOTs, and the roadway data were collected from various southeastern Michigan data sets. In this chapter, similar geospatial techniques are tested against different data sets to explore the feasibility of transferring the techniques to researchers with different data. Specifically, the purpose here was to demonstrate the same GIS-based data preparation methods using different naturalistic and roadway data.

Data Used for Testing

The naturalistic driving data (NDD) used in this demonstration were collected primarily in Virginia during the 100-Car Naturalistic Driving Study (Dingus et al. 2006). The roadway data used include databases from the Virginia Department of Transportation (VDOT) and the Federal Highway Administration (FHWA). The VDOT data were provided by staff in the VDOT Transportation Engineering Division, and the FHWA data were provided by the FHWA Office of Highway Policy Information.

Crash Data

VDOT crash data containing information recorded in police crash reports were used. These data include the latitude and longitude, date and time, jurisdiction, collision type, weather condition, surface condition, and lighting conditions of the crashes.

Additionally, crashes from the 100-Car Study were used. Because the UMTRI method is intended to be used to study run-off-road crashes, only the crashes in the 100-Car Study that were classified as run-off-road crashes were used in this work. Of the 69 crashes in the 100-Car Study data set, 26 were run-off-road crashes. One of those crashes did not have valid GPS data, leaving 25 crashes available for use in this work.

Highway Data

The VDOT road data are geospatial, meaning that they include geographic location and shape information that can be mapped using computer software. Each road in the Virginia road network is made up of multiple segments of varying lengths. These segments are represented by lines in a data file, which indicate the centerline of the roadway. The centerline data also contain fields that hold attributes describing the section of roadway represented by each line. These attributes include the name of the road, the route number, the beginning and ending mile points, route category, jurisdiction (i.e., county, city, or town), and segment length.

Highway Performance Monitoring System (HPMS) data were also used. FHWA gathers the data from each of the states annually and maintains the HPMS database. This database includes descriptive information for individual segments of the national highway network. The attributes available for each segment include the state and county where the segment is located, the annual average daily traffic (AADT), the length of the segment, and the number of through lanes. These data were delivered in a comma-delimited ASCII text file and are not geospatial.

A list of the 20 road segments in northern Virginia with the most vehicle fatalities and injuries during the years 2002 to 2004 was also obtained from VDOT. This list identifies those crash hot spots by route number and the beginning and ending mile points. It also includes descriptive attributes such as segment length, the number of crashes in the corridor, the number of those crashes that included an injury, the number of crashes with a fatality, and the crash rate. These data are not geospatial, and the hot spot segments do not necessarily correspond to the individual road segments in the VDOT centerline data representing the Virginia road network.
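Because a hot spot corridor is defined only by a route number and a mile-point range, it may span several centerline segments. The following is a minimal sketch of one way such a corridor could be matched to segments: an attribute join on route number combined with a mile-point overlap test. It is not the query used in this work. The table names, column names, and all rows are hypothetical and made up for illustration, and SQLite stands in for the database environment.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE centerline (seg_id INTEGER, route INTEGER, begin_mp REAL, end_mp REAL);
CREATE TABLE hot_spot   (hot_id INTEGER, route INTEGER, begin_mp REAL, end_mp REAL);
INSERT INTO centerline VALUES
  (1, 100, 0.00, 0.40),
  (2, 100, 0.40, 1.10),
  (3, 100, 1.10, 2.00),
  (4, 200, 0.00, 0.75);
INSERT INTO hot_spot VALUES (1, 100, 0.30, 1.50);
""")

# A segment is assigned to a corridor when the route numbers match and the
# two mile-point intervals overlap (touching at an endpoint does not count).
rows = conn.execute("""
SELECT h.hot_id, c.seg_id
FROM hot_spot AS h
JOIN centerline AS c
  ON c.route = h.route
 AND c.begin_mp < h.end_mp
 AND c.end_mp   > h.begin_mp
ORDER BY h.hot_id, c.seg_id
""").fetchall()

print(rows)
```

In this toy data the single corridor on route 100 picks up three centerline segments, which illustrates why the hot spot list and the centerline segments cannot be assumed to match one to one.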

Software Tools

The geospatial procedures used to prepare the naturalistic data for further analysis depend on the functions of GIS software. GIS encompasses a number of computer data management systems that specialize in geographic data and can be used to store, graphically represent, or analyze geographic data. ArcMap 9.3 from Esri with an ArcInfo license was used here. Structured Query Language (SQL) was used to query relational database tables that were stored in a Microsoft SQL Server 2005 environment. SQL queries were executed in SQL Server Management Studio and in ArcMap.

100-Car Naturalistic Data Description

The time series data stored in the 100-Car Study include numeric measures such as vehicle speed, range to other vehicles, geographic location, lateral acceleration, longitudinal acceleration, and yaw, as well as video of the driver and surrounding environment. The 100-Car data describe what is occurring over time, both in noncrash situations and during actual crashes and near crashes. Within these data, epochs of driving can be located and analyzed further to identify how people drive in geographic areas of interest and in situations such as the moments just prior to a crash, to identify what factors may contribute to crashes, and to perform detailed analyses of what occurred during these crashes. In this case, driving through road segments with high numbers of crashes can be located for further analysis.

Comparing 100-Car Data and Anticipated SHRP 2 Field Data

In many ways, the 100-Car data resemble the anticipated SHRP 2 field data. The majority of the measures available in the 100-Car data, and described in the previous section, will also be present in the SHRP 2 field data. Some additional measures are anticipated in the SHRP 2 data. For example, some roadways will be mapped in greater detail using a roadway measurement van. An automated method for monitoring the driver's gaze is also anticipated in SHRP 2. Latitude and longitude values collected by GPS are present in 100-Car data and will be available and stored in a database format in SHRP 2 data. These values will permit geospatial analyses to be conducted on the SHRP 2 data.

Method

The different data sources each have their own possibilities, but when they are joined, richer queries and analyses become available. Using spatial joins available in GIS and typical SQL joins based on fields in a database table, the data from the different sources were joined. The crash data were joined with the road data, which allowed counting crashes on each road segment in the Virginia highway network. After joining the VDOT road data and HPMS data, more attributes, such as number of lanes and AADT, were associated with the road segments in the VDOT data. Finally, naturalistic driving data were joined with their corresponding road segments. This process is described more completely in the following sections.

Associate Road Attributes and Crash Statistics for Road Segments

The first portion of this effort was to establish that state crash statistics could be associated with state roadway segment data. This processing will be of use in computing segment crash rates using whatever methods are appropriate for a given research question. Using GIS allows data to be joined by using spatial information. In a spatial join, data are joined based on geographic location instead of using attributes from columns within a table. The GPS data from the VDOT run-off-road crashes were plotted in a layer in GIS along with a layer of VDOT highway centerline data. The features in the two layers were then spatially joined. The result was a geospatial data set that can be plotted in GIS or queried like a table in a relational database. For this demonstration task, a query was written to count crashes that have occurred on each road segment in the VDOT road data and to identify the totals by route number. Values from a sample of 15 of these segments are shown in Table 9.1.

Table 9.1. Sample of Road Sections with Number of Crashes from VDOT Roadway and Crash Data

Route Number   Begin Mile Point   End Mile Point   Number of Crashes   Section Length
6608           0.8                0.84             3                   0.04
7100           16.25              17.31            54                  1.06
6608           2.75               2.78             3                   0.03
6608           3.63               3.68             1                   0.05
7900           0                  0.27             215                 0.27
6608           1.38               1.45             2                   0.07
168            6.78               6.87             4                   0.09
1112           0                  0.16             44                  0.16
7100           22.6               22.99            11                  0.39
7100           23.89              23.96            26                  0.07
161            10.05              10.26            68                  0.21
7100           13.15              13.52            59                  0.37
6608           2.89               2.92             1                   0.03
7100           20.44              20.8             11                  0.36
8763           0                  0.07             48                  0.07
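Once a spatial join has linked each crash to a road segment, the counting step is an ordinary aggregate query. The sketch below shows one possible form of such a query. It assumes the spatial join output is available as a table with one row per crash and the identifier of its segment; the table names, column names, and rows are hypothetical, and SQLite is used here only so the example is self-contained (the work itself used SQL Server and ArcMap).

```python
import sqlite3

# Hypothetical tables and made-up rows. "crash_on_segment" stands in for the
# output of the spatial join: one row per crash with its matched segment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE segment (seg_id INTEGER PRIMARY KEY, route INTEGER,
                      begin_mp REAL, end_mp REAL);
CREATE TABLE crash_on_segment (crash_id INTEGER PRIMARY KEY, seg_id INTEGER);
INSERT INTO segment VALUES
  (1, 100, 0.00, 0.40),
  (2, 100, 0.40, 1.10),
  (3, 200, 0.00, 0.75);
INSERT INTO crash_on_segment VALUES (10, 1), (11, 1), (12, 2), (13, 1);
""")

# Crashes per segment. The LEFT JOIN keeps segments that have no crashes.
per_segment = conn.execute("""
SELECT s.route, s.begin_mp, s.end_mp,
       COUNT(c.crash_id)               AS n_crashes,
       ROUND(s.end_mp - s.begin_mp, 2) AS section_length
FROM segment AS s
LEFT JOIN crash_on_segment AS c ON c.seg_id = s.seg_id
GROUP BY s.seg_id
ORDER BY s.seg_id
""").fetchall()

# Totals by route number.
per_route = conn.execute("""
SELECT s.route, COUNT(c.crash_id) AS n_crashes
FROM segment AS s
LEFT JOIN crash_on_segment AS c ON c.seg_id = s.seg_id
GROUP BY s.route
ORDER BY s.route
""").fetchall()

for row in per_segment:
    print(row)
print(per_route)
```

The per-segment result has the same columns as Table 9.1. Dividing the count by section length and traffic volume would be one route to a crash rate, but the choice of rate formula is left open in the text and is not assumed here.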

Joined VDOT Centerline Data and HPMS Data

To investigate the roadway as an explanatory factor, the HPMS data can be used. These data include a number of attributes that could reveal contributing factors in crashes. As stated earlier, the HPMS data do not include geospatial information. However, FHWA has implemented the linear referencing system (LRS), which creates a unique identifier for each road segment based on the county where the segment is located, the route number, and the milepost marker. There is some flexibility in the system that allows each state to customize the numbering system to accommodate the needs of that state. By using this unique LRS ID, the data contained in the HPMS database were joined with geospatial centerline data provided by VDOT.

This join was accomplished by using SQL. In addition to their normal use with relational database tables, SQL queries can also be used with geospatial data sets. This allowed each segment in the VDOT geospatial data to be joined with its corresponding segment in the HPMS data on the basis of their shared identifying attributes (i.e., county, route number, and mile points). For this work, the joining required a two-step process. In the first step, an initial join was done in SQL Server Management Studio, where joins on multiple fields were possible. Then, a simpler join was done in ArcMap to join the result of the first join with the geospatial data from the original VDOT data. The second step could probably be avoided by taking advantage of geospatial data types, although this was not tested in this work.

By joining the HPMS data with VDOT's road centerline data, an enhanced geospatially referenced layer was created that allowed use of GIS functionality. An example of the type of integrated data set made possible through this join is shown in Table 9.2. In Table 9.2, the columns labeled Section Length, AADT, Present Serviceability Rating (PSR), Number of Lanes, High Occupancy Vehicle (HOV), and Truck Route are taken from the HPMS data.

Table 9.2. Sample of Road Sections with Additional Attributes Joined from HPMS

Crash ID  Route Number  Begin Mile Point  End Mile Point  Section Length  AADT     PSR  Number of Lanes  HOV  Truck Route
11        267           11.3              12.53           1.23            45,506   0    5                0    0
15        50            83.99             84.08           0.09            64,334   0    6                0    0
23        123           12.29             12.38           0.09            35,719   0    4                0    0
49        236           9.22              9.29            0.07            36,328   0    5                0    0
57        267           26.22             26.39           0.17            61,844   0    7                0    0
60        3             0.25              0.37            0.12            12,061   0    2                0    0
61        1             189.32            189.39          0.07            61,616   0    7                0    0
62        6608          2.11              2.19            0.08            17,154   3    4                0    0
63        7             59.59             59.72           0.13            68,462   0    6                0    0
64        1             192.61            192.69          0.08            19,674   0    3                0    0
65        7             59.34             59.48           0.14            68,462   0    6                0    0
68        267           16.74             17.4            0.66            104,628  0    8                0    0
70        267           20.32             22.83           2.51            133,391  0    8                0    0
71        123           9.15              9.95            0.8             31,403   0    4                0    0
72        29            244.27            244.34          0.07            23,333   0    4                0    0
74        123           12.29             12.38           0.09            35,719   0    4                0    0
75        123           14.1              14.22           0.12            35,719   0    4                0    0
76        1             188.98            189.04          0.06            61,616   0    6                0    0
78        267           7.09              8.09            1               37,718   0    5                0    0
81        7602          3.93              4.11            0.18            27,165   4.8  8                0    0

NDD through VDOT Hot Spots

The previous sections have demonstrated the processing steps that would be used by researchers in quantifying elements of high crash-rate locations. A number of different approaches could then be used for locating segments that are likely to provide guidance in identifying crash surrogates for a specific crash type. Once these locations are identified, the objective of this effort is to query NDD in which the drivers passed through the locations.

A list of the 20 hot spots had previously been identified by VDOT engineers. This list is based on injury and fatality rates and includes all crash types. The list of these locations was converted into a database table and imported into GIS. Again making use of the ability to join database tables with geospatial data, SQL queries were used to join the hot spots list with VDOT road centerline data to create a layer of polylines that corresponded to these 20 hot spot road segments. These queries were similar to those used in the two-part process to join the VDOT and HPMS data.

Figure 9.1 shows the resulting layer mapped with the primary and Interstate routes. In Figure 9.1, the hot spots have been drawn thicker to make them visible. Initially, the polylines that represent the road network, and therefore the hot spots, do not have a width. To give the hot spots width in the data, a GIS tool was used to generate buffers around the centerlines. Buffers of 200 ft on both sides of the centerlines were used around these polylines, creating polygons to represent the hot spots. The result of buffering the polylines can be seen in a close-up view of two hot spots in Figure 9.2.

Figure 9.1. Top northern Virginia crash hot spots identified by VDOT.

Figure 9.2. Buffers of 200 ft created around hot spots.
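The effect of the 200-ft buffer can be illustrated without GIS software. A point lies inside a round-capped buffer of a polyline exactly when its distance to the nearest part of the polyline is no more than the buffer distance. The sketch below applies that test. It assumes planar (projected) coordinates in feet, and the coordinates and function names are hypothetical; it is not the ArcMap buffer tool used in this work.

```python
import math

BUFFER_FT = 200.0  # buffer distance on each side of the centerline (from the text)


def distance_to_segment(p, a, b):
    """Shortest planar distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_buffer(p, polyline, buffer_ft=BUFFER_FT):
    """True if p is within buffer_ft of any part of the polyline."""
    return any(
        distance_to_segment(p, a, b) <= buffer_ft
        for a, b in zip(polyline, polyline[1:])
    )


# Made-up hot spot centerline and GPS points, in feet.
hot_spot = [(0.0, 0.0), (1000.0, 0.0), (1500.0, 500.0)]
points = [(500.0, 150.0), (500.0, 250.0), (1600.0, 500.0), (-300.0, 0.0)]

flags = [within_buffer(p, hot_spot) for p in points]
print(flags)
```

Real latitude and longitude values would first need to be projected to a planar coordinate system before a distance in feet is meaningful.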

Once segments of interest such as these are identified by researchers, the next step is to locate naturalistic data describing the performance and behavior of drivers as they travel through the segments. To illustrate this step, a layer of breadcrumb trip points from 10 randomly selected trips by each vehicle in the 100-Car Study data set (1,000 trips total) was overlaid on the layer of hot spot polygons. This layer was created by using the ability to open a relational database table in ArcMap and have it plot the points defined by the table's latitude and longitude columns. The breadcrumb trip points that intersected with the polygons were used to identify trips where a vehicle passed through one of these 20 hot spots.

A map of a selection of trip points through hot spots can be seen in Figure 9.3. The figure illustrates primary, secondary, and local streets from VDOT centerline data. The blue lines are GPS breadcrumb trails from 100-Car trips. In the locations shown in dark brown, the trip is passing through one of the hot spots.

Figure 9.3. 100-Car trip points passing through hot spots.

In addition to visualization of these epochs, a table was created that identifies all of the trips through the different hot spots, as well as all of the time series vehicle measures captured within the segment. This intermediate table can be further analyzed in a number of ways to identify contributing factors in crashes. The values can also be used to locate the video associated with the hot spot.

To provide a small demonstration of what is found in these tables, box plots were created to describe the distributions of speeds and maximum yaw values found in the naturalistic data. In Figure 9.4, for each of the 20 hot spots, the speeds at which subject vehicles entered the hot spots are presented as a distribution. Outliers in the data are retained here. In actual analyses, the outliers would be investigated further to determine whether they were created, for example, by sensor errors or were true entry speed outliers. Cases in the tails of the distributions might be of interest as surrogates, particularly when speed appears to be a contributing factor in crashes for that hot spot.

Similarly, one might expect that in challenging locations exit speed would generally be lower than entrance speed. To illustrate how this might be explored in the data, the difference between segment exit speed and entrance speed was computed for each of the trips through the hot spots. This is provided in Figure 9.5.

Yaw is also a potential base surrogate measure related to roadway departure crashes. The maximum and minimum yaw observed as 100-Car participants traveled through the 20 hot spots are presented in Figure 9.6. The sign of the values was retained when creating the box plots. In these data, a positive yaw value indicates left-hand rotation of the vehicle around the vertical axis. Therefore, in general, the top plot in Figure 9.6 indicates left-hand rotation and the bottom plot indicates right-hand rotation.
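The per-trip measures plotted in Figures 9.4 through 9.6 can be derived from the intermediate table with a simple grouping step. The following sketch shows one way this might be done. The record layout, field names, and sample values are hypothetical, and for simplicity it assumes that a trip passes through a given hot spot only once.

```python
from collections import defaultdict


def summarize_passes(records):
    """Summarize each trip's pass through each hot spot.

    records: iterable of (trip_id, hot_spot_id, time_s, speed_mph, yaw_deg_s)
    rows, i.e., the time series samples that fell inside a hot spot polygon.
    """
    passes = defaultdict(list)
    for trip_id, hot_spot_id, time_s, speed_mph, yaw in records:
        passes[(trip_id, hot_spot_id)].append((time_s, speed_mph, yaw))

    summary = {}
    for key, samples in passes.items():
        samples.sort()  # order by time so first/last are entry/exit
        entry_speed = samples[0][1]
        exit_speed = samples[-1][1]
        yaws = [s[2] for s in samples]
        summary[key] = {
            "entry_speed": entry_speed,
            # Negative values mean the vehicle slowed within the segment.
            "speed_change": exit_speed - entry_speed,
            # Sign is retained: positive yaw is left-hand rotation.
            "max_yaw": max(yaws),
            "min_yaw": min(yaws),
        }
    return summary


# Made-up samples for two trips through hot spot 7.
records = [
    (1, 7, 0.0, 42.0, 0.5),
    (1, 7, 1.0, 40.0, 3.0),
    (1, 7, 2.0, 35.0, -2.0),
    (2, 7, 0.0, 30.0, -1.0),
    (2, 7, 1.0, 33.0, 1.5),
]

summary = summarize_passes(records)
for key, values in sorted(summary.items()):
    print(key, values)
```

Grouping the resulting values by hot spot would give the distributions that the box plots describe; outlier screening is deliberately left out of this sketch.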

Figure 9.4. Distribution of vehicle speeds observed in naturalistic data when entering hot spot. [Box plot, "Segment Entry Speed": speed (mph) by segment ID for segments 1 to 13, 15 to 17, 19, and 20.]

Figure 9.5. Difference in speed observed between entry into the hot spot segment and exiting the hot spot segment. Negative values indicate speed was reduced. [Box plot, "Segment Ending Speed Minus Starting Speed": speed (mph) by segment ID for segments 1 to 13, 15 to 17, 19, and 20.]

Figure 9.6. Distribution of maximum and minimum yaw observed in naturalistic data within the hot spots. [Two box plots by segment ID for segments 1 to 13, 15 to 17, 19, and 20: top, max yaw in segment (deg/s); bottom, min yaw in segment (deg/s).]

As with the speed measures, outliers may indicate cases of interest, such as a high-amplitude yaw event, or may indicate sensor anomalies.

Identify Road Characteristics for Run-Off-Road Crashes

For the VDOT crash records, exploration of road characteristics was illustrated in Table 9.2. Factors such as number of lanes, AADT, or road type may indicate explanatory factors in crashes. Similar associations can be made with events collected within a driving study (Table 9.3). The GPS data from the 100-Car Study run-off-road crashes were plotted in a layer in GIS, along with the layer created from the joined VDOT centerline data and HPMS data. The features in the two layers were then spatially joined. This process associated the crashes from the 100-Car Study with the road segments where they occurred. The geospatial data resulting from the earlier join of the VDOT crash data with the road segments where those crashes occurred were also used here. After these spatial joins were performed, the road attributes contained in the VDOT road data and HPMS database were available for analysis of both the VDOT crashes and the 100-Car Study crashes.

Table 9.3. Sample of a Roadway Explanatory Factor Collected from 100-Car Run-Off-Road Crashes

Number of Lanes   Number of 100-Car Run-Off-Road Crashes
2                 1
3                 1
4                 6
5                 3
6                 4
7                 2
8                 3

Summary

This chapter has demonstrated the viability of exploring segment crash rates as well as naturalistic data associated with high crash-rate locations using geospatial techniques. As found with the Michigan data, VDOT crash statistics can be joined with VDOT roadway segment data to identify, for example, the number of crashes on a segment. These data can then be joined with driver and vehicle measures collected in NDD. Integration of the data in this manner provides considerable power for identifying crash explanatory factors related to the driver, the environment, the vehicle, or the roadway.

The GIS methods provide a powerful tool for locating epochs of interest within naturalistic data. As with many of the tools for investigating these data, implementing the methods discussed here requires some GIS-specific understanding. These tools will be new for many researchers. An introductory workshop or clear documentation of procedures would be valuable.

When small data sets were processed in this manner, manual processes worked well. At the scale of the SHRP 2 study, geospatial analyses will require automated processes that can run continuously or nearly continuously to generate the intermediate tables within a reasonable time frame. The programming languages native to the Esri suite should be explored for scalability. If these tools cannot support rapid processing, a possible alternative is to integrate the MathWorks Mapping Toolbox, SQL-based geospatial data types, and a computational cluster that can run code developed in both of these applications.
