National Academies Press: OpenBook
Suggested Citation:"Chapter 7 - Potential Problems and Issues in Data Reduction." National Academies of Sciences, Engineering, and Medicine. 2011. Feasibility of Using In-Vehicle Video Data to Explore How to Modify Driver Behavior That Causes Nonrecurring Congestion. Washington, DC: The National Academies Press. doi: 10.17226/14509.

CHAPTER 7

Potential Problems and Issues in Data Reduction

Because of limitations in hardware or software design, data elements in candidate data sets may not be sufficiently accurate and verifiable for analyzing driver behavior associated with crashes, near crashes, and travel time reliability. Data elements derived from raw data sets may not meet the needs of the study and may require modification. In some cases, a data element is inaccurate but can be easily transformed. One example is an accelerometer box installed backward in a vehicle: the data can be quickly converted through a simple mathematical procedure without compromising data integrity. In other cases, the method used to collect the data may be inaccurate for a particular purpose (e.g., GPS-calculated vehicle speed compared with speed measured directly from the CAN bus). Similarly, some portions of the data may be useful while others are not. Data need to be reviewed to determine their suitability for nonrecurrent congestion research. Potential problems, risks, and limitations for data collection and reduction are discussed in this section.

Overall Data Collection

A common problem in naturalistic studies is the proper identification of drivers. Normally, the data collection process tries to ensure that an equipped vehicle is used exclusively by the assigned driver, whose demographic information has been recorded. It is not unusual, however, for the equipped vehicle to be used by family members or friends. The different driving habits and behaviors of a second driver, together with the mismatched demographic information, can bias the data. An elaborate scheme to correctly match driver information with the actual driver can improve the situation.

Multiple pieces of equipment are on board the test vehicles. Different devices can be chosen for different research purposes.
The DAS adopted in Project 2 and Project 5 is shown in Figure 7.1 (1). The basic arrangement of the DAS in Project 7 by VTTI is illustrated in Figure 7.2 (2). A similar setup was used in the other three VTTI projects. These differences in equipment result in variations in the types of data collected and in the associated data storage and computation requirements.

Customized data computation software and hardware are needed for individual studies. UMTRI developed a system called Debrief View, shown in Figure 4.4, to conduct data reduction. VTTI developed the DART software, shown in Figure 4.3, which visualizes and reduces data.

Even with similar types of equipment, different settings may apply depending on the research purpose. For example, although all the candidate studies recorded video data continuously at a predefined, stable frequency, the frequency differed from study to study. Project 2 and Project 5 saved data at a relatively low frequency (as shown in Table 4.2) unless an event trigger was sent to the DAS to initialize a continuous video recording for 8 s. This lower frequency was warranted because the purpose of these studies was to test the effectiveness of an ACAS or RDCWS. The disadvantage, as stated in the UMTRI reports, was that several alerts of very short duration that occurred below the 10-Hz recording rate may have been omitted (1). In contrast, most VTTI studies collected continuous video data at approximately 30 Hz, because details of drivers’ behavior are essential to the studies conducted by VTTI. A higher video recording frequency gives researchers a better opportunity to closely observe drivers’ driving habits and distractions, but it also generates larger data sets and makes data reduction more challenging.
When postprocessing and analyzing the same type of data from different sources, as was done for this study, special attention should be paid to differences in data collection rates. Conversion or inference of data may be necessary.

Another common problem is data dropout, for example, GPS outages caused by urban canyons. As shown in Figure 7.3, when high buildings in downtown areas block satellite signals, the resulting GPS data have gaps between points. Postprocessing such GPS data to trace the real route traveled by the vehicle usually leads to an error, as shown in the left part of Figure 7.4. By filling the gaps, the real routes can be accurately located, as shown in the right part of Figure 7.4.

Figure 7.1. DAS used in Project 2 and Project 5. (Labeled components: data acquisition system module, cell phone detect, comment PB, cellular antenna, steering wheel angle, inertial sensors, driver face camera, front view camera, audio microphone.)

Figure 7.2. DAS used in Project 7.

Another example of data dropout is illustrated in Figure 7.5, which depicts a situation in which data (speed and range) are missing for a few milliseconds while a valid target is being tracked. To impute the missing data, a linear interpolation may be performed. Figure 7.5 shows a speed profile with missing values for a target (upper part) and the same profile after linear interpolation (lower part).

Invalid readings and null values are other commonly occurring obstacles. For example, an examination of the rain gauge data found invalid (unrealistic) measurements, such as the value of 196.088 mm shown in Table 7.1. To handle this first problem, an algorithm was developed to identify such unrealistic rainfall values and replace them with the average of the values immediately before and after. Table 7.2 shows a second example of unrealistic data, in which the GPS time reports a null value for an entry. Algorithms were developed to replace invalid or missing data, as demonstrated in the right column of Table 7.2. Details of challenges and potential problems regarding kinematic data, video data, reduced data, and other data sources are discussed in the following sections.

Kinematic Data

The sensors used to collect kinematic data differ from study to study. Project 7 and Project 8 used a lane tracker developed by VTTI (i.e., Road Scout) to detect the vehicle’s location within a lane. The maximum error was 6 in. for distance measurements and 1° for angular measurements.
Similar equipment was used in Project 5 with a different accuracy level. It used a monochrome charge-coupled device (CCD) camera that observed painted lane markers or other nonpainted visual features and could observe lane markers up to approximately 30 m ahead of the vehicle.

Figure 7.3. Urban canyon blockage of GPS signals.

Figure 7.4. GPS gaps: (left) before links added; (right) after links added.

The radar systems used to detect surrounding objects and to collect distance information for further derivation of data also vary from one project to another in number and configuration. These variations in settings resulted in varied accuracy levels. In Project 2, radar units operated at a frequency of 77 GHz and tracked up to 15 targets. The sensing range was 100 m, although this range is reduced on a winding road because of the radar’s limited azimuth coverage. Project 5 included two forward-looking radar units configured at 20 Hz and two side-looking radar units configured at 50 Hz, with fields of view 120° wide. Radar systems used in VTTI studies operated at 20 Hz. Project 6 had a radar range effective from 5 to 500 ft. Project 7 and Project 8 used radar systems with effective ranges from 5 to 600 ft. Radar units used in Project 11 increased the effective range to 700 ft. When reviewing and reducing radar data, one must consider the type of radar used, the rate of data collection, how “noisy” the data are, and the assumptions used to mathematically smooth the data.

One typically derived variable was the TTC between the subject vehicle and a leading vehicle. In Project 2, an enhanced TTC was computed that incorporated the accelerations of both vehicles, the range to the vehicle ahead, and the speeds of the vehicles. In VTTI studies, the TTC was derived from the range measured by either a radar-based VORAD forward object detection unit or a TRW sensor. Acceleration and deceleration of the related vehicles were taken into consideration in all studies except Project 11.

Radar switching targets is another difficulty.
When radar signal strength changes, the system might lose track of a target and then reacquire it. When this happens, the target is assigned a different ID number, which can cause confusion. The manner in which the distance between the target and the equipped vehicle is recorded and organized in the data set may also generate errors. The data collection system simultaneously tracks up to seven targets that change over time. As shown in Table 7.3, VTTI has developed algorithms that identify and track the primary target of interest. To illustrate the importance of implementing such algorithms, consider the primary target of interest to be Target 231, as shown in Table 7.3. Before the tracking algorithm is implemented (i.e., when the range values in “VORAD1_Range_1” are used in the range calculations), erroneous variable computations result, as shown in Figure 7.6. After the algorithm is applied, the correct target variables are identified.

Figure 7.5. Speed profile: (top graph) with missing data; (bottom graph) after performing linear interpolation. (Axes: time in s; speed in m/s.)

Table 7.1. Rain Gauge Measurements

Rain Gauge Measurements     Rain Gauge Measurements
(invalid values), mm        (invalid values removed), mm
8.382                       8.382
196.088                     8.382
196.088                     8.382
196.088                     8.382
196.088                     8.382
8.382                       8.382
8.382                       8.382

Table 7.2. GPS Imported Data (Null Values)

GPS Date Time (with null values)    GPS Date Time (corrected)
'2007-06-22 06:30:39.617'           '2007-06-22 06:30:39.617'
'2007-06-22 06:30:39.617'           '2007-06-22 06:30:39.617'
Null                                '2007-06-22 06:30:39.617'
'2007-06-22 06:30:39.617'           '2007-06-22 06:30:39.617'
'2007-06-22 06:30:39.617'           '2007-06-22 06:30:39.617'
'2007-06-22 06:30:39.903'           '2007-06-22 06:30:39.903'

Table 7.3. Example Illustration of Radar Target Tracking

Time        VORAD_ID              VORAD1_Range (ft)
Step (s)    1      2      3       1        2        3
-0.6        231    248    247     50.1     370.1    212.2
-0.5        231    248    247     49.5     363.4    205.8
-0.4        248    231    247     354.5    49.6     193.1
-0.3        247    231    248     186.7    49.6     344.5
-0.2        247    248    231     173.9    331.8    49.5
-0.1        247    231    248     165      49.3     321.8

A critical postprocessing issue that must be addressed is the linking of the in-vehicle data with other data, including environmental, traffic, and weather data. Valid and accurate time and location variables must be available for researchers to complete the linkage. As described in earlier reports, most studies recorded GPS data in their data sets. In Project 6, however, unsynchronized local computer clocks were used instead of GPS time, resulting in time errors. These errors make synchronization infeasible.
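The effect of following the primary target by its ID, rather than by its slot, can be sketched with the rows of Table 7.3. This is a minimal illustration only: the chapter does not describe VTTI's tracking algorithm, so the ID lookup below is an assumption, and the function names are hypothetical. It also does not handle the case, mentioned above, in which a reacquired target receives a new ID.

```python
# Rows from Table 7.3: time step (s), VORAD IDs in slots 1-3, ranges (ft) in slots 1-3.
TABLE_7_3 = [
    (-0.6, (231, 248, 247), (50.1, 370.1, 212.2)),
    (-0.5, (231, 248, 247), (49.5, 363.4, 205.8)),
    (-0.4, (248, 231, 247), (354.5, 49.6, 193.1)),
    (-0.3, (247, 231, 248), (186.7, 49.6, 344.5)),
    (-0.2, (247, 248, 231), (173.9, 331.8, 49.5)),
    (-0.1, (247, 231, 248), (165.0, 49.3, 321.8)),
]


def slot1_range(rows):
    """Naive series: always read slot 1, whichever target occupies it."""
    return [ranges[0] for _, _, ranges in rows]


def tracked_range(rows, target_id):
    """Follow one target ID across slots; None where the ID is absent."""
    series = []
    for _, ids, ranges in rows:
        if target_id in ids:
            series.append(ranges[ids.index(target_id)])
        else:
            series.append(None)
    return series


naive = slot1_range(TABLE_7_3)
tracked = tracked_range(TABLE_7_3, 231)

for (step, _, _), n, t in zip(TABLE_7_3, naive, tracked):
    print(f"t={step:+.1f} s  slot 1: {n:6.1f} ft  target 231: {t:6.1f} ft")
```

Reading slot 1 alone jumps from about 50 ft to more than 350 ft when the targets change slots, whereas the series that follows Target 231 stays near 49 to 50 ft throughout.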
Video Data

During the extensive naturalistic driving data collection period, it was not unusual to encounter technical difficulties with the cameras, including disconnected cables and camera malfunctions. Special supervision is needed to ensure smoother data collection and reduction.

Each study set up its video views to suit its particular research goal. VTTI studies usually have four cameras capturing the front view, side views (left and right), and the driver’s face. Some projects added a fifth camera to capture the driver’s hand and foot movements. Figure 7.7 illustrates the camera views in Project 7 (2). In the UMTRI studies, only two cameras were used, providing a front view and a view of the driver’s face, because less emphasis was put on observing driver behavior. Video data that do not include all four views are of limited use in this study because it is not possible to identify causes and behavior before crash or near-crash events.

Figure 7.6. Range with and without target tracking.

Figure 7.7. Camera directions and approximate fields of view in Project 7. (Labels: front of vehicle, behind vehicle, Cameras 1 through 4.)

When decisions are based on driver behavior, as is the case in this research effort, satisfactory video quality is a prerequisite. Incorrect brightness is a typical problem that prevents researchers from interpreting video data clearly. Figure 7.8 shows some video captures from Project 5 (3). During some daytime driving, when the vehicle was heading directly into a setting sun, the camera images were degraded by sun glare. When data from outside sources, such as weather data and level of congestion, are unavailable, the ability to acquire them through video data reduction becomes vital. If the video data are blurred, it is impossible to obtain such data. In the new DAS protocol of VTTI, the problem is solved by using Autobrite, as shown in Figure 7.9 (4). As can be seen, the quality of the video data on the right of the figure is significantly improved.

Figure 7.8. Simultaneous images from the cameras of an RDCWS vehicle heading into the sun.

Figure 7.9. Prototype DAS camera under consideration.

Reduced Data

One challenge faced by researchers is data reduction, in which raw data are organized into a more functional format. Each study listed previously has a data reduction dictionary into which raw data were coded by reductionists, but the coding schemes of the dictionaries are not identical. In Project 2, the variable “Time of Day” was coded as 0 or 1 for Day or Night, respectively. The “Location of eyes at time of the alert” was coded 0 through 9, ranging from “Looking forward at forward scene” at one extreme to the more distracted statuses of “Head down, looking at center stack console area” or “Cannot accurately evaluate eye location.”

In contrast, data reduction at VTTI was more extensive. In a typical VTTI study (such as Project 7 and Project 8), “Date,” “Day of Week,” and “Time” were three independently coded variables used to pinpoint the time of an event. “Light Condition” was a separate variable, in addition to the time and day variables, coded from 01 to 05 to describe the light situation as “Daylight,” “Dark,” “Dark but lighted,” “Dawn,” or “Dusk.” VTTI studies coded driver actions and distractions more elaborately. “Driver Potentially Distracting Driver Behavior” was a variable coded in 31 values describing situations including “bite nails,” “remove/adjust jewelry,” and even “comb/brush/fix hair.” Besides behavior variables, some variables were designed to describe other statuses of drivers. For example, “Driver Actions/Factors/Behaviors Relating to Event” described drivers’ emotions, coded in 60 values to represent “angry,” “drowsy,” “drunk,” and others. When data from multiple sources with different coding protocols are used, a proper unifying process is required to ensure the same standards in data analysis.

Reductionist capability is a dominant factor that affects the quality of reduced data. VTTI has a professional data reduction team composed of professional researchers and graduate research assistants, and a separate data reduction laboratory led by laboratory managers. All the reductionists have extensive experience in data reduction and analysis.
Before data reduction officially starts, reductionists are trained using a protocol written by the laboratory manager and project researchers. The laboratory manager works closely with the reductionists to assess their comprehension of the data reduction dictionary. UMTRI also has a professional data reduction team with researchers and graduate students. Students are responsible for relatively easy variable coding, such as weather condition, presence of passenger, and type of road. The research staff is responsible for more difficult variables that require judgment, such as degree of distraction and behavior coding.

Quality control in data reduction is critical in data postprocessing and can be a decisive factor in the success of later analyses. A quality control procedure to support accurate and consistent coding was established at VTTI. For example, in Project 11 data reductionists performed 30 min of spot checks of their own or other reductionists’ work each week. Besides the spot checks, inter- and intra-rater reliability tests were conducted every 3 months. In each reliability test, the reductionist was required to make validity judgments for 20 events. Three of the 20 events were also completely reduced; in other words, the reductionist recorded information for all reduction variables rather than simply marking the severity of the event. These three events were repeated on the next test to obtain a measure of intra-rater reliability. At the same time, using the expert reductionist’s evaluations of each epoch as a gold standard, the proportion of agreement between the expert and each rater was calculated for each test. This inter-rater test between the expert and the regular data reductionists was completed on the initial reduction for 6, 12, and 18 months of data reduction. The results indicated an intra-rater reliability of 99% for all three tests. The average inter-rater reliability score for this task was 92.1%. Discrepancies are mediated by a third, senior-level researcher (2).

Similar quality control procedures were used by the UMTRI research team. Two researchers initially viewed and coded a small portion of the alerts independently. The coded data were compared to determine the percentage of agreement; each researcher then independently coded the remaining alerts. A third researcher examined their coding results and determined the degree of consistency in coding. The results showed a high level of agreement, which testified to the efficiency and consistency of the data reduction dictionary. The researchers then jointly viewed and recoded all video to resolve factors that had not been agreed on. Each of these meticulous steps helps ensure that the data reduction is under control.

Other Data Sources

Besides the possible risks and problems that exist in vehicle data, the availability and quality of environmental data (specifically, weather, traffic count, crash, and work zone data) are worthy of attention. Accurate weather data are available at ASOS stations. Only vehicle data that were collected at locations close enough to ASOS stations (e.g., within 15 mi) can be associated with the weather data observed there. Another source of weather data is the Road and Weather Information System (RWIS). RWIS is a combination of technologies that uses historic and current climatological data to develop road and weather information.
The information is then sent to road users and decision makers. RWIS usually includes an environmental sensor system, a model to develop forecasts, and a dissemination platform to publish data. More than 10 DOTs in the United States have started or plan to start RWISs. When vehicles were far from ASOS stations or RWIS, the only source of weather information was the weather variable coded in the reduced data by data reductionists.

Even at locations where weather data are available, there is still a risk that the data contain errors. In the example shown in Table 7.4, the variable Rain_Today, which records the cumulative rainfall since 12:00 a.m., is reset to zero. The controller times do not necessarily coincide with the local time, and thus the reset takes place at 13:29 local time (shown as 17:29 in Coordinated Universal Time [UTC]). To address this problem, it was determined that the offset was a constant, location-specific value. An offset was allocated to each location in computing the precipitation rate to account for this error.

Table 7.4. Weather Station Input Data

Row Number    GPS                          Minute of Day    Rain_Today
57809         '2007-04-26 17:29:38.367'    1439             8.382
57810         '2007-04-26 17:29:38.367'    1439             8.382
57811         '2007-04-26 17:29:38.367'    0                0
57812         '2007-04-26 17:29:38.367'    0                0

For traffic count and traffic condition, crash, and work zone data, quality and availability differ from state to state. As introduced in Chapter 6, some states have more continuous traffic count stations than others. For example, Virginia maintains a traffic count database containing traffic data from more than 72,000 stations, of which 470 are continuous. West Virginia has only 60 continuous-count stations. Some states have more complete crash or work zone databases (e.g., New Jersey and Michigan), and others maintain only general crash records from police reports.
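The distance rule for associating vehicle data with an ASOS station, described earlier in this section, can be sketched as a nearest-station lookup. This is a minimal sketch under stated assumptions: the great-circle (haversine) distance, the station names, and the coordinates are all hypothetical choices made for illustration; only the 15 mi example threshold comes from the text.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations, max_mi=15.0):
    """Return (name, distance_mi) of the closest station within max_mi, else None."""
    best = None
    for name, (slat, slon) in stations.items():
        d = haversine_mi(lat, lon, slat, slon)
        if d <= max_mi and (best is None or d < best[1]):
            best = (name, d)
    return best


# Hypothetical station coordinates, for illustration only.
STATIONS = {
    "STATION_A": (37.0, -80.0),
    "STATION_B": (38.0, -80.0),
}

print(nearest_station(37.1, -80.0, STATIONS))  # close to STATION_A
print(nearest_station(37.5, -80.0, STATIONS))  # too far from both stations
```

Vehicle records for which the lookup returns no station would fall back on the weather variable coded by the reductionists, as noted above.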
When linking vehicle data to outside data sources, special attention should be paid to variations in data quality.

In summary, data elements designed to accomplish the objectives of the original research study may not be suitable for a study of driver behavior that causes nonrecurring congestion. As discussed in this chapter, if certain modifications can be feasibly executed, a candidate data set can be more valuable in serving the research goal of this study. Table 7.5 lists the possible modifications that can be made to each candidate data set to render it more compatible with this research goal.

As illustrated in the last row of Table 7.5, one more potential data set will be available in the near future. Being the most extensive naturalistic travel data collection effort, SHRP 2 Safety Project S07 will include high-quality video data and other in-vehicle data. With accurate location and time information, Project S07 data can be easily linked with other external environment data. Once integrated with other external data sets, the data set from Project S07 will be the most valuable candidate data set for studying nonrecurring congestion and its relationship to driver behavior. Details of Project S07 are discussed in Chapter 8 in the section on recommendations.

Table 7.5. Modification Needed for Each Data Set

Data Set                      Feasibility ()    Cost for Modifications ($$$)    Modifications Needed
Project 2: ACAS FOT                             $$$                             • Additional external data, such as work zone and crashes
                                                                                • Manual filtering of invalid triggers
Project 5: RDCWS FOT                            $$$                             • Additional external data, such as work zone and crashes
                                                                                • Manual filtering of invalid triggers
Project 6: The 100-Car Study                    $                               • More efficient identification of driver
Project 7: DDWS FOT                             $                               • More comprehensive external data
Project 8: NTDS                                 $                               • More comprehensive external data
Project 11: NTNDS                               $$                              • More efficient identification of driver
                                                                                • More comprehensive external data
SHRP 2 Project S07                              $$$

References

1. University of Michigan Transportation Research Institute. Automotive Collision Avoidance System Field Operational Test Report: Methodology and Results. Report DOT HS 809 900. NHTSA, 2005.
2. Hickman, J. S., R. R. Knipling, R. L. Olson, M. C. Fumero, M. Blanco, and R. J. Hanowski. Phase I - Preliminary Analysis of Data Collected in the Drowsy Driver Warning System Field Operational Test: Task 5, Preliminary Analysis of Drowsy Driver Warning System Field Operational Test Data. NHTSA, 2005.
3. University of Michigan Transportation Research Institute. Road Departure Crash Warning System Field Operational Test: Methodology and Results. NHTSA, 2006.
4. Biding, T., and G. Lind. Intelligent Speed Adaptation (ISA), Results of Large-Scale Trials in Borlänge, Lidköping, Lund and Umeå During the Period 1999–2002. Swedish National Road Administration, 2002.
