National Academies Press: OpenBook

Design of the In-Vehicle Driving Behavior and Crash Risk Study (2011)

Chapter: Chapter 5 - Data Management

Suggested Citation: "Chapter 5 - Data Management." National Academies of Sciences, Engineering, and Medicine. 2011. Design of the In-Vehicle Driving Behavior and Crash Risk Study. Washington, DC: The National Academies Press. doi: 10.17226/14494.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

CHAPTER 5

Data Management

Human Subjects Protection

Federal regulations and good research practice call for protection of persons who participate in research studies (“human subjects”). The Office for Human Research Protections (OHRP) in the U.S. Department of Health and Human Services (HHS) provides leadership in the protection of the rights, welfare, and well-being of subjects involved in research. OHRP does this by providing clarification and guidance, developing educational programs and materials, maintaining regulatory oversight, and providing advice on ethical and regulatory issues in biomedical and behavioral research. These protective policies are enforced at the local level by an organization’s institutional review board (IRB), an entity required by federal regulations.

Of paramount concern in the design of the SHRP 2 NDS was the need to maintain close coordination with nearly all of the other project tasks, as virtually all tasks have some impact on the safety or privacy of the participants or their data. Key issues include protection of participant confidentiality, protection of unconsented passengers (e.g., no continuous audio recording can be employed since it may capture the conversations of unconsented passengers), informed consent (and assent/parental consent for minor participants), protection of potentially identifying information (e.g., face video and geospatial identifying data), and the continued protection of participant confidentiality once the data are stored in a database for post hoc analyses.

Institutional Review Boards and Certificate of Confidentiality

Human subjects protection in the SHRP 2 NDS will be ensured by the review and approval of eight separate IRBs: those of the S06 contractor, the six S07 contractors, and the National Academy of Sciences (NAS).
To prepare as well as possible for the human subjects protection review expected in the full-scale field study, the protocols for the S05 pilot studies underwent the full board review process at Virginia Tech (VT). This allowed a wider range of reviewers to see the complete protocol and raise human-participant concerns and issues prior to running the NDS. The combination of full board review at VT and full board review at NAS resulted in a very robust protocol that serves as a good starting point for the NDS.

Additionally, a Certificate of Confidentiality (CC) was secured from the National Institutes of Health (NIH) for the S05 pilot study. A CC helps researchers protect the privacy of subjects in biomedical, behavioral, clinical, or other research projects against compulsory legal demands (e.g., court orders and subpoenas) that seek the names or other identifying characteristics of a research subject. The CC covers the collection of sensitive research information for a defined time period (the term of the project); however, personally identifiable information obtained about subjects enrolled while the CC is in effect is protected in perpetuity. A CC will also be requested for the full-scale NDS. On the basis of the approval of the S05 CC, a timely approval for the SHRP 2 NDS CC is anticipated.

Upon NDS inception, one of the first sets of tasks relates to securing the IRB approvals from the S06 IRB, the NAS IRB, and each S07 IRB before proceeding with any and all aspects of the research involving human participants. Similarly, all S06 and S07 project personnel who will interact with participants or their data must certify that they have passed an approved IRB course or a course on protecting human participants. Each individual site contractor (except any that have chosen to formally rely on the VT IRB) will have to receive approval from its own IRB on the basis of the research protocol and participant-consent documents approved by the VT IRB.
It is likely that modifications to the standard set of documents to meet local needs will be reviewed, but these are not expected to fundamentally change. IRB-related submission materials were shared with the various stakeholder IRB personnel early in the process, including during a meeting of these stakeholders in Washington, D.C., in the summer of 2009. IRB approval will be sought for the call center recruitment separately from that for the main study activities.

Collection Process from Vehicle to Server

The data collected during the NDS will include participant-identifying data and other sensitive personal information that must be protected. Consequently, every effort will be taken to protect all data from unauthorized access. The video data will be encrypted on the DAS and will remain encrypted until the data transfer process to the S06 server has been successfully completed. Once data quality processes have been applied, the video data will be reencrypted for storage. The data collection process is illustrated at a high level in Figure 5.1.

Figure 5.1. Data collection process. (Flowchart; steps are grouped into predominantly oversight and integration contractor functions and predominantly site contractor functions.)

The hard drive on the DAS has a single copy of the data. As those data are transferred to the S07 server, they are replicated and stored on an array of HDs configured in a RAID (Redundant Array of Independent Disks). The RAID configurations on the S07 and S06 servers allow system administrators to completely restore a full copy of the data in the event of an HD failure on the server.
Furthermore, once the data have been successfully transferred to the S06 site, an additional copy of the data will be stored for archival purposes.

Data will be encrypted onboard the DAS by way of Advanced Encryption Standard (128- or 256-bit AES) symmetric encryption. The key used for the encryption will be randomly generated for each trip, and that key is encrypted using the Rivest, Shamir, Adleman (RSA) public key of a public/private key pair. The encrypted key file is stored with the same naming convention alongside the encrypted data and video files. This scenario provides the security of having a private key that will not be onboard the system, while allowing the data and video to be encrypted with the much faster symmetric encryption.
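The per-trip scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the DAS implementation: it relies on the third-party Python `cryptography` package, and the choices of AES-GCM mode, a 256-bit trip key, a 2048-bit RSA key, and OAEP padding are assumptions (the report specifies only 128- or 256-bit AES and an RSA public/private key pair). The function names `encrypt_trip` and `decrypt_trip` are hypothetical.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# OAEP padding used to wrap the trip key (an assumption; the report names only RSA).
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def encrypt_trip(trip_bytes, public_key):
    """Encrypt one trip with a fresh AES key, then wrap that key with RSA."""
    trip_key = AESGCM.generate_key(bit_length=256)  # new random key per trip
    nonce = os.urandom(12)  # 96-bit nonce, the usual size for GCM
    ciphertext = AESGCM(trip_key).encrypt(nonce, trip_bytes, None)
    wrapped_key = public_key.encrypt(trip_key, OAEP)  # only the private key can unwrap
    return wrapped_key, nonce, ciphertext


def decrypt_trip(wrapped_key, nonce, ciphertext, private_key):
    """Unwrap the trip key with the RSA private key, then decrypt the trip."""
    trip_key = private_key.decrypt(wrapped_key, OAEP)
    return AESGCM(trip_key).decrypt(nonce, ciphertext, None)


if __name__ == "__main__":
    # The private key would live off the vehicle; only the public key is onboard.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    wrapped, nonce, blob = encrypt_trip(b"trip payload", private_key.public_key())
    print(decrypt_trip(wrapped, nonce, blob, private_key))
```

In this sketch the wrapped key plays the role of the encrypted key file stored alongside the encrypted data and video files: the bulk data are handled by the fast symmetric cipher, and only the short trip key goes through the slower RSA operation.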

Data Processing

Data Upload to Data Storage Server

When uploading the data from the S07 staging server to the S06 server, checksum analyses will be performed to ensure the integrity of the uploaded file. After uploading the data to the S06 database and decrypting, multiple quality checks will be done for each trip. These will be similar to but more sophisticated than those done during the routine health checks. Specifically, due to the amount of data and its contiguous nature (i.e., each trip file should begin at or near the same GPS coordinates where the previous trip file ended), more sophisticated comparisons between variables can be made to isolate potential problems within a trip. Analyses will also be conducted to compare trips to ensure that data are not being lost. For example, is the GPS location at the beginning of a trip the same location as the end of the previous trip?

When a problem has been identified by the data-quality algorithms, any questionable data will be marked as such. At a minimum, the annotation will include a start sync, end sync, and metadata describing the test the variable failed. As resources are available, fixes may be applied to the data where such is possible (e.g., where it can be determined that a particular sensor was generating data that were off by a known constant value). S06 quality personnel will review the problems to try to determine the root cause (i.e., on the DAS or otherwise). The S06 contractor may need to work with the individual S07 contractor to isolate the problem and determine the best course of corrective action. Quality personnel will also conduct random spot checks by remotely requesting data snippets.

Note that any additional processing required to get the data into a format to answer specific research questions is outside the scope of the current S06 project.
However, providing researchers with early access to these data is believed to be paramount to the success of this project because it lets stakeholders at all levels begin to see results and the value of the project without waiting for all data to be collected some 28 months later.

Data Acquisition System Data Processing

The purpose of the data processing is to get the highest-quality data feasible into the database in a form usable by researchers. Several processes will be performed once the data arrive at the S06 server.

Backups

The data will be housed at the VT Data Center. RAID 6 protection is also employed at this facility to guard against loss of data due to server failure. Archival backups of the data will be stored at a different physical location.

Trip Summary Data

Summary data will be extracted for each trip. These data include mileage driven, duration, start time, average speed, maximum speed, number of stationary epochs, maximum deceleration, driver identification (where possible), etc. This summary will help with quality processes, and it will provide a useful first look at the data for researchers.

Data Standardization

Data will be standardized into common formats. Because data are being collected from vehicles of different makes, models, and countries of origin, it is possible that the DAS may collect data from a single sensor with different units, scales, axes, sample rates, or coding. It will be important to transform the data into standard units to assist researchers when they attempt to analyze the data across vehicles. This is also important if any algorithms are to be applied across the entire fleet consistently. The raw data will also be stored in the event that any researcher ever wants to review or analyze them. Also, some of the vehicle models (e.g., those equipped with LDW systems) may generate higher-resolution data (i.e., in time or the measured dimension) than others.
Using steering information as an example, these higher-frequency data would be of great interest to researchers looking at steering reversals to investigate workload, drowsiness, steering entropy, or the performance of the onboard LDW.

Expected Data Magnitude

Transfers of data staged on S07 servers to the central S06 data server could often exceed 100 Mbps. This requires the use of high-performance research-caliber networks, such as Internet2 or National LambdaRail. With almost 2,000 DAS units simultaneously collecting video and other sensor data for 2 years each, as well as a projected data life span of up to 30 years, the magnitude of data storage and criticality of adequate infrastructure cannot be overstated.

Specifically, the NDS database will house information from several sources, including video and sensor trip data, crash data, health check (i.e., system) data, management information (i.e., inventory data and participant enrollment data), participant demographic and assessment data, vehicle inventory data, and analysis data (i.e., aggregated or reduced), as well as other external sources such as PARs, GIS, weather data, maps, and roadway information. In total, it is anticipated that 2 years of data collection will create a volume of approximately 1 petabyte of data comprising approximately 60–80 million miles and approximately 1.5 to 1.7 million hours of driving data (i.e., a data volume that would require approximately the storage capacity of one million 1-gigabyte USB flash drives). To characterize this volume of data in a different context, it would take approximately 70 million copies of the King James Version of the Bible to fill 1 petabyte of storage capacity.
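The storage figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes decimal units (1 petabyte = 10^15 bytes, 1 gigabyte = 10^9 bytes), which is the convention that makes the "one million 1-gigabyte flash drives" comparison exact; the function names are hypothetical and introduced only for illustration.

```python
PETABYTE = 10**15  # bytes, decimal definition (assumption)
GIGABYTE = 10**9
MEGABYTE = 10**6


def flash_drives_needed(total_bytes, drive_bytes=GIGABYTE):
    """Number of drives of the given size needed to hold total_bytes."""
    return total_bytes // drive_bytes


def megabytes_per_hour(total_bytes, driving_hours):
    """Average data volume generated per hour of driving."""
    return total_bytes / driving_hours / MEGABYTE


def megabytes_per_mile(total_bytes, miles):
    """Average data volume generated per mile driven."""
    return total_bytes / miles / MEGABYTE


def days_to_transfer(total_bytes, link_mbps):
    """Days needed to move total_bytes over a link sustained at link_mbps."""
    seconds = total_bytes * 8 / (link_mbps * 10**6)
    return seconds / 86_400


if __name__ == "__main__":
    print(flash_drives_needed(PETABYTE))
    print(megabytes_per_hour(PETABYTE, 1.5e6), megabytes_per_hour(PETABYTE, 1.7e6))
    print(megabytes_per_mile(PETABYTE, 60e6), megabytes_per_mile(PETABYTE, 80e6))
    print(days_to_transfer(PETABYTE, 100))
```

Under these assumptions, 1 petabyte spread over 1.5 to 1.7 million hours of driving works out to roughly 590 to 670 megabytes per hour, and a link sustained at exactly 100 Mbps would need about 926 days to move the full petabyte, which is consistent with the stated need for transfer rates above 100 Mbps.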
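For the checksum analyses mentioned under Data Upload to Data Storage Server, one possible shape is to compute a digest of the file before it leaves the staging server and compare it with the digest of the file as received. The report does not name a checksum algorithm; SHA-256 from the Python standard library is an assumption here, and `file_sha256` and `upload_is_intact` are hypothetical names.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_is_intact(expected_hex, uploaded_path):
    """True if the uploaded file hashes to the digest computed at the source."""
    return file_sha256(uploaded_path) == expected_hex
```

Reading in chunks keeps memory use flat regardless of file size, which matters for large video files.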
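The trip-to-trip comparison described under Data Upload to Data Storage Server (does a trip begin at or near the GPS location where the previous trip ended?) might look like the following sketch. Several details are assumptions rather than anything the report specifies: the 200 m tolerance, the use of great-circle (haversine) distance, sync values treated as integers that increase across trips, and the `Trip`, `Annotation`, and `check_continuity` names. The annotation carries the minimum fields the text lists: a start sync, an end sync, and metadata describing the failed test.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class Trip:
    trip_id: str
    start_sync: int
    end_sync: int
    start_fix: tuple  # (lat, lon) of the first GPS fix
    end_fix: tuple    # (lat, lon) of the last GPS fix


@dataclass
class Annotation:
    trip_id: str
    start_sync: int
    end_sync: int
    failed_test: str
    detail: str


def check_continuity(trips, max_gap_m=200.0):
    """Flag trips that start farther than max_gap_m from the previous trip's end."""
    annotations = []
    for prev, cur in zip(trips, trips[1:]):
        gap = haversine_m(*prev.end_fix, *cur.start_fix)
        if gap > max_gap_m:
            annotations.append(
                Annotation(
                    trip_id=cur.trip_id,
                    start_sync=prev.end_sync,   # span covers the unrecorded gap
                    end_sync=cur.start_sync,
                    failed_test="trip_continuity",
                    detail=f"starts {gap:.0f} m from end of {prev.trip_id}",
                )
            )
    return annotations
```

A flagged gap does not by itself prove data loss (a vehicle can be towed or the GPS can take time to acquire a fix), so in this sketch the check only marks the span as questionable for later review.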
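The unit conversion described under Data Standardization can be illustrated with vehicle speed. The sketch below is an assumption-laden example, not the study's pipeline: the choice of km/h as the standard unit, the `Reading` type, and the `standardize_speed` function are all hypothetical. The conversion returns a new record and leaves the original untouched, mirroring the statement that the raw data will also be stored.

```python
from dataclasses import dataclass

# Multipliers that convert each supported speed unit to km/h.
TO_KMH = {"km/h": 1.0, "mph": 1.609344, "m/s": 3.6}


@dataclass(frozen=True)
class Reading:
    variable: str
    value: float
    unit: str


def standardize_speed(reading):
    """Return a copy of a speed reading expressed in km/h."""
    factor = TO_KMH.get(reading.unit)
    if factor is None:
        raise ValueError(f"unsupported speed unit: {reading.unit!r}")
    return Reading(reading.variable, reading.value * factor, "km/h")
```

Rejecting unknown units outright, rather than passing values through unchanged, keeps a miscoded sensor from silently entering fleet-wide analyses.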


TRB’s Strategic Highway Research Program (SHRP 2) Report S2-S05-RR-1: Design of the In-Vehicle Driving Behavior and Crash Risk Study provides a summary of the key aspects of the planning effort supporting the SHRP 2 Naturalistic Driving Study (NDS). SHRP 2 Safety Project S05: Design of the In-Vehicle Driving Behavior and Crash Risk Study (Study Design) designed the SHRP 2 NDS, which will collect data—on the order of 1 petabyte (1,000 terabytes)—on “naturalistic,” or real-world, driving behavior over a two-year period beginning in fall 2010.

The resulting data is expected to provide a wealth of information regarding driving behavior, lane departures, and intersection activities, which is anticipated to be of interest to transportation safety researchers and others for at least 20 years.
