The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.


Data Processing

Data Upload to Data Storage Server

When uploading the data from the S07 staging server to the S06 server, checksum analyses will be performed to ensure the integrity of the uploaded file. After uploading the data to the S06 database and decrypting, multiple quality checks will be done for each trip. These will be similar to but more sophisticated than those done during the routine health checks. Specifically, due to the amount of data and its contiguous nature (i.e., each trip file should begin at or near the same GPS coordinates where the previous trip file ended), more sophisticated comparisons between variables can be made to isolate potential problems within a trip. Analyses will also be conducted to compare trips to ensure that data are not being lost. For example, is the GPS location at the beginning of a trip the same location as the end of the previous trip?

When a problem has been identified by the data-quality algorithms, any questionable data will be marked as such. At a minimum, the annotation will include a start sync, end sync, and metadata describing the test the variable failed. As resources are available, fixes may be applied to the data where such is possible (e.g., where it can be determined that a particular sensor was generating data that were off by a known constant value). S06 quality personnel will review the problems to try to determine the root cause (i.e., on the DAS or otherwise). The S06 contractor may need to work with the individual S07 contractor to isolate the problem and determine the best course of corrective action. Quality personnel will also conduct random spot checks by remotely requesting data snippets.

Note that any additional processing required to get the data into a format to answer specific research questions is outside the scope of the current S06 project. However, it is believed that providing access to these data to researchers early on is paramount to the success of this project because it lets stakeholders at all levels begin to see results and the value of the project early on, without waiting for all data to be collected some 28 months later.

Data Acquisition System Data Processing

The purpose of the data processing is to get the highest-quality data as is feasible in the database and in a form usable by researchers. Several processes will be performed once the data arrive at the S06 server.

Backups

The data will be housed at the VT Data Center. RAID 6 protection is also employed at this facility to guard against loss of data due to server failure. Archival backups of the data will be stored at a different physical location.

Trip Summary Data

Summary data will be extracted for each trip. These data include mileage driven, duration, start time, average speed, maximum speed, number of stationary epochs, maximum deceleration, driver identification (where possible), etc. This summary will help with quality processes, and it will provide a useful first look at the data for researchers.

Data Standardization

Data will be standardized into common formats. Because data are being collected on different vehicle makes, models, and countries of origin, it is possible that the DAS may collect data from a single sensor with different units, scales, axes, sample rates, or coding. It will be important to transform the data into standard units to assist researchers when they attempt to analyze the data across vehicles. This is also important if any algorithms are to be applied across the entire fleet consistently. The raw data will also be stored in the event that any researcher ever wants to review or analyze them. Also, some of the vehicle models (e.g., those equipped with LDW systems) may generate higher-resolution data (i.e., in time or the measured dimension) than others. Using steering information as an example, this higher-frequency data would be of great interest to researchers looking at steering reversals to investigate workload, drowsiness, steering entropy, or the performance of the onboard LDW.

Expected Data Magnitude

Data that are staged on S07 servers and then transferred to the central S06 data server could often exceed 100 Mbps. This requires the use of high-performance research-caliber networks, such as Internet2 or National Lambda Rail. With almost 2,000 DAS units simultaneously collecting video and other sensor data for 2 years each, as well as a projected data life span of up to 30 years, the magnitude of data storage and the criticality of adequate infrastructure cannot be overstated. Specifically, the NDS database will house information from several sources, including video and sensor trip data, crash data, health check (i.e., system) data, management information (i.e., inventory data and participant enrollment data), participant demographic and assessment data, vehicle inventory data, and analysis data (i.e., aggregated or reduced), as well as other external sources such as PARs, GIS, weather data, maps, and roadway information. In total, it is anticipated that 2 years of data collection will create a volume of approximately 1 petabyte of data comprising approximately 60 to 80 million miles and approximately 1.5 to 1.7 million hours of driving data (i.e., a data volume that would require approximately the storage capacity of one million 1-gigabyte USB flash drives). To characterize this volume of data in a different context, it would take approximately 70 million copies of the King James Version of the Bible to fill 1 petabyte of storage capacity.
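
The magnitude figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the project plan: it assumes decimal (SI) units, so 1 petabyte is 10^15 bytes and 1 gigabyte is 10^9 bytes, and it assumes a link running continuously at its full rate. All function names are hypothetical and introduced only for illustration.

```python
# Assumption: decimal (SI) units. The text does not say whether it means
# decimal or binary prefixes.
PETABYTE = 10**15  # bytes
GIGABYTE = 10**9   # bytes


def flash_drives_needed(total_bytes, drive_bytes=GIGABYTE):
    """Number of full drives of size drive_bytes needed to hold total_bytes."""
    return -(-total_bytes // drive_bytes)  # ceiling division


def bytes_per_driving_hour(total_bytes, driving_hours):
    """Average data volume per hour of driving (a fleet-wide average only)."""
    return total_bytes / driving_hours


def transfer_days(total_bytes, megabits_per_second):
    """Days needed to move total_bytes over a link that is always saturated."""
    bits = total_bytes * 8
    seconds = bits / (megabits_per_second * 10**6)
    return seconds / 86_400


if __name__ == "__main__":
    print(flash_drives_needed(PETABYTE))
    # Range implied by 1.5 to 1.7 million hours of driving data.
    for hours in (1.5e6, 1.7e6):
        print(round(bytes_per_driving_hour(PETABYTE, hours) / 1e6), "MB per hour")
    print(round(transfer_days(PETABYTE, 100)), "days at a constant 100 Mbps")
```

Under these assumptions, 1 petabyte corresponds to one million 1-gigabyte drives, matching the comparison in the text, and averages roughly 590 to 670 megabytes per driving hour. Moving the full volume over a single 100 Mbps link running without interruption would take about 926 days, longer than the 2-year collection period, which is consistent with the stated need for transfers that exceed 100 Mbps.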