National Academies Press: OpenBook

Leveraging Big Data to Improve Traffic Incident Management (2019)

Chapter: Appendix A - Data Source Assessment Tables

Suggested Citation:"Appendix A - Data Source Assessment Tables." National Academies of Sciences, Engineering, and Medicine. 2019. Leveraging Big Data to Improve Traffic Incident Management. Washington, DC: The National Academies Press. doi: 10.17226/25604.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

APPENDIX A

Data Source Assessment Tables

Appendix A presents the detailed data assessment tables for 31 data sources. The criteria used to assess each data source are shown and described in Table 5-1 in Chapter 5 of this report. The data source assessments were qualitative, driven by the assessment criteria, and based on the information that was readily available for each source. For some of the data sources, interviews with data owners provided more detailed and specific information about the sources, allowing for a more complete understanding of the data and limitations. The data assessment is by no means exhaustive in terms of data sources or the information associated with each source. Some tables are more detailed than others, depending on the information available and/or whether the data is proprietary or business sensitive.

• A.1 STATE TRAFFIC RECORDS DATA SOURCES
  o Crash data (Table A-1)
  o Vehicle data (Table A-2)
  o Driver data (Table A-3)
  o Roadway data (Table A-4)
  o Citation and adjudication data (Table A-5)
  o Injury surveillance data (Table A-6)
• A.2 TRANSPORTATION DATA SOURCES
  o Traffic sensor data (Table A-7)
  o Traffic video data (Table A-8)
  o Freeway/safety service patrol and incident response program data (Table A-9)
  o 511 system data (Table A-10)
  o Road weather data (Table A-11)
  o Toll data (Table A-12)
• A.3 PUBLIC SAFETY DATA SOURCES
  o Law enforcement, fire and rescue, and EMS CAD system data (Table A-13)
  o Emergency communications center (ECC)/911 call center/public safety answering point (PSAP) data (Table A-14)
  o Video data (Table A-15)
  o Towing and recovery data (Table A-16)
• A.4 CROWDSOURCED/SOCIAL MEDIA DATA SOURCES
  o Waze data (Table A-17)
  o Twitter data (Table A-18)

• A.5 ADVANCED VEHICLE SYSTEMS DATA SOURCES
  o Automated vehicle location (AVL) system data (Table A-19)
  o Event data recorder (EDR) data (Table A-20)
  o Vehicle telematics data (Table A-21)
  o Automated and connected vehicle, traveler, and infrastructure data (Table A-22)
• A.6 AGGREGATED DATASETS
  o RITIS data assessment (Table A-23)
  o NPMRDS (Table A-24)
  o Meteorological Assimilation Data Ingest System (MADIS) and MADIS Integrated Mesonet (Table A-25)
  o Third-party web service weather data (Table A-26)
  o NFIRS data (Table A-27)
  o NEMSIS data (Table A-28)
  o MCMIS data (Table A-29)
  o HERE data (Table A-30)
  o INRIX data (Table A-31)

A.1 State Traffic Records Data Sources

Table A-1. Crash data.

Assessment Criteria: Assessment

Description of Data: Crash data includes detailed information about every reportable motor vehicle crash in a state, documents the characteristics of crashes, and provides the who, what, when, where, how, and why about each incident.[1] Data elements include crash time, location, injury status, hazardous materials, motor carrier identification, roadway surface condition, total lanes in roadway, weather conditions, and other crash-specific data elements.

Who Collects, Maintains, and Owns the Data: Local, regional, and state law enforcement agencies collect the data via crash reports (either manually or electronically). Maintenance and ownership of the crash data vary among jurisdictions. Crash data is commonly aggregated at the state level.

How the Data Are Collected: Mostly electronically. When collected manually, paper reports are later keyed into electronic form. Data from multiple collection sources (paper and/or electronic) is then merged into a single database.

Data Structure: Structured and semi-structured. Each state has its own reporting system and storage system. The Model Minimum Uniform Crash Criteria (MMUCC) guideline is a minimum, standardized dataset for describing motor vehicle crashes and the vehicles, persons, and environment involved.[2] The MMUCC contains 110 data elements: 77 data elements to be collected at the scene; 10 data elements to be derived from the collected data; and 23 data elements to be obtained after linkage to driver history, injury, and roadway inventory data. MMUCC data is often exported in XML format.

Data Size, Storage, and Management: Gigabytes. Data is typically stored in relational databases maintained by local or statewide agencies. The database is kept in-house and archived in flat files; historical data is kept for several years (the specific duration varies across agencies). Some crash data is aggregated into national datasets such as the Fatality Analysis Reporting System (FARS), which is maintained by NHTSA to track all crashes involving a fatality.

Data Accessibility: Varies by agency. The closer the database schema is to the MMUCC, the more easily the data can be understood and analyzed. Some agencies provide redacted, public-facing, web-based information portals to query the data, while most states offer large redacted datasets that can be downloaded electronically.

Data Sensitivity: Personally identifiable information (PII) is present in raw data; typically, redacted data is available for analysis.

Data Cost: Free, but some minor cost may be incurred to maintain data-sharing infrastructure.

Data Openness: Limited openness. Only redacted data is public. Access to non-redacted data must be granted by the agency.

Data Challenges: Because the MMUCC is voluntary, states often use differing formats and names for data elements and attributes, or they may combine (or split) MMUCC elements and attributes.[2] As a result, it can be very difficult to compare, merge, or share crash data among states, between state and federal datasets, and, in some cases, even between different agencies within a state. Although many agencies use electronic crash-reporting systems, which result in more complete and exploitable data, some agencies still use paper crash reports, which result in data that is less precise (vague time or location) or of lesser quality (e.g., missing fields, wrong categories). These problems can delay the upload of crash reports into a local or state database, as state or local personnel perform additional inquiries to obtain more precise or correct data.

[1] Traffic Records Program Assessment Advisory, NHTSA, U.S. Department of Transportation. Online: https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/811644.
[2] Model Minimum Uniform Crash Criteria (MMUCC). Online: https://www.transportation.gov/government/traffic-records/model-minimum-uniform-crash-criteria-mmucc-0.
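The harmonization problem noted under Data Challenges in Table A-1 can be illustrated with a minimal sketch. It assumes two hypothetical state exports whose column names differ and maps each to a shared set of element names. The state labels, column names, and shared names are invented for illustration; they are not MMUCC element identifiers or any real state schema.

```python
# Hypothetical per-state column names mapped to shared element names.
# None of these names come from MMUCC or from a real state crash schema.
FIELD_MAPS = {
    "state_a": {"crash_dt": "crash_time", "wx": "weather", "lanes_tot": "total_lanes"},
    "state_b": {"CrashDateTime": "crash_time", "WeatherCond": "weather", "NumLanes": "total_lanes"},
}


def harmonize(record: dict, state: str) -> dict:
    """Rename one state's fields to the shared names; unmapped fields are dropped."""
    mapping = FIELD_MAPS[state]
    return {shared: record[local] for local, shared in mapping.items() if local in record}


def missing_elements(record: dict, required=("crash_time", "weather", "total_lanes")) -> list:
    """List shared elements that are absent or blank in a harmonized record."""
    return [name for name in required if record.get(name) in (None, "")]
```

Records from either state can then be merged on the shared names, and `missing_elements` flags the gaps that, as the table notes, are more common in keyed paper reports.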

Table A-2. Vehicle data.

Assessment Criteria: Assessment

Description of Data: An inventory of data that enables the titling and registration of each vehicle under a state's jurisdiction to ensure that a descriptive record is maintained and made accessible for each vehicle and vehicle owner operating on public roadways. Vehicle information includes identification and ownership data for vehicles registered in the state and for out-of-state vehicles involved in crashes within the state's boundaries. Although data elements and their definitions vary by jurisdiction, they generally include issuing agency; plate type; vehicle year, body style, weight, and identification number; and name of vehicle owner.[1]

Who Collects, Maintains, and Owns the Data: State-level government agency that administers vehicle registration and driver licensing (e.g., Department/Division/Office/Bureau of Motor Vehicles). The traditional department of motor vehicles (DMV) functions are handled by various agencies in different states (e.g., department of transportation, department of public safety, department of revenue, department of finance and administration, secretary of state, department of justice).

How the Data Are Collected: Electronically keyed at time of registration, automated license plate reader (ALPR) technology, and barcode/reader technology.

Data Structure: Structured.

Data Size, Storage, and Management: Gigabytes to terabytes. Data is stored in-house in relational databases located in state agencies. Data is archived and maintained for multiple years (the specific number of years varies from state to state).

Data Accessibility: Web services via criminal justice information networks and less-restrictive systems managed by the state licensing authority (limited to that state). The American Association of Motor Vehicle Administrators (AAMVA) maintains a pointer index system for vehicle title information called the National Motor Vehicle Title Information System (NMVTIS).[3] The NMVTIS also assists states and law enforcement in deterring and preventing title fraud and other crimes.

Data Sensitivity: Contains PII. The Driver's Privacy Protection Act (DPPA) of 1994 is a federal statute governing the privacy and disclosure of personal information gathered by state DMVs.[2]

Data Costs: Free.

Data Openness: Data is not fully open due to personal information. Data can be shared, but usually on the basis of one inquiry/one record at a time. Personal information is protected by the DPPA.

Data Challenges: May not be accessible from the DMV due to PII and other restrictions. Disparities among jurisdictions sometimes make it difficult for officials in one jurisdiction to interpret data elements appearing on the vehicle registration document of another jurisdiction.[4]

[1] Traffic Records Program Assessment Advisory, National Highway Traffic Safety Administration, U.S. Department of Transportation. Online: https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/811644.
[2] Driver's Privacy Protection Act (18 U.S.C. §2721 et seq.), Prohibition on Release and Use of Certain Personal Information from State Motor Vehicle Records. Online: http://www.accessreports.com/statutes/DPPA1.htm.
[3] American Association of Motor Vehicle Administrators, National Motor Vehicle Title Information System. Online: http://www.aamva.org/nmvtis/ (accessed March 2017).
[4] Motor Vehicle Registration Document & Insurance Identification Best Practices Guide for Paper & Electronic Credentials, American Association of Motor Vehicle Administrators (August 2013). Online: https://www.aamva.org/WorkArea/DownloadAsset.aspx?id=4437.

Table A-3. Driver data.

Assessment Criteria: Assessment

Description of Data: Maintains driver identity, driving history, and license information for all records in the system. Contains information on each licensed driver, including name, birth date, license number, issuing state, license type, and historical driving record information (issuance, suspension, revocation, citations, crashes). The driver data system ensures that each person licensed to drive has one identity, one license to drive, and one record.[1]

Who Collects, Maintains, and Owns the Data: State-level government agency that administers vehicle registration and driver licensing (e.g., Department/Division/Office/Bureau of Motor Vehicles). The traditional DMV functions are handled by various agencies in different states (e.g., department of transportation, department of public safety, department of revenue, department of finance and administration, secretary of state, department of justice).

How the Data Are Collected: Electronic keying, magnetic stripe readers, and barcode readers are three means of data collection. Typically, data also is reviewed physically for verification and updated through law enforcement or other means.

Data Structure: Structured.

Data Size, Storage, and Management: Gigabytes to terabytes. The data is stored in-house in relational databases located within the state agencies. Data is archived and maintained for multiple years (the specific number of years varies from state to state).

Data Accessibility: Each state has its own database. The information can be accessed via web services, criminal justice information networks, and less-restrictive systems managed by the state licensing authority (limited to that state) via FTP download. For example, Florida has a system called DAVID (Driver and Vehicle Information Database) that allows officers and courts to see driving records, all digital photos on file for drivers, and links to vehicles owned/registered.[2] States also share information with the American Association of Motor Vehicle Administrators (AAMVA). The AAMVA develops and maintains many information systems that facilitate the electronic exchange of driver, vehicle, and identity information between organizations (e.g., driver records, CDL skills testing, vehicle title, registration).[3] For example, AAMVA maintains the Commercial Driver's License Information System (CDLIS), a nationwide computer system that enables state driver licensing agencies (SDLAs) to ensure that each commercial driver has only one driver's license and one complete driver record. Release of information is protected by the Driver's Privacy Protection Act (DPPA).[4]

Data Sensitivity: Contains PII and, in some cases, legal privacy restrictions.

Data Costs: Free.

Data Openness: Limited openness, as the data contains PII and access needs to be requested.

Data Challenges: May not be accessible from the DMV due to PII and other restrictions, such as state laws protecting driver information.

[1] Traffic Records Program Assessment Advisory, NHTSA, U.S. Department of Transportation. Online: https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/811644.
[2] DAVID, Florida Department of Highway Safety and Motor Vehicles. Online: http://www.flhsmv.gov/courts/david/ (accessed March 2017).
[3] American Association of Motor Vehicle Administrators, Application Services. Online: http://www.aamva.org/Application-Services/ (accessed March 2017).
[4] 18 U.S.C. 2721, U.S. Government Publishing Office. Online: https://www.gpo.gov/fdsys/granule/USCODE-2011-title18/USCODE-2011-title18-partI-chap123-sec2721/content-detail.html (accessed March 2017).
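The Data Sensitivity and Data Openness rows of the driver data table note that records contain PII and that access must be requested. As a rough illustration only, the sketch below shows one way an agency might strip PII fields before sharing records for analysis while keeping a pseudonymous key for linking records about the same driver. The field names, the `redact` helper, and the salted-hash approach are assumptions made for this example; they are not drawn from the report, and actual redaction must satisfy the DPPA and applicable state law.

```python
import hashlib

# Hypothetical PII field names; a real schema and redaction policy will differ.
PII_FIELDS = {"name", "birth_date", "license_number"}


def redact(record: dict, salt: str) -> dict:
    """Drop PII fields and add a salted hash so redacted records can still be linked."""
    public = {k: v for k, v in record.items() if k not in PII_FIELDS}
    # The salt must be kept secret by the data owner; without it the key
    # cannot be recomputed from a known license number.
    key = f"{salt}:{record.get('license_number', '')}".encode("utf-8")
    public["driver_key"] = hashlib.sha256(key).hexdigest()[:16]
    return public
```

The same license number and salt always yield the same `driver_key`, so citation or crash records redacted by the same agency could be joined without exposing the license number itself.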

Table A-4. Roadway data.

Assessment Criteria: Assessment

Description of Data: Roadway datasets contain extensive information about roadway segments, including roadway characteristics such as physical curvature, lane types and widths, pavement types, connected access roads, roadside descriptors, and interchange and ramp descriptors. Asset management datasets contain data relevant to the various equipment and facilities supporting roadways, such as traffic signals, traffic signs, barriers, drainage, power stations, and communications cables. Roadway inventory and asset inventory datasets are typically maintained in multiple separate databases. More advanced data management practices maintain these data in an integrated geospatial information systems (GIS) platform to allow assets and roadways to be easily located and mapped. The data itself ranges from tables to computer-aided design (CAD) drawings to geospatial vector data.

Who Collects, Maintains, and Owns the Data: State transportation agencies, county public works departments.

How the Data Are Collected: Manually, aerial images, linear referencing, GIS, cameras on vans, and backpacks.[3]

Data Structure: Structured and semi-structured. The Model Inventory of Roadway Elements (MIRE) is a recommended listing of roadway inventory and traffic elements critical to safety management.[1,2] MIRE is intended as a guideline to help transportation agencies improve their roadway and traffic data inventories. It provides a basis for a standard of what can be considered a good/robust data inventory and helps agencies move toward the use of performance measures to assess data quality. The MIRE listing contains 202 data elements divided among three broad categories: (1) roadway segments, (2) roadway alignment, and (3) roadway junctions. MIRE was purposefully designed to link with supplemental databases, including roadside fixed objects, signs, speed, automated enforcement devices, land use elements related to safety, bridge descriptors, and railroad grade-crossing descriptors.

Data Size, Storage, and Management: The size of the datasets varies among agencies (gigabytes to terabytes) and is directly correlated to the miles of road network being inventoried, the level of detail recorded for each roadway, the number of assets, and the level of detail recorded for each asset. Dataset size further increases when agencies maintain detailed digitized CAD drawings and geospatial vector data files and link these files to each relevant roadway or asset record. Storage of the datasets varies widely from agency to agency. Some agencies store roadway/asset data as spreadsheet files, some store scanned paper drawings or CAD files, and others store their data in fully integrated geospatial databases. Dataset management varies as well; typically, it is done by maintaining one or more file archives or databases in-house that contain multiple years of roadway and asset data. The file archive or database is updated periodically with new data to reflect each asset's maintenance or improvement history. Depending on the agency, the archives can be managed and centralized in a single location, such as a GIS database, or managed independently within each agency district.

Data Accessibility: Accessibility varies from agency to agency and can range from files and images mailed on disk or portable media to dedicated public web portals with searching and downloading capabilities.

Data Sensitivity: Sensitivity is dependent on the asset. For example, some asset data, such as bridge CAD drawings and material information or the models and versions of traffic signal management software, could be exploited by malicious individuals or groups.

Data Costs: Free.

Data Openness: Limited openness to full openness. Some agencies do not publish this type of data, whereas others maintain portals where the data can be easily searched, downloaded, and sometimes even visualized.

Data Challenges: Data quality, delivery, timeliness, and accuracy vary widely across agencies. Many agencies may not have a web portal or FTP site, requiring that large datasets be delivered via disc or mail. Some agencies only use basic file-sharing systems to store their asset data, and these systems lack the data management structure to find, retrieve, and format requested asset data quickly. It is not uncommon to wait several days or weeks to receive requested asset data. Asset data can also be distributed across agency districts and not routinely managed, updated, and maintained in a consistent fashion. Depending on budget and staff availability, each district may manage its asset data differently. The result may be the storage of asset data across various internal legacy systems with diverse structures and formats.[4] This could make it very difficult to access and mine the asset data. The accuracy of the asset data also can be affected, as agencies or agency districts may not have the resources to update asset records as soon as an asset is upgraded or replaced, resulting in stale asset data several weeks or months after asset work has been performed.

[1] FHWA Roadway Safety Data Program. Online: https://safety.fhwa.dot.gov/rsdp/mire.aspx.
[2] Model Inventory of Roadway Elements VERSION 1.0, FHWA, U.S. Department of Transportation, October 2010. Online: https://safety.fhwa.dot.gov/tools/data_tools/mirereport/mirereport.pdf.
[3] Khattak, A. J., J. E. Hummer, and H. A. Karimi. "New and Existing Roadway Inventory Data Acquisition Methods." Journal of Transportation and Statistics, Vol 3, No 3, Paper 2. Bureau of Transportation Statistics, U.S. Department of Transportation, Washington, D.C. Online: https://www.rita.dot.gov/bts/sites/rita.dot.gov.bts/files/publications/journal_of_transportation_and_statistics/volume_03_number_03/paper_02/index.html.
[4] Asset Management Overview, FHWA, U.S. Department of Transportation (October 2010, December 2007). Online: https://www.fhwa.dot.gov/asset/if08008/assetmgmt_overview.pdf.

Table A-5. Citation and adjudication data.
Assessment Criteria: Assessment
Description of Data: Citation and adjudication databases maintain information about citations, arrests, and dispositions. Data management is highly localized throughout the process, from delivery of the citation through adjudication. After local adjudication is complete, the data are delivered (in most states) to a state entity for driver’s license reporting functions. Citation databases may contain information relevant to TIM, including occurrences of law enforcement activity along the roadside and potentially the duration and type of activity.
Who Collects, Maintains, and Owns the Data: Law enforcement and parking enforcement are the primary points of data collection. Courts having jurisdiction coordinate with the state agency responsible for driver data.
How the Data Are Collected: Mostly electronic at point of collection. Paper documents are converted to electronic records at the court level.
Data Structure: Semi-structured and structured.
Data Size, Storage, and Management: Gigabytes to terabytes. State databases maintained in-house for multiple years.
Data Accessibility: FTP.
Data Sensitivity: PII.
Data Costs: Free.
Data Openness: Limited openness due to PII.
Data Challenges: May not be accessible from the DMV due to PII and other restrictions.
1 Traffic Records Program Assessment Advisory, NHTSA, U.S. Department of Transportation. Online: https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/811644.

140 Leveraging Big Data to Improve Traffic Incident Management Table A-6. Injury surveillance data. Assessment Criteria Assessment Description of Data These surveillance systems typically incorporate pre-hospital emergency medical services (EMS), trauma registry, emergency department, hospital discharge, rehabilitation databases, payer-related databases, and mortality data (e.g., death certificates, autopsies, and coroner and medical examiner reports). The data from these various systems are used to track injury type, causation, severity, cost, and outcome.1 Who Collects, Maintains, and Owns the Data EMS, hospitals (emergency departments, discharge, trauma registry), state vital records, medical examiner/coroner. How the Data Are Collected Given the numerous files and datasets that make up the injury surveillance system, a correspondingly large number of data standards and applicable guidelines exist for data collection.1 For example, EMS providers have been rapidly transitioning their paper records into electronic patient care reports (EPCRs) that are completed using laptop computers or tablets.2 Data Structure Semi-structured and structured. The National Emergency Medical Services Information System (NEMSIS), developed through a collaborative effort with the EMS industry and originating from a memorandum of agreement among 52 states and territories, assigns specific definitions to 481 data elements identified as desirable to be collected on a national level for EMS. NEMSIS was developed to help states collect more standardized elements and eventually submit the data to a national EMS database. Administrative data files for emergency department visits and inpatient hospitalizations are based on the uniform billing code issued by the U.S. Department of Health and Human Services.1 The National Trauma Data Standard (NTDS), developed by the American College of Surgeons Committee on Trauma, provides data standards for trauma registry databases. 
Built on an XML schema shared with NEMSIS, the NTDS enables improved integration of EMS and trauma data.1 The U.S. Standard Certificates of Birth and Death and the Report of Fetal Death are the principal means of promoting uniformity in the data collected by the states. These documents are reviewed and revised approximately every 10 years through a process that includes broad input from data providers and users. The Centers for Disease Control and Preventions’ National Center for Health Statistics provides guidance for cause of death coding based on ICD-10 standards.1 The AIS and the ISS are measures of injury severity. The AIS categorizes injury severity by body region and—when combined with crash data—can be used to describe injury patterns by crash configuration. The ISS provides a more comprehensive measure of injury severity when a patient has injuries to multiple body regions. The Glasgow Coma Scale is used to assess the neurologic state of a patient.1

Data Source Assessment Tables 141 Assessment Criteria Assessment Data Size, Storage, and Management Component databases—gigabytes. EMS providers, hospitals, state department of health, state databases, NEMSIS. In-house systems, maintained for multiple years. The EMS applications of today can sync up with monitoring equipment and computer-aided dispatch (CAD) systems to automatically populate data related to each assigned call. Providers can track and input the progress of a patient’s vitals, automatically record medication dosage and times, capture and electronically save electrocardiograms (EKGs), and transmit that information to the awaiting hospital. One app can sometimes be utilized by the EMS user to manage all the information from a shift, from populating dispatch and patient information, to gathering and documenting current findings, to the transmission of a patient’s records to a health-care facility.2 Data Accessibility Ideally, data is made available for local and state agency use. FTP, data dump. Data Sensitivity Contains PII. In addition to any applicable state statutes, state health-care data custodians must comply with the pertinent aspects of the Health Insurance Portability and Accountability Act of 1996 (HIPAA). Data Costs Potential cost; available through data-sharing agreements at no cost. Data Openness Limited openness due to PII. Data Challenges May not be accessible due to PII and other restrictions. 1Traffic Records Program Assessment Advisory, NHTSA, U.S. Department of Transportation. Online: https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/811644. 2 Busa, M. Information-Sharing Applications & Technology for the Fire Service (July 30, 2013). Online: http://www.firerescuemagazine.com/articles/print/volume-8/issue-9/technology/information-sharing- applications-technology-for-the-fire-service.html.
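The relationship between the AIS and the ISS noted in Table A-6 can be illustrated with a short worked example. Under the conventional definition, the ISS is the sum of the squares of the highest AIS scores in the three most severely injured body regions, and any AIS of 6 sets the ISS to its maximum of 75. The sketch below assumes that definition; the function name and the region labels are illustrative and do not come from the NHTSA advisory.

```python
def injury_severity_score(ais_by_region):
    """Return the ISS from a mapping of body region -> highest AIS score (0-6).

    Assumes the conventional definition: sum of squares of the three
    highest regional AIS scores, with any AIS of 6 giving the maximum of 75.
    """
    scores = list(ais_by_region.values())
    if any(score < 0 or score > 6 for score in scores):
        raise ValueError("AIS scores must be between 0 and 6")
    if 6 in scores:
        return 75
    top_three = sorted(scores, reverse=True)[:3]
    return sum(score * score for score in top_three)


# Hypothetical patient with injuries in four body regions.
patient = {"head": 4, "chest": 3, "abdomen": 2, "extremities": 1}
print(injury_severity_score(patient))  # 4^2 + 3^2 + 2^2 = 29
```

Only the three highest regional scores contribute, which is why the extremity injury in this hypothetical patient does not change the result.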

142 Leveraging Big Data to Improve Traffic Incident Management A.2 TRANSPORTATION DATA SOURCES Table A-7. Traffic sensor data. Assessment Criteria Assessment Description of Data Data from sensors, including inductive loop detectors, magnetic sensors and detectors, video image processors, microwave radar sensors, laser radars, passive infrared and passive acoustic array sensors, and ultrasonic sensors, plus combinations of sensor technologies. Certain detectors give direct information concerning vehicle passage and presence, while other traffic flow parameters such as density and speed are inferred from algorithms that interpret or analyze the measured data. Data elements collected include date, time, sensor ID, roadway ID, direction, annual average daily traffic (AADT), Truck AADT, volumes (vehicles/minute), speed, occupancy, vehicle classification. Who Collects, Maintains, and Owns the Data State, MPO, county, and city transportation agencies. How the Data Are Collected Automatically collected through the technology through sampling interval (e.g., 20 seconds, 60 seconds), or manually either by lane or for roadway sections. Data Structure Structured.1,2 Data Size, Storage, and Management Gigabytes to terabytes, depending on the size of the area being monitored (e.g., regional, statewide). Typically stored as flat files or in relational databases. Data are typically aggregated at 1-, 5-, or 15-minute intervals for storage, analysis, and visualization. Data Accessibility Variable by entity ranging from aggregate data stored in CSV files by location on the premises to statewide web-accessible databases providing more granular data. Traffic sensor data is typically archived for several years, and many states have requirements on the history for storage. 
Examples of states and organizations that have developed data storage services include the Texas DOT, which has begun storing detailed traffic sensor data through its STARS II system; Caltrans, which uses its Performance Measurement System (PeMS) website to store more than 10 years of traffic sensor data; and the University of Maryland’s CATT Lab, which consolidates traffic data into the Regional Integrated Transportation Information System (RITIS). The first two provide both visualizations and structured datasets to users, while RITIS focuses mostly on advanced visualizations of traffic sensor data for multiple states. Although visualizations and aggregated datasets are very valuable for human consumption, disaggregate, high-resolution data is essential to Big Data analysis. Raw traffic sensor data is often unavailable because the large volume of data can be costly to store. As a result, data collected at a resolution of seconds is aggregated and stored as 5- or 15-minute averages, and some organizations keep only hourly or lower-resolution data.
Data Sensitivity: None.
Data Costs: Free/public. Data is most often offered at no cost online by state or regional organizations and is characterized as public domain data. A minor fee may be charged when requesting the data in paper or disk format.
Data Openness: Limited data openness. Traffic sensor data is often shared with the public through high-level aggregation or visualizations such as maps, but rarely as raw data. More granular or raw traffic sensor data is typically not shared openly with the public; access for federal, state, local, private, and public users is usually granted only upon request and after review of the intended use of the data.

Data Challenges: Institutional. The challenges are tied to the institution’s ability to provide and manage access to raw traffic sensor data, as well as its ability to ensure high traffic sensor data quality by monitoring sensor drift, performing recalibration on a regular basis, and maintaining precise sensor location information.
1 Traffic Monitoring Guide (TMG), Federal Highway Administration, U.S. Department of Transportation (October 2016). Online: https://www.fhwa.dot.gov/policyinformation/tmguide/tmg_fhwa_pl_17_003.pdf.
2 AASHTO Guidelines for Traffic Data Programs, 2nd Ed. Online: https://bookstore.transportation.org/item_details.aspx?ID=1393.
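The aggregation step described in Table A-7, in which readings sampled every 20 or 60 seconds are rolled up into 1-, 5-, or 15-minute records, can be sketched as follows. This is a minimal illustration rather than any agency's actual procedure: the tuple layout, the function name, and the choice of a volume-weighted mean speed are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_readings(readings, minutes=5):
    """Roll raw detector readings up into fixed-length bins.

    readings: iterable of (timestamp, sensor_id, volume, speed, occupancy).
    minutes:  bin length; assumed to divide 60 evenly (e.g., 1, 5, 15).
    Returns {(sensor_id, bin_start): {"volume", "speed", "occupancy"}}.
    """
    bins = defaultdict(lambda: {"volume": 0, "speed_x_volume": 0.0,
                                "occupancy_sum": 0.0, "samples": 0})
    for ts, sensor_id, volume, speed, occupancy in readings:
        # Truncate the timestamp to the start of its bin.
        bin_start = ts.replace(minute=ts.minute - ts.minute % minutes,
                               second=0, microsecond=0)
        b = bins[(sensor_id, bin_start)]
        b["volume"] += volume                    # counts are summed
        b["speed_x_volume"] += speed * volume    # speed weighted by volume
        b["occupancy_sum"] += occupancy          # occupancy is averaged
        b["samples"] += 1

    result = {}
    for key, b in bins.items():
        mean_speed = b["speed_x_volume"] / b["volume"] if b["volume"] else None
        result[key] = {"volume": b["volume"],
                       "speed": mean_speed,
                       "occupancy": b["occupancy_sum"] / b["samples"]}
    return result


# Hypothetical 20-second readings from one detector.
raw = [
    (datetime(2019, 1, 1, 8, 0, 0), "S1", 2, 60.0, 0.1),
    (datetime(2019, 1, 1, 8, 0, 20), "S1", 4, 30.0, 0.2),
    (datetime(2019, 1, 1, 8, 4, 40), "S1", 0, 0.0, 0.0),
    (datetime(2019, 1, 1, 8, 5, 0), "S1", 3, 50.0, 0.3),
]
summary = aggregate_readings(raw, minutes=5)
```

The example also shows what aggregation discards: the individual 20-second readings cannot be recovered from the 5-minute record, which is why disaggregate data matters for Big Data analysis.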

Table A-8. Traffic digital video data.
Assessment Criteria: Assessment
Description of Data: Digital video is a representation of moving visual images in the form of encoded digital data. Digital video data is collected by transportation agencies through closed-circuit television (CCTV) (video surveillance) cameras, video detection, and automatic license plate reader/recognition (ALPR) systems.
• CCTV systems use video cameras to transmit a signal to a specific place, on a limited set of monitors. Transportation agencies use CCTV cameras on highways, ramp locations, and intersections to monitor traffic from a central location such as a traffic management center (TMC).
• Video detection devices capture video images of traffic and analyze the information using algorithms for traffic management (e.g., traffic signal control).
• ALPR systems identify vehicles passing fixed locations using cameras that read the license plates. Such systems are widely used in electronic tolling applications.
Who Collects, Maintains, and Owns the Data: State and local transportation agencies, private toll operators, and parking lot managers.
How the Data Are Collected: Video data is collected via various types of remote camera technologies, generally deployed at fixed locations but with selectable orientation, and connected to a centralized location. Collected video and related images are then viewed live from the central location and sometimes recorded and stored in a video archive for various amounts of time, often dictated by law or budget. Automatic video recognition software such as ALPR systems, located either on the camera or further down the video stream, can be added to automatically extract metadata from captured images within the video.
This metadata is then associated with a specific camera and a specific timestamp or timeframe and saved to one or more databases for storage or sent as a message to alert authorities when a vehicle of interest has been observed.1 Watching live video images allows for the extraction of many relevant data elements; however, this approach to data processing is limited. Most modern approaches to capturing video embed metadata such as date, time, and location into video frames during capture using exchangeable image file format (EXIF) tags. This metadata can then be augmented using machine-learning software, which uses image processing algorithms to extract from each video frame additional metadata such as vehicle counts, estimated speeds, tag numbers of passing vehicles, vehicle type, vehicle orientation, and so forth. This metadata is then used to qualify and characterize the event recorded on video. Data Structure Semi-structured. Data Size, Storage, and Management Gigabytes to terabytes. Video data is notorious for its very large size. Common practice is to compress video data on capture and then transmit and store it in compressed form until it is viewed or used by automated recognition algorithms. Storage/recording of video images is largely a policy decision for transportation agencies. Three fundamental video recording approaches are used: (1) always (continuously record most feeds and retain them for a few days), (2) sometimes (initiate recording of individual feeds for specific events), and (3) never. Kuciemba and Swindler (2016) describe the benefits and limitations to each approach. Of 32 TMCs surveyed, five reported they recorded most feeds most of the time, 23 reported they recorded the videos only under limited circumstances, and four reported they never recorded videos.2

Data Source Assessment Tables 145 Assessment Criteria Assessment Data Accessibility Accessibility varies widely among traffic video sources. Video is typically shared using a streaming method, which is commonly used to share video with media outlets and to some degree with the public via 511 and motorist-information websites using low resolution/quality video streaming. Transportation agencies also share in real time images extracted from video feeds with allied agencies like law enforcement, fire, and towing dispatch centers. Video streams and pictures also can be accessed by a restricted list of users from allied agencies using custom mobile or desktop applications. Alternatively, when stored or archived, TMC video can be provided upon request, which involves manual searches for the date, time, and location of the event requested. Most requests come from law enforcement. Video is typically copied onto a media store device and either picked up by or mailed to the requester. Although most traditional video data systems can store or archive and make the data accessible to a public or restricted audience, the data remains accessible at low resolution, which greatly limits its ability to be analyzed to provide value when machine processed. Digital and Internet Protocol (IP) camera systems offer an alternative that uses the Internet to transmit video to servers that can process the stream to add tags, clean the images, detect, and send alerts to interested parties directly using less communications bandwidth. Data Sensitivity When dealing with low-resolution video, generally the quality of the video is too low to allow sensitive information to be extracted. 
Low-quality video rarely depicts license plates and recognizable facial images; however, when dealing with high-definition video, sensitivity can increase greatly, as such information becomes visible and video processing can be performed automatically to detect sensitive information such as faces, license plates, location, and so forth.
Data Costs: Video is typically available for free to the public (at low resolution) or to other agencies and institutions (at high resolution). Video and image data files, even compressed, require large storage capabilities. Consequently, a non-negligible cost is associated with the retention of video and images. The amount and quality of data stored, compression ratios, image size, and retention period are factors that impact operational cost. Cloud storage services typically are used to store video and images because they offer the most economical storage solutions and allow video to be stored without degrading its quality; however, TMCs rarely use cloud storage.
Data Openness: Low-resolution video data from roadway CCTVs is usually open to the public. High-resolution video content is not usually accessible to the public; rather, it is made available only to requesting agencies on demand with a valid reason for obtaining the data (e.g., for a law enforcement investigation).

Data Challenges: In most cases, TMC video or images are not stored or archived. When stored, video data is kept only for a brief period; then it is purged to make room for newer video. This practice greatly limits the potential quantity of video content that could be mined. Also, video collection is not uniform across space, time, and quality:
• Coverage areas for roadway cameras vary; when present, camera views do not always provide complete coverage for all parts of the highway.
• Equipment failures of field cameras, communications networks, and recording systems also can widen gaps in coverage when maintenance of cameras is not performed in a timely manner.
• Weather conditions such as snow and rain can greatly affect the quality of the video collected, in some cases making it impossible to extract metadata.
• Video container and compression standards vary widely between equipment and manufacturers. These standards often are proprietary and cannot be converted easily to a common standard without losing some video data integrity.
These challenges result in video/image datasets that are sparse, non-uniform, and unevenly distributed, making it difficult to extract general trends or patterns. Conversion of real-time video for large-scale distribution can be expensive and require considerable information technology (IT) infrastructure.
1 International Association of Chiefs of Police, About ALPR. Online: http://www.iacp.org/ALPR-About.
2 Kuciemba, S., and K. Swindler, Transportation Management Center Video Recording and Archiving Best General Practices, U.S. Department of Transportation, Washington, D.C. (March 2016). Online: https://ops.fhwa.dot.gov/publications/fhwahop16033/fhwahop16033.pdf.
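The cost factors listed in Table A-8 (amount of video stored, compression, and retention period) can be made concrete with a rough sizing calculation. The sketch below is a back-of-the-envelope estimate only; the function name and the sample figures (100 cameras, 2 Mbps compressed streams, 3 days of retention) are hypothetical and are not taken from the report or from the TMC survey it cites.

```python
def archive_size_terabytes(cameras, bitrate_mbps, retention_days):
    """Estimate archive size for continuously recorded, compressed video.

    cameras:        number of feeds recorded around the clock
    bitrate_mbps:   average compressed bitrate per feed, in megabits/second
    retention_days: how long recordings are kept before being purged
    """
    seconds = retention_days * 24 * 3600
    total_bits = cameras * bitrate_mbps * 1_000_000 * seconds
    return total_bits / 8 / 1e12  # bits -> bytes -> terabytes (decimal)


# Hypothetical "always record" policy: 100 feeds at 2 Mbps kept for 3 days.
print(round(archive_size_terabytes(100, 2.0, 3), 2))  # 6.48
```

Because the estimate scales linearly with each input, doubling the retention period or the bitrate doubles the storage required, which helps explain why many TMCs record only for specific events.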

Data Source Assessment Tables 147 Table A-9. Safety service patrol and incident response program data. Assessment Criteria Assessment Description of Data Data is collected by safety service patrol (SSP) program (often called freeway service patrol or incident response) staff that is present at the scene of an incident. Data collected generally includes time and location of incident, type of incident, arrival and departure times, responder and response vehicle identification, supplies expended (e.g., gas or a tire patch), and the assistance provided (e.g., refueling, repairing tire, blocking lane or calling tow vehicle) using either pre-established codes or keywords, or free text. Some SSP programs also request a response from the drivers/vehicles assisted in the form of a postcard survey or request to complete an online survey with structured and unstructured data. This data typically captures the quality and value of services provided. Who Collects, Maintains, and Owns the Data State transportation agencies, metropolitan planning organizations (MPOs), transportation authorities. How the Data Are Collected Depending on the program, data is collected by the responder either manually (simple paper forms/logs), electronically (via laptops, tablets, mobile phones), and/or is communicated via radio back to a central location such as a TMC.1 Data Structure Data structure varies based on the collection method. Data can range from free text on simple forms to standardized records in relational databases. Data is often integrated with TMC software and records management systems. Data Size, Storage, and Management Megabytes. Although service patrols and incident response programs respond to large numbers of incidents daily, most of these incidents are easy to mitigate and do not generate a large amount of information. Most SSP incidents can be described accurately in less than 10 data fields. 
Data archiving often is done in-house by maintaining spreadsheets for a period of time (e.g., a month or an entire year) and then integrating them into TMC software systems as part of the system archiving process. Archiving duration varies greatly across agencies.
Data Accessibility: Accessibility varies by entity, ranging from CSV or Excel files to statewide web-accessible databases providing more detailed and organized data that can be searched easily. Free text fields often are used to capture the details of incident responses rather than a standardized taxonomy. Free text, while still providing valuable information, is more difficult to analyze. The presence of abbreviations, synonyms, and spelling mistakes in the text makes advanced text analytics necessary before valuable information can be extracted.
Data Sensitivity: May contain PII.
Data Costs: Service patrol data is most often offered at no cost by state or regional organizations upon request and acceptance. A minor fee may be charged to obtain the data in paper or disk format.
Data Openness: Service patrol data has limited data openness. In most cases the data is not publicly shared online, and a request stating the intended use of the data must be made to the operating agency to obtain the data on portable media or through a file-sharing service such as FTP. SSP customer feedback is typically accessible only at the aggregate level.

Data Challenges: Most service patrol data are still collected using paper forms that are later entered into a database or spreadsheet, or are entered by a TMC operator in radio communication with responders. More modern ways of collecting service patrol data are becoming more prevalent. These systems, such as computer-aided dispatch (CAD) systems or mobile phone/tablet applications, capture data at the scene using a more structured and strict data collection process. Data collected from paper forms or radio communication and subsequently entered in spreadsheets or simple applications often lacks precise location information and is of lower quality due to the inability to correct for misspelled words, non-existent categories, non-standardized abbreviations, and custom narratives. This lower quality requires complex analysis to correct content and attempt to standardize the “fuzzy” content; but even with additional complex analysis, the resulting content may lose information precision in the process and become less valuable. Additionally, the current data management of service patrol data files (except for database systems) may also lead to difficulty ingesting and analyzing content. Often, spreadsheet files are collected, stored in shared network folders, and managed manually. Data file formats evolve and improve on a regular basis with changes such as adding new columns or renaming the categories used to describe service patrol responses, but the new formats may not be retroactively applied to previously created data files. This less-rigorous data management leads to content that is non-uniform and difficult to analyze without cleaning. In some cases, retrofitting a new data format to older data files is not possible, as the historical data is less precise than the new data format requires.
1 FHWA Service Patrol Handbook, U.S. Department of Transportation (November 2008). Online: https://ops.fhwa.dot.gov/publications/fhwahop08031/ffsp_handbook.pdf.
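The cleanup of free-text service patrol entries discussed in Table A-9 (abbreviations, synonyms, and misspellings) can be sketched with Python's standard `difflib` module. This is only a minimal illustration of the idea, not a substitute for the advanced text analytics the table calls for: the category list, the alias table, and the 0.75 similarity cutoff are all hypothetical choices made for the example.

```python
import difflib

# Hypothetical canonical assistance categories and known aliases.
CANONICAL = ["refuel", "tire repair", "lane block", "tow call"]
ALIASES = {"gas": "refuel", "fuel": "refuel",
           "flat": "tire repair", "tow": "tow call"}


def normalize_assistance(text, cutoff=0.75):
    """Map a free-text assistance entry to a canonical category, or None.

    Known abbreviations and synonyms are resolved through ALIASES first;
    remaining entries are matched to the closest canonical label, which
    catches simple misspellings.
    """
    cleaned = " ".join(text.lower().split())  # trim and collapse whitespace
    if cleaned in ALIASES:
        return ALIASES[cleaned]
    matches = difflib.get_close_matches(cleaned, CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(normalize_assistance("  Gas "))       # refuel
print(normalize_assistance("tire repiar"))  # tire repair
print(normalize_assistance("xyz"))          # None
```

Entries that return `None` would still need manual review, which reflects the table's point that some precision is lost when free text is standardized after the fact.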

Data Source Assessment Tables 149 Table A-10. 511 system data. Assessment Criteria Assessment Description of Data Traveler information (511) systems acquire, analyze, and communicate information to assist surface transportation travelers. The 511 system data and information can include general traffic (congestion and speeds) and weather conditions, as well as the location of incidents, work zones, roadway closures, and planned special events. Data sources to 511 systems generally include the state DOT, the highway patrol and police departments, transit agencies, and sometimes local jurisdictions and private companies. Who Collects, Maintains, and Owns the Data Varies between public transportation agency, private companies, or combination of both. How the Data Are Collected Varies between manual, semi-automated, or automated. Data Structure Structured and semi-structured.1 Data Size, Storage, and Management Megabytes to gigabytes depending on area covered. Storage and management is typically done on the premises using third-party systems. The 511 data systems are real-time information systems focused on delivering travel information to users in less than 3 seconds. Currently, no specific guidelines exist for how 511 data need to be stored or archived, and archiving practices vary widely across systems.2 Data Accessibility Mobile, web, web services, SMS, email and phone (text-to-speech). Data Sensitivity Not sensitive, except for homeland security concerns. Data Costs Free. Data Openness Aggregated data is open to the public via the 511 system. Raw data may be shared upon request via FTP or data dump. Data Challenges The 511 data are, first and foremost, real-time human readable information that is unstructured or semi-structured. Although 511 systems are designed to quickly broadcast traffic and transit event information to travelers, they are not designed to store that data or even structure and organize it for later retrieval or searches. 
The 511 data would need to be stored on a different system to be analyzed over time. Event data elements such as location, timestamps, and event type can be easily used for analysis, but data elements containing free text, such as event description, will be more challenging to mine and organize. These data elements will require more advanced text analysis to extract valuable keywords and topics essential to further analysis. 1 Real-Time System Management Information Program Data Exchange Format Specification, Federal Highway Administration, U.S. Department of Transportation (August 2013). Online: https://ops.fhwa.dot.gov/ publications/fhwahop13047/fhwahop13047.pdf. 2 America's Travel Information Number, Implementation and Operational, Guidelines for 511 Services, Federal Highway Administration, U.S. Department of Transportation, Version 3 (September 2005). Online: https://ops.fhwa.dot.gov/511/resources/publications/511guide_ver3/511guide3.htm.

150 Leveraging Big Data to Improve Traffic Incident Management Table A-11. Road weather data. Assessment Criteria Assessment Description of Data Road weather data is precise, facility-specific, and timely weather information as it pertains to the effects on the road.1 Road weather data collected at roadway locations can include atmospheric, pavement, and water level conditions. Atmospheric data can include air temperature and humidity, visibility distance, wind speed and direction, precipitation type and rate, tornado or waterspout occurrence, lightning, storm cell location and track, as well as air quality data. Pavement data can include pavement temperature, pavement freeze point, pavement condition (e.g., wet, icy, flooded), pavement chemical concentration, and subsurface conditions (e.g., soil temperature). Water level data can include tide levels (e.g., hurricane storm surge) as well as stream, river, and lake levels near roads.2 State agencies use different systems, and the development of the Clarus System was an attempt to standardize data across regions. Clarus is based on the premise that the integration of a wide variety of weather observing, forecasting, and data management systems, combined with robust and continuous data quality checking, could serve as the basis for timely, accurate, and reliable weather and road condition information.1 Clarus provides targeted and route-specific road weather information. Clarus has become the RWIS (Road Weather Information System) of the MADIS (Meteorological Assimilation Data Ingest System) operated by the National Centers for Environmental Prediction (NCEP), a part of the National Weather Service (NWS). Clarus aggregates various weather data from all over the world. FHWA has developed a new research platform, the Weather Data Environment (WxDE).3 The WxDE incorporates much of the Clarus data and functionality, as well as various ways to augment station data using connected vehicle data and applications. 
How the Data Are Collected | Managed by a state or local agency, an RWIS collects data generated by a group of environmental sensor stations (ESS) located in sensitive areas of the agency’s road network. A communication network relays data from the stations to a central RWIS system where the stations’ data is stored. The weather station data is then monitored by the RWIS and transmitted to automated warning systems, traffic management centers, emergency operations centers, and road maintenance facilities. Clarus is an RWIS data aggregator that relies on state and local agency ESS networks to collect pavement and meteorological data next to roadways nationwide. Clarus aggregates road weather data from more than 2,400 ESS owned by state transportation agencies.⁴ The WxDE collects data in real time from both fixed ESS and mobile weather stations, such as automated vehicle location (AVL) systems and connected vehicle systems. In addition to collecting road weather data in a central location (like Clarus), WxDE combines and correlates the collected weather data with event data, such as windshield wiper activation, to further refine data quality and value.³
Data Structure | Semi-structured (CSV, XML, and NetCDF).
Data Size, Storage, and Management | Gigabytes to terabytes, depending on coverage and time window. RWIS data typically is stored in relational databases and archived in flat files. RWIS data management and archiving policies vary across state and local agencies. MADIS data (which includes data from Clarus) is archived indefinitely by the NOAA National Environmental Satellite, Data, and Information Service (NESDIS) and is stored as files in either CSV or NetCDF format. WxDE is archived indefinitely.
Data Accessibility | State and local data typically is accessible through website maps, download, or FTP. MADIS offers an advanced website with maps, a download page to access its data, and an associated application programming interface (API) to access its data directly from other applications. MADIS offers several file formats, but some of its data is stored in NetCDF files, a common application-agnostic format used to store scientific data. NetCDF files require an API to be read, which is also provided on the site. MADIS grants several levels of access for its data, ranging from “public” to “NOAA only.” Most information from WxDE is available to all website visitors. Registered users are provided with some additional capabilities, such as creating data subscriptions and accessing data for which the original provider placed restrictions on its distribution by the WxDE.³
Data Sensitivity | None.
Data Costs | Free.
Data Openness | Limited openness. Access is dataset-dependent. Some datasets are accessible to the public, others require user registration, and some are restricted to government users.
Data Challenges | ESS may not always be maintained or monitored to counter sensor failure and sensor drift, which can lead to data quality issues (e.g., missing data, erroneous data). To circumvent this problem, quality checks and more advanced data verification and correction are performed by aggregators such as MADIS and WxDE. The NetCDF file format could also be challenging to use for non-scientific staff because it requires the implementation of a dedicated API to access the data.

¹ Bureau of Transportation Statistics. 2011. Clarus. Online: https://ntl.bts.gov/lib/44000/44300/44374/FHWA-JP0-11-154_Clarus_Overview_final.pdf (accessed February 2017).
² FHWA. 2017. “Surveillance, Monitoring, and Prediction.” Online: https://ops.fhwa.dot.gov/weather/mitigating_impacts/surveillance.htm#esrw (accessed February 2017).
³ Weather Data Environment, FHWA. Online: https://wxde.fhwa.dot.gov/.
⁴ National Environmental Sensor Station Map, Road Weather Management Program, FHWA (February 2017). Online: https://ops.fhwa.dot.gov/weather/mitigating_impacts/essmap.htm.
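The kind of quality check described in the Data Challenges row can be illustrated with a simple screen for missing and implausible readings. This is only a sketch under stated assumptions, not the procedure MADIS or WxDE actually applies: the column names (`station_id`, `air_temp_c`), the sample rows, and the plausibility limits are all hypothetical.

```python
import csv
import io

# Hypothetical plausibility limits for air temperature, in degrees Celsius.
AIR_TEMP_LIMITS_C = (-60.0, 60.0)

# Hypothetical ESS export with one good, one missing, and one implausible reading.
SAMPLE_CSV = "station_id,air_temp_c\nESS-1,3.5\nESS-2,\nESS-3,999\n"

def screen_observations(csv_text, field="air_temp_c", limits=AIR_TEMP_LIMITS_C):
    """Label each row 'ok', 'missing', 'unparseable', or 'out_of_range'."""
    low, high = limits
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(field) or "").strip()
        if raw == "":
            flag = "missing"
        else:
            try:
                value = float(raw)
            except ValueError:
                flag = "unparseable"
            else:
                flag = "ok" if low <= value <= high else "out_of_range"
        results.append((row["station_id"], flag))
    return results

for station, flag in screen_observations(SAMPLE_CSV):
    print(station, flag)
```

A range check of this kind catches gross sensor failures only; slow sensor drift would need comparison against neighboring stations or historical values.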

Table A-12. Toll data.

Assessment Criteria | Assessment
Description of Data | Toll data, collected via electronic toll collection technology, includes the number of vehicles passing through toll gates, vehicle identification (license plate), unique toll tag identifier, automated vehicle classification, transaction processing, violation enforcement, date/timestamp, and location information.
Who Collects, Maintains, and Owns the Data | State transportation agencies, tollway authorities.
How the Data Are Collected | Each time a vehicle crosses a toll gate, an active vehicle-mounted radio-frequency identification (RFID) tag communicates with an antenna at the toll gate via dedicated short-range communications (DSRC). During the communication, the RFID tag broadcasts a unique identifier that is recorded in the toll system database along with the time and location of capture. Automatic license plate reader/recognition technology (ALPR) also is used in automated tolling. Cameras mounted on toll gates capture vehicles’ license plate numbers using image recognition technology and store the numbers in the toll system database along with the time and location of the capture.¹
Data Structure | Structured.
Data Size, Storage, and Management | Gigabytes to terabytes, depending on coverage area and time window. Data often is stored in-house and managed by the toll agency or a third-party service provider. Data typically is stored in relational database systems.
Data Accessibility | Database dump files are delivered either through FTP or using portable media.
Data Sensitivity | Electronic toll collection data are considered very sensitive as they contain names, addresses, credit card information, vehicle description, and license plate number. This information poses a threat to the privacy of participants because the systems record when specific motor vehicles pass toll stations. From this information, one can infer the likely location of the vehicle's owner or primary driver at specific times.
Data Costs | Costs will depend on agreements established between toll operators and the agencies requesting the data. Many toll operators may provide at least some data to requesting agencies for free, but some toll operators may impose fees if no provision for data access has been made before the request. Data describing the daily whereabouts of thousands and even hundreds of thousands of citizens is currently of high value to the private sector.
Data Openness | Limited openness, mainly because of the high sensitivity of the data.
Data Challenges | Toll data may be difficult to obtain, both because of its sensitivity and because of the possibility of private-party ownership. Although the data structure is simple and toll data should be easy to reuse for Big Data analysis, data quality can be an issue. Automatic detection of vehicles at toll gates is known to be error prone, particularly when using ALPR, which is known to have significant error rates. Although data quality may be an issue when performing data analysis that requires the identification of vehicles (e.g., toll calculation or speed checking), TIM data analysis may not need to identify vehicles and therefore may not be affected by this issue.

¹ Persad, K., C.M. Walton, S. Hussain. Toll Collection Technology and Best Practices, Texas Department of Transportation (January 2007). Online: https://ctr.utexas.edu/wp-content/uploads/pubs/0_5217_P1.pdf.

A.3 Public Safety Data

Table A-13. Law enforcement, fire and rescue, and EMS CAD system data.

Assessment Criteria | Assessment
Description of Data | Law enforcement, fire and rescue, and EMS agencies use computer-aided dispatch (CAD) to initiate public safety calls for service, to dispatch responders, and to maintain communications with, and the status of, responders in the field. CAD typically consists of a suite of software packages and modules that provide interfaces and services for call-takers, dispatchers, and field personnel. CAD includes: • Logging on/off times of personnel. • Generating and archiving incidents. • Assigning field personnel to incidents. • Updating incidents and logging those updates. • Generating case numbers for incidents. • Timestamping every action taken by the dispatcher. Relevant data elements include TIM timestamps, notification, dispatch, arrival/departure of agency responders, type of incident, disposition, and other incident details.¹
Who Collects, Maintains, and Owns the Data | Many of the more than 12,000 individual law enforcement agencies. Many of the nearly 30,000 local fire departments.
How the Data Are Collected | Human- and auto-populated using commercial CAD software and in-house systems.
Data Structure | Semi-structured to structured.
Data Size, Storage, and Management | Megabytes (spreadsheets or PDFs) to gigabytes (relational databases) to terabytes (large Oracle databases). Data is typically managed and stored in-house at the local level or by third parties and is maintained for several years. Data management and data interoperability procedures vary widely across the U.S.
Data Accessibility | FTP and web download are typically available for single or limited incidents upon request; live public-facing views share limited information; some CAD systems are integrated with TMCs.
Data Sensitivity | Most data are public record, but some that contain sensitive data fields, criminal investigation information, and criminal history information are not made available outside the collecting agency.
Data Costs | Free, although some minor cost may be incurred to maintain data-sharing infrastructure.
Data Openness | Limited openness, as full (filtered) data is available upon request only.
Data Challenges | Time and cost to fill requests. The presence of sensitive data in the requested data can complicate sharing, as it would involve criminal justice systems. CAD data is recorded using an event database format; that is, each row is an event combining an action such as “responder arrived” or “responder departed” with a timestamp. This data organization is ideal for collection but can complicate further data extraction and analysis, because the information typically sought (e.g., time on scene, number of responders on the scene) is spread across more than one record.

¹ https://it.ojp.gov/documents/LEITSC_Law_Enforcement_CAD_Systems.pdf.
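The event-per-row organization described under Data Challenges can be made concrete with a small example that pairs each responder's arrival and departure rows to derive time on scene. This is a minimal sketch: the row layout, incident and unit identifiers, and timestamps are hypothetical and do not reflect any particular CAD product's schema.

```python
from datetime import datetime

# Hypothetical CAD event rows: (incident_id, unit_id, action, ISO timestamp).
EVENTS = [
    ("INC-1", "E12", "responder arrived", "2017-02-01T08:05:00"),
    ("INC-1", "P7", "responder arrived", "2017-02-01T08:09:00"),
    ("INC-1", "E12", "responder departed", "2017-02-01T08:45:00"),
    ("INC-1", "P7", "responder departed", "2017-02-01T09:00:00"),
]

def time_on_scene_minutes(events):
    """Pair each unit's arrival and departure rows; return minutes on scene."""
    arrivals = {}
    durations = {}
    # ISO 8601 strings in a single format sort chronologically.
    for incident, unit, action, stamp in sorted(events, key=lambda e: e[3]):
        key = (incident, unit)
        when = datetime.fromisoformat(stamp)
        if action == "responder arrived":
            arrivals[key] = when
        elif action == "responder departed" and key in arrivals:
            elapsed = when - arrivals.pop(key)
            durations[key] = elapsed.total_seconds() / 60.0
    return durations

print(time_on_scene_minutes(EVENTS))
```

Even this simple measure requires joining two rows per responder, which is why event-format data usually needs a reshaping step before analysis.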

Table A-14. Emergency communication center (ECC)/911 call center/public safety answering point (PSAP) data.

Assessment Criteria | Assessment
Description of Data | Emergency communication centers (ECCs), also called 911 call centers and public safety answering points (PSAPs), are responsible for answering 911 calls for a geographic area, following National Emergency Number Association (NENA) data standards.
Who Collects, Maintains, and Owns the Data | Approximately 6,500 locations across the United States serve as ECCs/PSAPs.
How the Data Are Collected | Incoming 911 calls are answered at the ECC/PSAP of the governmental agency that has jurisdiction over the caller's location. Location management via ANI (automatic number identification) and ALI (automatic location information) is the foundation of 911 data collection and call origination. Using the location of the caller, based on telephone service provider information (landline or cellular), the call to 911 is first routed to the correct PSAP. When the 911 call arrives at the appropriate ECC/PSAP, it is answered by a specially trained operator or dispatcher. For landline calls, computer-aided dispatch (CAD) software uses the telephone number to retrieve and display the name, number, and location of the caller to the operator in near-real time.¹ For wireless calls, the location is either handset based (GPS) or network based (towers). The integration of ANI/ALI functionality in modern CAD systems is common. The operator uses the CAD software and interface to input information as described in Table A-13.
Data Structure | Structured and semi-structured (CSV, XML, RDF, JSON).
Data Size, Storage, and Management | Megabytes to gigabytes, depending on coverage area and time frame. Data is managed and stored in-house at the local level or by third parties and is maintained for several years. Typically, CAD systems use relational databases to store 911 data and flat file storage to archive it. How the data is managed and how interoperable it is varies widely across the United States. The Association of Public-Safety Communications Officials (APCO) and National Emergency Number Association (NENA) have jointly issued APCO/NENA ANS 1.107.1.2015, Standard for the Establishment of a Quality Assurance and Quality Improvement Program for Public Safety Answering Points, a voluntary standard that defines the recommended minimum components of a quality assurance/quality improvement (QA/QI) program within a public safety communications center. It recommends effective procedures for implementing the components of the QA/QI program to evaluate the performance of public safety communications personnel.²
Data Accessibility | Data is typically accessible only on request because of possible sensitivity in the data (criminal investigations, personal identification, comments). Redacted or partial 911 data can be found on https://www.data.gov from a variety of agencies (e.g., all police responses within the city of Seattle) and is refreshed at a variety of rates (e.g., every 4 hours).
Data Sensitivity | Sometimes (criminal investigations, personal identification, comments).
Data Costs | Free.
Data Openness | Limited openness, as full data is available only upon request.
Data Challenges | Some prominent standards from national organizations exist and are being implemented, but there is no national standard or regulatory authority. Consequently, among the 6,000+ PSAPs nationwide, only a few have implemented standards that enable operational or data analytics assessments. This can make the integration and analysis of 911 data challenging, or even untenable, from a time, cost, and resource perspective. Partial or redacted datasets are publicly available. Additional analytical value will be found in complete datasets, but access to the full dataset may be challenging due to local and state law restrictions.

¹ https://en.wikipedia.org/wiki/Enhanced_9-1-1.
² APCO/NENA ANS 1.107.1. Standard for the Establishment of a Quality Assurance and Quality Improvement Program for Public Safety Answering Points (2015). Online: https://www.apcointl.org/doc/911-resources/apco-standards/600-11071-2015-quality-assurance/file.html.

Table A-15. Public safety digital video data.

Assessment Criteria | Assessment
Description of Data | As with transportation agencies, public safety agencies make use of various types of digital video technologies, including CCTV, ALPR, dashboard cameras, and wearable cameras. Public agencies use ALPR to capture license plate numbers, compare them to one or more databases of vehicles of interest, and alert authorities when a vehicle of interest has been observed.¹ Dashboard cameras and/or wearable cameras are used to monitor traffic stops and other enforcement activities. Basic dashboard cameras are video cameras with built-in or removable storage media that constantly record. More advanced dashboard cameras can have audio recording, GPS logging, speed sensors, accelerometers, and uninterrupted power supply capabilities.² Body cameras range from small, low-resolution options to high-definition options.
Who Collects, Maintains, and Owns the Data | Public safety agencies.
How the Data Are Collected | Via various types of cameras. The video stream is either recorded in a continuous loop of a few hours on the camera device or streamed directly to a data center where it is recorded and archived.
Data Structure | Unstructured (video) and semi-structured (XML, JSON, CSV).
Data Size, Storage, and Management | Terabytes. As with transportation agency highway cameras, video images from fixed roadway/venue cameras are not always stored. Dash cams will record up to about 2 GB (about 6 hours) of video on a loop that refreshes continuously. Videos may be saved on a secure digital (SD) card or on an external drive, and typically download automatically to a server without human intervention.³ Body cameras use SD or microSD cards for storage. Depending on the model, they support anywhere from 4 GB to 120 GB of video storage and upload their video for storage automatically to a server without human intervention.⁴ As an example, the Birmingham police initially purchased 5 TB of online storage to store the video from 319 body cameras. In just 2 months, the department used 1.5 TB of its allotment and was on track to exceed the 5 TB limit in about 6 months.⁵ Depending on the agency, either plain storage or media library software including metadata management is used.
Data Accessibility | Data dump from server or device storage, upon request.
Data Sensitivity | Yes (faces, license plates, etc.).
Data Costs | A cost is incurred in the retention of video images. The amount and quality of data stored depend on compression ratios, images stored per second, and image size; the total volume also depends on the retention period of the videos or images.
Data Openness | Not open. Sensitive and accessible on request only.
Data Challenges | Dependency on a wireless connection can be a technical obstacle. Institutional, technical, and legal: in most cases, video is stored or archived by law, but retention laws have not kept pace with video technology and greatly limit archiving. There are numerous legal restrictions regarding the acquisition, use, and storage of video images by law enforcement. Also, video that is not archived automatically from camera devices must be archived manually on a regular basis; failure to do so leads to the video data being overwritten and lost. These challenges greatly limit the quantity of video content that could be mined. Also, video data collection is not uniform across space, time, and quality: • Equipment failures of cameras, communications networks, and recording systems can also increase the lack of coverage when maintenance is not able to remedy failure quickly. • Weather conditions can greatly affect the quality of the video collected, making it impossible in some cases to extract metadata under conditions such as snow and rain. • Video resolution varies widely between devices, and many devices still record video at low resolution, which limits how effectively the video can be processed. • Video container and compression standards vary widely between equipment and manufacturers. These standards are often proprietary and cannot be converted easily to a common standard without losing some video data integrity. These factors lead to video datasets that are sparse and non-uniform, making it challenging to extract information or patterns from them.

¹ http://www.iacp.org/ALPR-About.
² https://www.lifewire.com/types-of-dash-cameras-534889.
³ http://www.randmcnally.com/support/faqs/what-is-the-recording-time-on-the-dash-cam-and-how-are-video-files-stored.
⁴ http://www.toptenreviews.com/electronics/photo-video/best-wearable-cameras/.
⁵ http://www.computerworld.com/article/2979627/cloud-storage/as-police-move-to-adopt-body-cams-storage-costs-set-to-skyrocket.html.
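The Birmingham body camera figures in Table A-15 can be checked with a short calculation. The sketch below assumes that storage consumption grows linearly at the rate observed in the first 2 months and that 1 TB equals 1,000 GB; both are simplifying assumptions, not statements from the cited source.

```python
def months_until_full(capacity_tb, used_tb, elapsed_months):
    """Total months from the start until capacity is reached, assuming linear growth."""
    rate_tb_per_month = used_tb / elapsed_months
    return capacity_tb / rate_tb_per_month

def gb_per_camera_per_month(used_tb, elapsed_months, cameras):
    """Average monthly volume per camera, taking 1 TB = 1,000 GB."""
    return used_tb * 1000.0 / elapsed_months / cameras

# Figures from the table: 5 TB purchased, 1.5 TB used in 2 months, 319 cameras.
print(round(months_until_full(5.0, 1.5, 2.0), 1))
print(round(gb_per_camera_per_month(1.5, 2.0, 319), 2))
```

Under these assumptions the allotment would be exhausted between 6 and 7 months after deployment, which is in line with the "about 6 months" estimate in the table.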

Table A-16. Towing and recovery data.

Assessment Criteria | Assessment
Description of Data | Catalog of calls for service and various timestamps for response, such as dispatch, arrival, and departure times, as well as type of assistance, equipment, insurance, and financial transactions.
Who Collects, Maintains, and Owns the Data | Towing companies.
How the Data Are Collected | Data collection is typically manual or electronic. A few towing companies do not collect any data, relying on the state police or transportation dispatch for this data. Some towing companies use computer-aided dispatch (CAD) equipment coupled with touch screen mobile data terminals (MDTs) within each of the trucks. Electronic systems allow for the accurate mapping and recording of each dispatch and arrival time on all calls. Software programs allow for cloud-based management of dispatched jobs/trucks on a map in real time.
Data Structure | Semi-structured to fully structured.
Data Size, Storage, and Management | Megabytes. Private company database in-house or in the cloud.
Data Accessibility | Contact for data dump.
Data Sensitivity | Yes (financial transactions and company business practices).
Data Costs | Unknown. Data is private and may not be available for sale.
Data Openness | Not open. Data are proprietary to the towing and insurance entities.
Data Challenges | A predominance of individual providers still do not maintain any data at all or maintain only limited data through a paper log or spreadsheet. Data held in in-house systems is rarely shared outside of the business.

A.4 Crowdsourced/Social Media Data

Table A-17. Waze data.

Assessment Criteria | Assessment
Description of Data | Data generated by users of the Waze community-based navigation mobile application, including real-time road information such as crashes, construction, police presence, road hazards, traffic jams, etc. Also captured is confirmation of this information by other Waze users through either a “thumbs-up” or “thumbs-down” response or through detailed messages. Additionally, Waze automatically records the speed at which users travel on the roadways and captures messages sent between users through the mobile app. Data elements relevant to TIM include incidents’ reported times, incident details (e.g., number/types of vehicles involved), incident clearance times, traveler sentiments, and speeds.
Who Collects, Maintains, and Owns the Data | Waze.
How the Data Are Collected | Road users report events using the Waze mobile application.
Data Structure | Semi-structured (CSV, JSON).
Data Size, Storage, and Management | Gigabytes to terabytes, depending on coverage and timeframe. A Waze event dataset (not including speed data) covering the entire nation from 2013 to 2016 contains about 120 million reports and has a size of about 120 GB. Waze data is managed on both the Amazon Web Services and the Google Cloud Platform clouds (since 2013), and Waze uses cloud file storage, NoSQL databases, and relational databases to manage its data. Waze data is archived indefinitely.
Data Accessibility | Only accessible through partnership with Waze. Waze data is shared through its Waze Connected Citizen Program, which provides either processed, cleaned data or web applications such as Waze Traffic View or third-party applications using Waze data such as Genesis PULSE (EMS support application).
Data Sensitivity | User information that Waze collects may be sensitive. Users agree to Waze’s use of the data (with PII) but not to sharing this information with other entities.
Data Costs | Typically no cost, only a requirement for data sharing. Waze’s Connected Citizen Program seeks to improve the use of the data for the community. States can develop a partnership with Waze and share data to access Waze data. Private entities willing to become Waze partners may have to pay to access the Waze data, but the cost of that access is not public.
Data Openness | Limited openness (partners only).
Data Challenges | Waze data is a combination of both sensor data (speeds) and crowdsourced data (alerts or events), and as such does not contain perfect data. Although the error rate of location sensors on mobile phones is well known and can be circumvented using readings from other sensors in the vicinity, alerts sent by humans can be unreliable (e.g., pushing the wrong button, inaccuracies in what is happening/reporting). Free text is also used as part of Waze alert reports to provide additional details, which also allows for human error (e.g., misspellings, orthography). Waze does provide a way to assess the reliability and quality of its data by adding to its alert reports a reliability/confidence index ranging from 1–10. High-quality and highly reliable reports do not constitute most of the Waze alert reports, and some events/alerts may remain fuzzy or imprecise. Waze does not provide direct access to its raw data (e.g., how many people reported each incident, how many thumbs-up responses a report received), which may impair data users’ ability to assess the accuracy of Waze events/alerts.
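One way a data user might apply the 1–10 reliability/confidence index mentioned in the Data Challenges row is to discard alerts below a chosen threshold before analysis. The sketch below is illustrative only: the JSON layout, the field names (`id`, `type`, `reliability`), and the threshold of 7 are hypothetical and are not taken from Waze documentation.

```python
import json

# Hypothetical alert feed; the field names are assumptions for illustration.
SAMPLE_ALERTS = """[
  {"id": "a1", "type": "ACCIDENT", "reliability": 9},
  {"id": "a2", "type": "JAM", "reliability": 4},
  {"id": "a3", "type": "HAZARD", "reliability": 7}
]"""

def reliable_alerts(json_text, min_reliability=7):
    """Keep only alerts whose reliability index meets the threshold."""
    alerts = json.loads(json_text)
    return [a for a in alerts if a.get("reliability", 0) >= min_reliability]

kept = reliable_alerts(SAMPLE_ALERTS)
print([a["id"] for a in kept], f"{len(kept)} of 3 alerts retained")
```

Because highly reliable reports are a minority of all alerts, a strict threshold trades coverage for confidence, and the appropriate cutoff would depend on the analysis.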

Table A-18. Twitter data.

Assessment Criteria | Assessment
Description of Data | “Tweets” are generated by Twitter users using the Twitter app. Data includes tweet text (a stream of up to 140 characters, raised to 280 in late 2017), an associated timestamp, and possible attachments (e.g., photos, videos). When users allow Twitter to share their location, tweet locations (latitude, longitude) also are captured. Data elements relevant to TIM include incidents’ reported times, incident details (e.g., number/types of vehicles involved), incident photos or videos, incident clearance times, and traveler sentiments.
Who Collects, Maintains, and Owns the Data | Twitter.
How the Data Are Collected | Twitter collects, stores, and publishes all its users’ “Tweets” submitted using mobile phones, the website, or IoT devices leveraging the Twitter API. Machine-submitted tweets can relay sensor readings or alerts, and it is not uncommon for software architects to leverage Twitter as a communications layer for their own software platform.
Data Structure | Semi-structured (CSV, JSON).
Data Size, Storage, and Management | Terabytes. The data size of an average tweet is a few kilobytes, not counting attached media. Twitter manages and stores about 200 billion tweets a year, which is about 200 TB of data. Twitter manages its data using custom-developed and open-source data stores, including a large-scale key-value store called Manhattan, a graph database called FlockDB, the open-source database MySQL, as well as various storage and caching services. Tweets have been continuously archived by Twitter since 2006. Twitter provides a service (Twitter Archive) to allow its users to search and download its archive.
Data Accessibility | Twitter offers multiple APIs that allow developers to process the real-time stream of tweets and to search tweets by text, user, hashtag, location, date, and so forth. Third-party applications use the tweet stream to create additional data mining and visualization interfaces that can help augment (e.g., text mining, categorization, reverse geocoding) and visualize the raw Twitter data to help discover its content. These third-party services often require users to register and pay to search, analyze, and visualize the data. Examples of Twitter third-party applications include web services such as Tweepsmap, Twitonomy, and Mentionmap.
Data Sensitivity | PII, including name, user profile, and sometimes real-time user location (sensitive even if voluntarily published).
Data Costs | The Twitter API is free with some limitations (e.g., how much at once, frequency). Costs occur when using third-party APIs or software to mine the Twitter dataset.
Data Openness | Open (tweets are public).
Data Challenges | Two of the main challenges of using Twitter data are the large quantity of tweets generated every minute and the free text structure of its content (except for hashtags). When processing the Twitter data stream to monitor for TIM-relevant information or events, the text of each tweet would need to be parsed, analyzed using text mining, correlated with similar tweets, and counted to establish the location and veracity of a detected event. This analysis is challenging, as it needs to be done in real time, there may not be enough tweets describing the incident, and users are likely to use different vocabulary to describe the incident. Twitter uses hashtags to qualify and categorize the free text content of its tweets. Twitter users can create hashtags and use them when needed within their message. Some commonly used hashtags (e.g., #accident) exist, but these are too general to allow tweets to be filtered to extract relevant TIM content, and there is no control over how hashtags are used by Twitter users. Not all tweets are geolocated, which can make it difficult to use tweet text to detect the occurrence of roadway events such as incidents or the recovery of free flow.
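The parse, correlate, and count steps described under Data Challenges can be sketched in a few lines. This is a deliberately simplified illustration, not a production text-mining pipeline: the keyword list, the tweet dictionary fields (`text`, `lat`, `lon`), the grid-cell rounding, and the minimum report count are all hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical vocabulary; real users describe incidents in many more ways.
INCIDENT_TERMS = re.compile(r"\b(crash|accident|collision|wreck|stalled)\b", re.IGNORECASE)

TWEETS = [
    {"text": "Bad crash on the interstate northbound", "lat": 38.9012, "lon": -77.0401},
    {"text": "Accident blocking two lanes", "lat": 38.9049, "lon": -77.0389},
    {"text": "Great coffee this morning", "lat": 38.9030, "lon": -77.0400},
    {"text": "Huge wreck near the exit", "lat": None, "lon": None},
]

def candidate_events(tweets, min_reports=2, cell_decimals=2):
    """Count keyword-matching, geolocated tweets per rounded lat/lon cell."""
    counts = Counter()
    for tweet in tweets:
        if tweet.get("lat") is None or tweet.get("lon") is None:
            continue  # not geolocated, so it cannot be placed on the map
        if not INCIDENT_TERMS.search(tweet["text"]):
            continue  # no incident vocabulary found
        cell = (round(tweet["lat"], cell_decimals), round(tweet["lon"], cell_decimals))
        counts[cell] += 1
    # Require several independent reports before treating a cell as a candidate event.
    return {cell: n for cell, n in counts.items() if n >= min_reports}

print(candidate_events(TWEETS))
```

The fourth sample tweet shows the geolocation problem noted above: it describes an incident but is dropped because it carries no coordinates.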

A.5 Advanced Vehicle Systems Data

Table A-19. Automated vehicle location (AVL) system data.

Assessment Criteria | Assessment
Description of Data | AVL is a means for automatically determining and transmitting the geographic location of a vehicle with details that include date, time, address, longitude, and latitude. AVL is used to manage vehicle fleets, such as service vehicles, public transportation vehicles, emergency vehicles, and commercial vehicles. AVL data includes real-time temporal and geospatial data (polled every few seconds), as well as vehicle logs (e.g., vehicle number, operator ID, route, direction, arrival/departure times). Dispatchers can get a real-time snapshot of driver adherence to a route, provide customers with an estimated time of arrival, and communicate directly with drivers. Public safety agencies can use AVL technology to improve response times by dispatching the closest vehicles for emergencies.
Who Collects, Maintains, and Owns the Data | Fleet owners (e.g., Safety Service Patrols, public safety agencies, transit agencies, towing companies).
How the Data Are Collected | A vehicle’s position is located and tracked using a global positioning system (GPS) electronic device. The vehicle’s position is either stored for later analysis or wirelessly communicated to the home base dispatch.
Data Structure | Semi-structured (CSV) or structured (SQL).
Data Size, Storage, and Management | Gigabytes to terabytes, depending on geographic coverage and timeframe. Stored in-house or via a cloud-hosted service. As of 2017, an available GPS transmitting device cost less than $20, was smaller than a human thumb, was able to run 6 months or more between battery charges, and could communicate easily with smartphones.¹ A transit system with about 200 vehicles will generate about 3,000,000 records annually. The leading GPS fleet management solutions should be able to retrieve historical data from any vehicle in a fleet as far back as when the vehicles were equipped with GPS tracking devices.²
Data Accessibility | For data owners, data must be uploaded from the on-board computer to the central computer. Newer systems usually include an automated, high-speed communication device through which data is uploaded daily (e.g., when vehicles are fueled). Older systems rely on manual intervention, such as exchanging data cards or attaching an upload device, which adds a logistical complication.³ For non-owners, data may be obtained via FTP, data dump, or web services (if access is granted).
Data Sensitivity | For some agencies, the AVL data may include residential data for personnel who operate the SSP or law enforcement vehicles. Data may require redaction before sharing.
Data Costs | For already-equipped vehicles, there should be no costs for obtaining data from publicly operated systems.
Data Openness | The data can be shared upon request, but it is generally not open.
Data Challenges | The absence of an effective upload mechanism can render an otherwise promising data collection system useless for off-line data analysis.¹

¹ https://en.wikipedia.org/wiki/Automatic_vehicle_location.
² Malcolm, J. Automatic Vehicle Location Technology is Valuable for Fleets of All Sizes (October 7, 2014). Online: https://www.hubs.com/power/explore/2014/09/automatic-vehicle-location-technology-is-valuable-for-fleets-of-all-sizes.
³ Furth, P. G., B. Hemily, T. H. J. Muller, and J. G. Strathman. TCRP Report 113: Using Archived AVL-APC Data to Improve Transit Performance and Management. Transportation Research Board of the National Academies, Washington, D.C., 2006.
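The closest-vehicle dispatching use of AVL data mentioned in Table A-19 can be sketched with a great-circle distance calculation. The vehicle identifiers and coordinates below are hypothetical, and straight-line distance is a simplification: an operational system would more likely rank vehicles by travel time over the road network.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def closest_vehicle(positions, incident):
    """Return the ID of the vehicle whose last reported position is nearest the incident."""
    return min(positions, key=lambda vid: haversine_km(*positions[vid], *incident))

# Hypothetical latest AVL positions (latitude, longitude) keyed by vehicle ID.
POSITIONS = {"SSP-1": (39.00, -77.00), "SSP-2": (38.95, -77.10)}
INCIDENT = (38.96, -77.09)

print(closest_vehicle(POSITIONS, INCIDENT))
```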

Table A-20. Event data recorder data.

Description of Data: An event data recorder (EDR) is a digital recording device that monitors and records telemetric data reflecting activities inside and outside of an automobile. An estimated 92 percent of new passenger vehicles had EDRs as of 2006. EDRs have been required in new vehicles since 2013 and are required to record data in a standard format to make its collection and processing easier.[1] A NHTSA regulation passed in 2012 provides that if a vehicle has an EDR, it must track 15 specific data elements, including speed, steering, braking, acceleration, seatbelt use, and, in the event of a crash, force of impact and whether airbags deployed.[2]

Who Collects, Maintains, and Owns the Data: Data resides in the EDR of individual vehicles. Enacted in December 2015, the federal Driver Privacy Act provides that the information collected belongs to the owner or lessee of the vehicle.[3]

How the Data Are Collected: EDRs collect event information from the in-vehicle network and from the vehicle GPS antenna. EDRs record event data in a continuous loop in a memory bank capable of storing a few minutes of data. Most EDRs are built into a vehicle’s airbag control module and, upon a crash, are triggered to save the last 5 seconds of recorded information (e.g., airbag deployment, vehicle speed, engine throttle, and driver safety belt use) into a tamper-proof memory.[1]

Data Structure: Semi-structured.

Data Size, Storage, and Management: Megabytes. Data from the EDR is stored on stacked memory boards inside a crash-survivable memory unit. Most EDRs are programmed to record data in a continuous loop, overwriting earlier information until the unit is triggered to save the data in the event of a crash. When a crash occurs, the device automatically saves up to 5 seconds of data representing the moments immediately before, during, and after an incident.[2]

Data Accessibility: EDR data can be retrieved in two ways: (1) via a connection to the vehicle’s on-board diagnostics (OBD) port or (2) by removing the EDR from the vehicle and retrieving the data directly. Downloading the data after a crash requires a specialized data-retrieval tool kit that consists of hardware, software, and a special cable that plugs into the car’s OBD port or the EDR itself.[4] The federal Driver Privacy Act of 2015 places limitations on data retrieval from EDRs.[3] Police, insurers, researchers, automakers, and others may gain access to the data with owner consent. Without consent, access may be obtained through a court order. For crashes that do not involve litigation, especially when police or insurers are interested in assessing fault, insurers may be able to access the EDRs in their policyholders’ vehicles based on provisions in the insurance contract requiring policyholders to cooperate with the insurer. Some states prohibit insurance contracts from requiring policyholders to consent to access.[4]

Data Sensitivity: EDR data characterizes driver behavior and as such can be used in court as evidence. Civil liberty and privacy groups have raised concerns about the implications of data recorders “spying” on drivers.[2]

Data Costs: Crash data-retrieval kits cost between $2,000 and $10,000; however, many law enforcement agencies have their own equipment or ask vehicle dealerships for assistance.

Data Openness: The data is not open, as extracting it from the EDR requires custom equipment and either the consent of the vehicle owner or a court order.

Data Challenges: Because of current technology, costs, and data privacy issues associated with EDR data collection and storage, EDR data cannot be collected and aggregated in bulk. Typically, EDR data must be downloaded one vehicle at a time after receiving the consent of the vehicle owner or a court order. Alternative ways to access EDR-like data have been created by third parties such as auto insurance companies. On-board telematics devices (e.g., SnapShot® from Progressive Insurance or the Automatic dashboard adapter and app by Automatic Labs™) use the driver’s mobile phone to obtain some of the data collected by the EDR, streaming it to large data stores where it is analyzed to manage insurers’ risk. These third-party devices require the driver to sign a user agreement that allows the third party to collect and use the driver’s vehicle data, effectively circumventing the data privacy issue. The datasets created by these third parties may offer a way to access EDR data, partially or fully, without having to collect it one vehicle at a time (see vehicle telematics systems data).

[1] Insurance Institute for Highway Safety, Highway Loss Data Institute, Event Data Recorders. Online: http://www.iihs.org/iihs/topics/t/event-data-recorders/topicoverview (accessed February 2017).
[2] Rafter, M. V. Decoding What's in Your Car's Black Box, Who Owns the Data and Who Can Tap It? (Edmunds, July 22, 2014). Online: https://www.edmunds.com/car-technology/car-black-box-recorders-capture-crash-data.html.
[3] National Conference of State Legislatures, Privacy of Data from Event Data Recorders: State Statutes. Online: http://www.ncsl.org/research/telecommunications-and-information-technology/privacy-of-data-from-event-data-recorders.aspx (accessed February 2017).
[4] Vehicle Telematics: A Useful Litigation Tool for Attorneys, A Boon to Insurers and the Privacy Concerns Big Data Raises for Us All. Klieman & Lyons (September 20). Online: http://www.kliemanlyons.com/2014/09/vehicle-telematics-a-useful-litigation-tool-for-attorneys-a-boon-to-insurers-and-the-privacy-concerns-big-data-raises-for-us-all (accessed March 2017).
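The loop recording described in Table A-20 can be illustrated with a fixed-size ring buffer. The following is a minimal sketch, not an actual EDR implementation: the class name, the sample rate, and the loop length are assumptions chosen for illustration, and the sketch keeps only the samples recorded before the trigger, whereas a real device also captures data during and after the event.

```python
from collections import deque


class LoopRecorder:
    """Toy model of EDR-style loop recording (all parameters are assumed)."""

    def __init__(self, sample_rate_hz=10, loop_seconds=120, saved_seconds=5):
        # The loop overwrites its oldest samples once it is full.
        self._loop = deque(maxlen=sample_rate_hz * loop_seconds)
        self._keep = sample_rate_hz * saved_seconds
        self.saved_event = None  # stands in for the tamper-proof memory

    def record(self, sample):
        """Append one sample; the oldest sample is dropped when the loop is full."""
        self._loop.append(sample)

    def trigger(self):
        """Freeze the most recent `saved_seconds` of samples on the first trigger."""
        if self.saved_event is None:
            self.saved_event = tuple(self._loop)[-self._keep:]
        return self.saved_event
```

Calling `record` for every sensor reading and `trigger` once on a crash signal leaves `saved_event` holding the final seconds of data, which later readings can no longer overwrite.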

Table A-21. Vehicle telematics systems data.

Description of Data: Telematics is the transfer of data to and from a vehicle. Vehicle telematics systems combine a GPS system with on-board sensors and diagnostics to record speed, engine throttle, braking, ignition cycle, whether the driver was using a safety belt, airbag deployment, and the physics of crash events, including crash speed, change in forward crash speed, maximum change in forward crash speed, time from beginning of crash event at which the maximum change in forward crash speed occurs, the number of crash events, the time between crash events, and whether the device completed recording.[1] Unlike event data recorders (EDRs), which collect and store a few seconds of data immediately before and after a crash, telematics systems continuously record all types of second-by-second data about vehicles and driver behavior, sometimes for years at a time. Telematics technologies collect raw vehicle data and overlay this information with GIS mapping data (e.g., road type, speed limits). The data is then “broadcast” via data links such as Wi-Fi, GPS, Bluetooth, 3-axis accelerometers, and mobile broadband communications to auto manufacturers, fleet owners, and insurance companies. As the cost of enabling mobile broadband communications has fallen, automakers are increasingly embedding telematics in vehicles. Some form of telematics system is now available in an estimated 70 percent of vehicles built since 2011.[1] Advanced Automatic Crash Notification (AACN) is a component of telematics. The AACN Joint APCO/NENA Data Standardization Workgroup created the Vehicle Emergency Data Set (VEDS) to address the need for an open standard format to be used by all providers and consumers of vehicle telematics information. VEDS is an XML-based data standard that provides useful and critical data elements and the schema set needed to facilitate an efficient emergency response to vehicular emergency incidents.[2] At the fringes, the term telematics is also used to describe “connected car” features in general, which include live weather, traffic and parking information on the dashboard, apps, voice-activated features, and social media integration.[3]

Who Collects, Maintains, and Owns the Data: Auto manufacturers, telematics service providers (TSPs), insurance companies, and fleet owners.

How the Data Are Collected: Data is collected by connecting to in-vehicle sensors using four distinct categories of telematics solutions (dongles, black boxes, embedded telematics, and smartphones):[4]
• Dongles are self-installed devices that are often provided by car insurers or may be purchased by the vehicle owner to monitor/record vehicle operation and/or driver behavior.
• Black-box systems are professionally installed to monitor driving behavior and vehicle systems.
• Embedded telematics are installed by some manufacturers and provide services such as remote diagnostics, navigation, and infotainment services.
• Smartphones can work as stand-alone devices or be linked to vehicles’ systems (e.g., through Bluetooth) to transmit a variety of information to and from the car.

Data Structure: Raw data from telematics devices is in CSV format (semi-structured).

Data Size, Storage, and Management: Data is stored within the collection devices described above except where the devices interface with remote systems, call centers, and management systems. Some telematics systems, such as the ones deployed by auto insurance companies, store vehicle data in file storage, relational, or NoSQL databases for later analysis of customer behavior. Archiving of data varies depending on the data owner and the chosen telematics solution.

Data Accessibility: Each automaker and insurer uses its own proprietary telemetry or usage-based insurance (UBI) program to access and store telematics data. The telematics data can only be accessed via a court order.

Data Sensitivity: Telematics data, and especially the aggregation of the data, presents privacy challenges for consumers, the courts, law enforcement, automakers, insurers, and the telematics industry. Privacy settings and arrangements depend on the service. For example, BMW’s ConnectedDrive may “collect and retain an electronic or other record” of a person’s location or direction of travel at a given time. The OnStar® system by General Motors “complies with its legal obligation to court orders or subpoenas” but doesn’t “share data with law enforcement absent a court order unless it is necessary to protect the safety of its customers or others.” Ford has said that its Sync program doesn’t track or transmit data continuously from a vehicle and that no data is transmitted from the vehicle without the customer’s consent, indicating that “[l]ocation data is only shared with our partners when necessary to fulfill the services requested by the customer.”[1]

Data Costs: N/A. Data can only be obtained through a court order.

Data Openness: Not open, as it requires a court order to be accessed.

Data Challenges: Typically, these systems are used with the consent of the vehicle owner, and access to data is restricted to uses defined by the user/owner. Telematics system user agreements may allow the collected data to be reused by, or sold to, parties other than the telematics system owner and the driver.

[1] Vehicle Telematics: A Useful Litigation Tool for Attorneys, A Boon to Insurers and the Privacy Concerns Big Data Raises for Us All, Klieman & Lyons (September 20). Online: http://www.kliemanlyons.com/2014/09/vehicle-telematics-a-useful-litigation-tool-for-attorneys-a-boon-to-insurers-and-the-privacy-concerns-big-data-raises-for-us-all (accessed March 2017).
[2] Association of Public-Safety Communications Officials, Comm Center & 911, AACN/VEDS Overview. Online: https://www.apcointl.org/resources/telematics/aacn-and-veds.html (accessed February 2017).
[3] Carter, J. Telematics: What You Need to Know, TechRadar, June 27, 2012. Online: http://www.techradar.com/news/car-tech/telematics-what-you-need-to-know-1087104 (accessed February 2017).
[4] Karapiperis, D., B. Birnbaum, A. Brandenburg, S. Castagna, A. Greenberg, R. Harbage, A. Obersteadt. Usage-Based Insurance and Vehicle Telematics: Insurance Market and Regulatory Implications, National Association of Insurance Commissioners and the Center for Insurance Policy and Research (March 2015). Online: http://www.naic.org/documents/cipr_study_150324_usage_based_insurance_and_vehicle_telematics_study_series.pdf.
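Because Table A-21 describes raw telematics data as second-by-second CSV records, a simple scan over consecutive rows is often enough for a first look at driver behavior. The sketch below flags hard-braking moments. The column names `t_s` and `speed_mph` and the 7 mph-per-second threshold are assumptions made for illustration; actual telematics providers define their own fields and thresholds.

```python
import csv
import io


def harsh_braking_events(csv_text, threshold_mph_per_s=7.0):
    """Return timestamps at which speed fell at or above the threshold rate."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    events = []
    # Compare each record with the one immediately before it.
    for prev, cur in zip(rows, rows[1:]):
        dt = float(cur["t_s"]) - float(prev["t_s"])
        drop = float(prev["speed_mph"]) - float(cur["speed_mph"])
        if dt > 0 and drop / dt >= threshold_mph_per_s:
            events.append(float(cur["t_s"]))
    return events


# Hypothetical four-second trace with one sharp deceleration.
sample = "t_s,speed_mph\n0,45\n1,44\n2,35\n3,34\n"
print(harsh_braking_events(sample))
```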

Table A-22. Automated and connected vehicle, traveler, and infrastructure data.

Description of Data: Automated vehicles are those in which at least some aspect of a safety-critical control function (e.g., steering, throttle, or braking) occurs without direct driver input. Automated vehicles may be autonomous (i.e., use only vehicle sensors) or may be connected (i.e., use communications systems such as connected vehicle technology, in which cars and roadside infrastructure communicate wirelessly).[1] NHTSA has classified vehicle automation into six levels:[2]
• Level 0: The human driver does all the driving.
• Level 1: An advanced driver assistance system (ADAS) on the vehicle can assist the human driver with either steering or braking/accelerating.
• Level 2: An ADAS on the vehicle can control both steering and braking/accelerating under some circumstances. The human driver must continue to pay full attention and perform the rest of the driving task.
• Level 3: An ADAS on the vehicle can perform all aspects of the driving task under some circumstances. In those circumstances, the human driver must be ready to take back control when the ADAS requests the human driver do so. In all other circumstances, the human driver performs the driving task.
• Level 4: An ADAS on the vehicle can perform all driving tasks and monitor the driving environment in certain circumstances. The human need not pay attention in those circumstances.
• Level 5: An ADAS on the vehicle can do all the driving in all circumstances. The human occupants are just passengers and need never be involved in driving.
Connected vehicles are vehicles that use any of a number of different communication technologies to communicate with the driver, other vehicles on the road (V2V), roadside infrastructure (V2I), and the cloud (V2C).[3] A connected traveler is one who uses a mobile device that generates and transmits status data via DSRC, Wi-Fi, Bluetooth, or cellular. Messages generated and distributed by connected travelers could include data representing the traveler’s location, trip characteristics (e.g., speed), and mode and status (e.g., riding in a car, riding on transit, walking, biking) (Gettman et al. 2017). DSRC technology generates Basic Safety Messages (BSMs) and exchanges them with other vehicles and with roadside equipment (RSEs) at high frequency (10 times per second) and with very low latency (50 ms from transmission to receipt). A Probe Data Message (PDM) encapsulates a string of “snapshots” (a more comprehensive data element than the BSM) to provide vehicle trajectory information over a longer time frame than the local trajectories shared by the BSMs (Gettman et al. 2017). Connected infrastructure includes traditional ITS devices, such as traffic signals, ramp meters, CCTV, and RWIS, and may eventually evolve to include standard Internet-of-Things (IoT) protocols as IoT technologies continue to mature (Gettman et al. 2017).

Who Collects, Maintains, and Owns the Data: There is no clear property regime for ownership and control of such data. Thirty stakeholders, interviewed by RAND as part of the development of Autonomous Vehicle Technology: A Guide for Policymakers, were asked their opinion about who owned the data obtained by automated vehicles (AVs) as they move, gather, and transmit information. Not a single stakeholder was certain of the answer.[2]

How the Data Are Collected: Data are collected via dozens of sensors that capture telematics, driver behavior, and environmental data. Sensors such as forward and side radar sensors, sonar, GPS, LiDAR, cameras, and monitoring systems generate AV and CV data. The amount of data generated is large and quickly exceeds the on-board data storage capacity; therefore, it is eventually stored in remote or cloud-based systems. AV and CV data can also be streamed directly to remote systems to be monitored in real time.

Data Structure: Semi-structured. ASN.1, XML, JSON, and CSV.

Data Size, Storage, and Management: Petabytes to zettabytes, depending on the number of vehicles collecting AV and CV data. It is estimated that connected vehicles may generate as much as 25 gigabytes per hour. It is assumed that not all this data will be stored and managed in its raw form, but at this scale cloud file storage and NoSQL databases will be required even for compressed or partial datasets. If all of the emerging data from connected vehicles, travelers, and infrastructure related to traffic operations is stored, the cumulative storage of a typical traffic management agency is estimated to be in the many thousands of terabytes by 2026 (Gettman et al. 2017).

Data Storage: Stakeholders interviewed in the RAND study identified policy questions concerning data use and legal issues (e.g., how long AV data should be maintained and by whom).[2]

Data Accessibility: Stakeholders in the RAND study also raised the issue of whether data gathered, produced, or transmitted by AVs will be discoverable in legal proceedings.[2] AV/CV aggregation and anonymization methods are being developed to facilitate accessibility.

Data Sensitivity: Some members of the AV industry are already working on how to anonymize vehicle data and aggregate it so that it does not reveal drivers’ PII. One stakeholder identified privacy concerning AV data as a critical issue that needs immediate policy attention. Two stakeholders made a comparison to the information captured by EDRs currently installed in automobiles.[2]

Data Costs: Unknown and may not be applicable depending on ultimate privacy policies.

Data Openness: Not open at this point.

Data Challenges: Data ownership and privacy issues related to AV communications remain unsettled and represent an important policy gap.[2]

[1] Automated Vehicle Research, U.S. Department of Transportation. Online: https://www.its.dot.gov/automated_vehicle/ (accessed February 2017).
[2] Anderson, J. M., N. Kalra, K. D. Stanley, P. Sorensen, C. Samaras, O. A. Oluwatola. 2016. Autonomous Vehicle Technology: A Guide for Policymakers. RAND Corporation, Santa Monica, CA. Online: http://www.rand.org/content/dam/rand/pubs/research_reports/RR400/RR443-2/RAND_RR443-2.pdf.
[3] Center for Advanced Automotive Technology, Connected and Automated Vehicles. Online: http://autocaat.org/Technologies/Automated_and_Connected_Vehicles/ (accessed February 2017).
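The volume figures in Table A-22 can be turned into a rough sizing calculation. The sketch below uses only the two rates quoted in the table (up to 25 gigabytes per connected vehicle per hour, and 10 BSMs per second); the fleet size and driving hours in the usage note are assumptions for illustration, not estimates from the report.

```python
GB_PER_VEHICLE_HOUR = 25   # upper estimate quoted in Table A-22
BSM_PER_SECOND = 10        # DSRC broadcast frequency quoted in Table A-22


def raw_volume_tb(vehicles, hours_per_day, days, gb_per_hour=GB_PER_VEHICLE_HOUR):
    """Raw data volume in terabytes, using decimal units (1 TB = 1,000 GB)."""
    return vehicles * hours_per_day * days * gb_per_hour / 1000


def bsm_count(vehicles, hours):
    """Number of Basic Safety Messages broadcast by a fleet over `hours`."""
    return vehicles * hours * 3600 * BSM_PER_SECOND


print(raw_volume_tb(1000, 1, 365))  # 1,000 vehicles driven one hour a day for a year
print(bsm_count(1, 1))              # messages from one vehicle in one hour
```

Under these assumed inputs, 1,000 vehicles produce 9,125 TB of raw data per year, and a single vehicle broadcasts 36,000 BSMs per hour, which is consistent with the table's point that raw storage is unlikely to be practical.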

A.6 Aggregated Datasets

Table A-23. RITIS data assessment.

Description of Data: RITIS is an automated traffic and emergency management data consolidation, sharing, dissemination, and archiving system. Data include, but are not limited to, third-party probe data, DOT ATMS data, National Performance Management Research Data Set (NPMRDS) data, road weather data, CAD data, virtual weigh station data, transit data, and parking space availability.

Who Collects, Maintains, and Owns the Data: University of Maryland CATT Lab and partners, including state DOTs, public safety agencies, transit agencies, and third-party data providers.

How the Data Are Collected: RITIS receives data feeds from transportation agencies, public safety agencies, transit agencies, and third-party data providers.

Data Structure: Structured (relational database, geospatial databases) and semi-structured (XML, JSON, GeoRSS).

Data Size, Storage, and Management: Gigabytes, possibly terabytes, depending on the dataset or coverage area. RITIS also collects geospatial and raster (image) data, which is larger than typical event datasets. All data within RITIS is archived indefinitely.

Data Accessibility: Public safety or DOT employees can register for an account on the RITIS platform by visiting https://www.ritis.org/register. Three types of feeds are available to users:[1]
• The RITIS Filter Web Service, a polling web service, allows consumers to receive data in several different formats (XML, JSON, and GeoRSS). It provides data from the widest array of agency sources and allows consumers to filter data by source agency and by specific fields.
• The JMS Filter uses a real-time publish/subscribe model based on a Java Message Service broker. Upon the initial connection, the subscriber receives a full inventory of devices or events followed by asynchronous incremental updates (from a limited number of data sources).
• The XML Filter, an SSL [secure sockets layer] secured web page, provides a list of GZIP-ed XML files with a snapshot of current data. Data consumers poll the page at a set interval to pull the latest snapshot in XML format (from a limited number of data sources).
Data within the RITIS archive can also be downloaded and/or exported so that users can perform their own, independent analyses. Generally, however, data are accessed through web tools that are designed for close inspection of defined events in space and time.

Data Sensitivity: Accounts are not given to the general public or the private sector due to the sensitive nature of some of the data.

Data Costs: The University of Maryland CATT Lab makes the RITIS platform available to registered users for a fee, which depends on the services purchased.

Data Openness: Limited openness. RITIS was first and foremost designed to support the transportation side of emergency management (command center coordination) and as such does not share its data with the general public. The RITIS platform focuses on providing visualizations and user interfaces designed to support real-time emergency management decisions and, in addition, provides web services that allow other applications to be integrated with RITIS. Users may be limited to viewing only their own data.

Data Challenges: Although RITIS provides analysis tools and visualizations, its data-sharing limitations do not allow its users to fully exploit the data it collects. It is unclear whether the data that RITIS stores is kept in individual databases or in a single data repository where all its datasets can be explored at once. Many of the visualizations provided in the RITIS documentation are GIS based and allow the geospatial merger of distinct databases and datasets without fully integrating them. RITIS does not provide information about data coverage, quality, or usability. Its documentation provides examples of advanced tools and visualizations in various aspects of transportation management. No indication is given as to how many RITIS users can run these analyses and visualizations using their own data. Although RITIS contains data from a wide array of data sources, it is unclear what data sources are available for different locations and what data elements are included in the various data sources (e.g., ATMS data varies widely from agency to agency and sometimes even from TMC to TMC within an agency).

[1] RITIS Platform, Features & Applications Overview, CATT Laboratory, University of Maryland (2015). Online: http://www.cattlab.umd.edu/files/RITIS%20Overview%20Book-2-2-15%20FINAL.pdf.
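The XML Filter feed described in Table A-23 delivers GZIP-compressed XML snapshots that a consumer polls and then unpacks. The sketch below shows only the unpacking step, using the Python standard library. The element and attribute names (`events`, `event`, `id`, `type`, `road`) are hypothetical placeholders, since the actual RITIS schema is not reproduced in this appendix, and the authenticated download of the snapshot is deliberately left out.

```python
import gzip
import xml.etree.ElementTree as ET


def parse_snapshot(gz_bytes):
    """Decompress a GZIP-ed XML snapshot and return one dict per <event>."""
    root = ET.fromstring(gzip.decompress(gz_bytes))
    return [dict(event.attrib) for event in root.iter("event")]


# Hypothetical snapshot built in memory; a real consumer would fetch the
# bytes from the secured feed page at its polling interval.
sample = gzip.compress(
    b'<events>'
    b'<event id="1" type="crash" road="I-95"/>'
    b'<event id="2" type="disabled vehicle" road="US-1"/>'
    b'</events>'
)
events = parse_snapshot(sample)
print(events)
```

Flattening each snapshot into plain dictionaries makes it straightforward to append the records to a file store or database for later analysis.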

Table A-24. National Performance Management Research Data Set (NPMRDS).

Description of Data: The NPMRDS provides vehicle probe-based data for passenger automobiles and trucks. NPMRDS is a monthly archive of average travel times on the National Highway System, reported every 5 minutes when data is available. Separate average travel times are included for “all traffic,” freight, and passenger travel.[1]

Who Collects, Maintains, and Owns the Data: INRIX Traffic.

How the Data Are Collected: INRIX aggregates GPS probe data from a wide array of commercial vehicle fleets, connected cars, and mobile apps.

Data Structure: Semi-structured (CSV, shape files).

Data Size: Although the source data (INRIX) is Big Data, the size of the NPMRDS data files downloaded through RITIS’s “Massive Data Downloader” tool depends on the size of the query (e.g., date range, number of roadways). Downloading the data from a website will eventually run into an upper limit on the size of the file that can be downloaded, imposed by factors such as client limitations, network bandwidth limitations (it could take 24 hours to download 100 GB of data), limitations in the software handling the HTTP transfer, and the storage capabilities of the receiving desktop computer.

Data Storage and Management: Source data (INRIX). Before July 2017, the NPMRDS data was provided by HERE Technologies. Since July 2017, the data has been provided by INRIX through the CATT Lab’s RITIS system. The change in data providers has caused a discontinuity in the data. Agencies working with the dataset will have to adjust to a new kind of data/model (the data does not behave the same way and does not have the same limitations).

Data Storage: Source data (INRIX): Big Data infrastructure.

Data Accessibility: Available through the RITIS “Massive Data Downloader,” the official portal for all downloads of the NPMRDS. Users must belong to a public agency and obtain a log-in to access the data. The Massive Data Downloader allows access to a sample of the data available to INRIX. The data is not machine accessible.

Data Sensitivity: None.

Data Costs: Free to states and MPOs.

Data Openness: Data are not open; only samples of data are available through the Massive Data Downloader; data are shared with state transportation agencies and MPOs only.

Data Challenges: The data cannot be used as a data source for Big Data analysis (even though it is based on a Big Data source). The data cannot be accessed through custom queries (the tool is designed for a pre-defined set of basic queries). Data has to be queried manually on the RITIS system rather than loaded into a data lake for more in-depth analysis. Agencies would need to go directly to INRIX or a competitor, such as HERE Technologies, to get the data for these purposes. Previously (when the data was provided by HERE), the data could be downloaded and put into a data repository for these purposes.

[1] National Operations Center of Excellence. Online: https://transportationops.org/event/national-performance-management-research-data-set-npmrds-users-quarterly-technical-assistance.
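The download constraint mentioned in Table A-24 (roughly 24 hours for 100 GB) implies a sustained transfer rate that can be checked with simple arithmetic. The sketch below assumes decimal units (1 GB = 8,000 megabits) and a constant transfer rate; the function names are illustrative only.

```python
def implied_mbps(size_gb, hours):
    """Sustained rate, in megabits per second, needed to move size_gb in `hours`."""
    return size_gb * 8000 / (hours * 3600)


def download_hours(size_gb, mbps):
    """Hours needed to move size_gb at a constant rate of `mbps`."""
    return size_gb * 8000 / mbps / 3600


rate = implied_mbps(100, 24)
print(round(rate, 2))             # rate implied by the table's example
print(download_hours(500, rate))  # the same link applied to a 500 GB extract
```

The table's example corresponds to a sustained rate of about 9.26 Mbps, so a 500 GB extract over the same link would take about five days, which illustrates why browser-based downloads scale poorly for large queries.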

Table A-25. Meteorological Assimilation Data Ingest System (MADIS) and MADIS Integrated Mesonet, National Oceanic and Atmospheric Administration (NOAA).

Description of Data: The Meteorological Assimilation Data Ingest System (MADIS) is a meteorological observational database and data delivery system. MADIS runs operationally at the National Weather Service (NWS) National Centers for Environmental Prediction (NCEP) Central Operations (NCO). MADIS subscribers have access to an integrated, reliable, and easy-to-use database containing real-time and archived observational datasets. Also available are real-time gridded surface analyses. The surface analysis grids assimilate all the MADIS surface datasets, including the highly dense Integrated Mesonet data.[1] The MADIS Integrated Mesonet is a unique collection of thousands of mesonet stations from local, state, and federal agencies and private firms that helps provide a finer-density, higher-frequency observational database for use by the greater meteorological community.[2]

Who Collects, Maintains, and Owns the Data: NOAA.

How the Data Are Collected: MADIS ingests data from NOAA data sources and non-NOAA providers, decodes the data, and then encodes all the observational data into a common format with uniform observational units and timestamps. MADIS collects data from 33 state DOTs. All DOT observations are part of the MADIS Integrated Mesonet.[2] MADIS also performs multiple data validation checks and cross-correlations of nearby sensor data to maximize the quality of its dataset. MADIS data can be accessed raw or corrected. Many of the implementation details that arise in data ingest programs are handled automatically.

Data Structure: MADIS data is stored in NetCDF files, a scientific file format commonly used to store weather data.

Data Size, Storage, and Management:[3] Gigabytes to terabytes. Daily totals for the government, research, and education Integrated Mesonet dataset: 680 MB (compressed), 5.67 GB (uncompressed).[4] The data schedule is set by the provider and ranges from every 5 minutes to once per day. Users can request data from July 2001 to the present. Quality checks are conducted, and the integrated datasets are stored along with a series of flags indicating the results of the various quality control checks.

Data Accessibility: MADIS provides several methods for users to access the data over the Internet: file transfer protocol (FTP), Unidata’s Local Data Manager (LDM) software, web services using HTTPS, and graphical displays.[3] The web service API allows each user to specify station and observation types, as well as quality control choices and domain and time boundaries. The MADIS web API and related utility programs allow easy access to MADIS observations without having to develop a program for reading NetCDF files. To access data, users must fill out a data application request. Some datasets are restricted by the provider. There are four distribution categories:[5]
• Distribution to government, research, and education organizations.
• Sponsored access.
• Public (full distribution).
• Distribution to NOAA only.
Restrictions are based on the provider. Most of the datasets are available without restrictions.

Data Sensitivity: No.

Data Costs: Free.

Data Openness: Limited data openness due to some restricted content and the need for knowledge of the NetCDF format.

Data Challenges: The NetCDF file format could be challenging for non-scientific staff to use, as it requires the implementation of a dedicated API to access the data. NetCDF is typically used in scientific applications such as meteorological forecasting, not in Big Data analysis. NetCDF is not a Big Data–friendly format, and its data need to be transformed into a simpler, more Big Data–friendly format to be processed.

[1] Meteorological Assimilation Data Ingest System (MADIS), National Oceanic and Atmospheric Administration (June 16). Online: https://madis-data.ncep.noaa.gov/ (accessed February 2017).
[2] Integrated Mesonet Data, National Oceanic and Atmospheric Administration (June 16). Online: https://madis.ncep.noaa.gov/madis_mesonet.shtml (accessed February 2017).
[3] MADIS User Resources, National Oceanic and Atmospheric Administration (June 16). Online: https://madis.ncep.noaa.gov/user_resources.shtml (accessed February 2017).
[4] MADIS Data Volume, National Oceanic and Atmospheric Administration (June 16). Online: https://madis.ncep.noaa.gov/madis_data_volume.shtml (accessed February 2017).
[5] MADIS Dataset Restrictions, National Oceanic and Atmospheric Administration (June 16). Online: https://madis.ncep.noaa.gov/madis_restrictions.shtml (accessed February 2017).

Table A-26. Third-party web service weather data.

Description of Data: Historical meteorological data and weather forecast data from various public and private weather data sources across the globe, including temperature, precipitation probability, pressure, visibility, wind speed, wind direction, cloud cover, visibility index, and humidity, as well as ancillary data elements such as nearby storms, moon phase, and sunrise/sunset.

Who Collects, Maintains, and Owns the Data: Third-party real-time web service (e.g., Dark Sky).

How the Data Are Collected: Data is obtained from the datasets provided by multiple meteorological agencies from all over the world. Coverage is often focused mostly on U.S. and European datasets, including MADIS and NEXRAD.

Data Structure: Semi-structured (JSON).

Data Size, Storage, and Management: Petabytes. Managed through Big Data databases and cloud file storage. Data is typically updated every minute (Dark Sky).

Data Accessibility: Data is accessed through an authenticated representational state transfer (REST)-based web service API. The API is not designed to support file downloads but can handle millions of requests at the same time. The API is used in the following way: a user sends a forecast or weather data request to the API specifying a time and location, and the API returns a very detailed historical reading or a forecast for the next hours to days.

Data Sensitivity: No.

Data Costs: Low cost. Pay-as-you-go. Low cost per transaction (e.g., $0.10 per 1,000 requests). First 1,000 forecasts per day free.

Data Openness: Open.

Data Challenges: The primary drawback is that the data cannot be accessed as a whole; rather, existing datasets containing location and time need to be augmented using the API.
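Because this kind of service is queried one location and time at a time, augmenting an incident dataset comes down to counting distinct lookups and pricing them. The sketch below uses the example pricing quoted in Table A-26 ($0.10 per 1,000 requests, with the first 1,000 requests per day free). The grouping rule (coordinates rounded to two decimals, timestamps truncated to the hour) and the sample records are assumptions for illustration, and no real API call is made.

```python
FREE_REQUESTS_PER_DAY = 1_000   # free tier quoted in Table A-26
USD_PER_1000_REQUESTS = 0.10    # example price quoted in Table A-26


def unique_lookups(records):
    """Collapse (lat, lon, ISO timestamp) records that share a rounded
    location and the same hour into a single weather request."""
    return {(round(lat, 2), round(lon, 2), ts[:13]) for lat, lon, ts in records}


def daily_cost_usd(requests):
    """Cost of one day's requests after the free allowance is used up."""
    billable = max(0, requests - FREE_REQUESTS_PER_DAY)
    return billable * USD_PER_1000_REQUESTS / 1000


# Hypothetical incident records: two are close together in space and time.
records = [
    (38.9912, -76.9371, "2017-02-01T08:05:00"),
    (38.9914, -76.9369, "2017-02-01T08:40:00"),
    (39.2900, -76.6100, "2017-02-01T09:10:00"),
]
print(len(unique_lookups(records)), daily_cost_usd(51_000))
```

Deduplicating lookups before calling the service keeps the request count, and therefore the cost, proportional to the number of distinct place and hour combinations rather than to the number of incident records.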

Table A-27. National Fire Incident Reporting System (NFIRS) data.

Description of Data: The National Fire Incident Reporting System (NFIRS) is the standard national reporting system used by U.S. fire departments to report fires and other incidents to which they respond and to maintain records of these incidents in a uniform manner. NFIRS is the world's largest national annual database of fire incident information.[1] Data elements relevant to TIM include fire department, location, vehicle fire, arrival time, firefighter casualties, firefighter deaths, and civilian deaths and injuries.

Who Collects, Maintains, and Owns the Data: Every U.S. state and the District of Columbia report NFIRS data. Although NFIRS participation is not mandatory at the national level, about 23,000 fire departments report to NFIRS each year.

How the Data Are Collected: After responding to an incident, a fire department completes the appropriate NFIRS modules using NFIRS-compatible software programs. Each module collects a common set of information that describes the nature of the call, the actions firefighters took in response to the call, and the end results, including firefighter and civilian casualties and a property loss estimate. The fire department forwards its data to the state agency responsible for NFIRS data. The agency gathers data from all participating departments in the state and reports the compiled data to the U.S. Fire Administration (USFA). As part of the collection and compilation process, various validation tools are used to ensure the quality of the entered data.

Data Structure: Structured and semi-structured. The public data release (PDR) uses a relational database containing 20 tables. The NFIRS PDR data provided online is composed of 19 data tables (files, or modules) in dBase database file (.dbf) format. The same data is available from www.data.gov in flat file formats (TXT, CSV).

Data Size, Storage, and Management: Megabytes to gigabytes. Participating fire departments report about 22,000,000 incidents and 1,000,000 fires each year. The PDR contains more than 2 million incidents per year (gigabytes). Due to large file sizes, the files available in the PDR consist only of fire and hazardous condition incidents.[2]

Data Accessibility: The PDR is provided online or on a CD-ROM as a set of dBase (.dbf) files, or as a zip file on www.data.gov.

Data Sensitivity: No. No sensitive data is loaded in the PDR.

Data Costs: Each year the USFA compiles publicly released incidents collected by states during the previous calendar year into a PDR that is made available free of charge. NFIRS software is available as free desktop and web-based applications from the USFA or as NFIRS standard-compliant products purchased from fire software vendors.

Data Openness: Open.

Data Challenges: The USFA does not have a quality assurance system in place to check for codes that are not in the current data dictionary. Thus, the NFIRS PDR database contains invalid codes and may exhibit data inconsistencies that violate published documentation.[3] Data is collected on a voluntary basis, so some areas may not have sufficient data. The distributed dataset is not a complete dataset; it contains only fire and hazardous condition incidents.[4] The truncation of the dataset is apparently due to data size limitations in the current storage and distribution system. This is rather uncommon these days, and it denotes either an obsolete system or obsolete data management practices, as the sharing of multi-gigabyte files is a common occurrence today. The PDR dataset is published using the dBase database file format (.dbf), which was created in 1978 to be used with the MS-DOS operating system. It is still common today in desktop-based database software but has had many iterations and variations. Reading it requires software capable of parsing its binary structure, which adds preparation work before the data can be exploited by typical Big Data tools. JSON, XML, TXT, or CSV should be used instead, as many databases capable of generating .dbf files can generate these Big Data–friendly formats as well.

[1] https://www.usfa.fema.gov/data/nfirs/about/index.html.
[2] https://www.usfa.fema.gov/data/statistics/order_download_data.html.
[3] https://www.usfa.fema.gov/downloads/pdf/nfirs/nfirs_data_analysis_guidelines_issues.pdf.
[4] https://www.usfa.fema.gov/data/statistics/order_download_data.html.
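The Data Challenges entry for Table A-27 notes that .dbf files must be parsed from their binary structure before typical Big Data tools can use them. The sketch below illustrates the kind of preparation step involved, assuming a plain dBase III-style layout (a 32-byte file header, 32-byte field descriptors ending with a 0x0D byte, then fixed-width records). The function names `read_dbf` and `dbf_to_csv` are hypothetical, the Latin-1 decoding is an assumption, and memo fields and the other format variations the table mentions are not handled. It has not been checked against the actual NFIRS PDR files.

```python
import csv
import io
import struct


def read_dbf(raw: bytes):
    """Return (field_names, rows) from the bytes of a dBase III-style .dbf file."""
    # File header: record count at byte 4, header length at byte 8,
    # record length at byte 10 (all little-endian).
    num_records, header_len, record_len = struct.unpack_from("<IHH", raw, 4)

    # Field descriptors are 32 bytes each and start right after the
    # 32-byte file header; a 0x0D byte marks the end of the list.
    fields = []
    pos = 32
    while raw[pos] != 0x0D:
        name = raw[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii")
        length = raw[pos + 16]  # field width in bytes
        fields.append((name, length))
        pos += 32

    rows = []
    for i in range(num_records):
        start = header_len + i * record_len
        record = raw[start:start + record_len]
        if record[:1] == b"*":  # first byte flags a deleted record
            continue
        offset, values = 1, []
        for _, length in fields:
            # Assumption: every field is treated as padded Latin-1 text.
            values.append(record[offset:offset + length].decode("latin-1").strip())
            offset += length
        rows.append(values)
    return [name for name, _ in fields], rows


def dbf_to_csv(raw: bytes) -> str:
    """Convert .dbf bytes to CSV text with a header row."""
    names, rows = read_dbf(raw)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return out.getvalue()
```

In practice, the flat-file (TXT, CSV) copies of the same data on www.data.gov would likely avoid this step altogether.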

Table A-28. National EMS Information System (NEMSIS) data.

Description of Data: NEMSIS is a national repository of standardized EMS data elements from 49 states and 2 territories. Data elements relevant to TIM include timestamps associated with the TIM timeline (e.g., notification, dispatch, arrival/departure of EMS responders), EMS agency, type of service requested, type of delay (e.g., dispatch, response, scene), chief complaint, alcohol/drug use, and procedures performed.

Who Collects, Maintains, and Owns the Data: EMS agencies collect the data at the local level. There are three tiers of ownership and maintenance of the data: local, state, and national. The data are collected by local agencies and therefore owned at the local level. States own and maintain their individual state-level NEMSIS databases based on data submitted by the local agencies. States then submit a subset of data to the NEMSIS national repository. The subset of data that is submitted to the national repository is owned by the nation and maintained by the NEMSIS Technical Assistance Center (TAC) at the University of Utah.

How the Data Are Collected: Data is collected by EMS personnel in the field and entered into a NEMSIS-compliant software program, which electronically transmits (via web services) the data to a state database. A subset of data is then electronically transmitted (via web services) from the state databases to the national NEMSIS repository. The data flows from the local level to the national level in just a few minutes.

Data Structure: Structured and semi-structured (XML).

Data Size, Storage, and Management: 30.2 million records (gigabytes) were transmitted to NEMSIS in 2015. The national data repository is stored in-house at the NEMSIS TAC.

Data Accessibility: Although the data are not currently publicly available at the local level, they could be. 65 million records are available on the NEMSIS website. A full year of data can be obtained on a DVD from the NEMSIS TAC. NEMSIS TAC can release case-level data to researchers.

Data Sensitivity: Some data elements allow for the identity of the location (state) of the records (e.g., EMS agency, home zip code of patient, destination hospital). These data elements cannot be shared with the public.

Data Costs: The public-release dataset is available for free.

Data Openness: Limited openness because of lack of location resolution in aggregated datasets.

Data Challenges: Location data at the state and national level is limited to the zip code level, which could greatly limit data analytics because the resolution would be too low for meaningful analysis.
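Table A-28 lists timestamps along the TIM timeline (notification, dispatch, arrival and departure of EMS responders) carried in XML records. As a rough illustration of how such timestamps could be turned into TIM intervals, the sketch below parses a small made-up XML sample. The element names (`Record`, `Notified`, `Dispatched`, `ArrivedOnScene`, `DepartedScene`), the sample values, and the function names are all hypothetical; the actual NEMSIS schema uses its own element names and namespaces, which should be taken from the NEMSIS data dictionary.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical record layout and made-up values, for illustration only.
SAMPLE = """<Records>
  <Record id="a1">
    <Notified>2015-03-02T14:05:00</Notified>
    <Dispatched>2015-03-02T14:06:30</Dispatched>
    <ArrivedOnScene>2015-03-02T14:15:00</ArrivedOnScene>
    <DepartedScene>2015-03-02T14:40:00</DepartedScene>
  </Record>
  <Record id="b2">
    <Notified>2015-03-02T16:00:00</Notified>
    <Dispatched>2015-03-02T16:01:00</Dispatched>
    <DepartedScene>2015-03-02T16:30:00</DepartedScene>
  </Record>
</Records>"""


def minutes_between(record, start_tag, end_tag):
    """Minutes from start_tag to end_tag, or None if either timestamp is missing."""
    start = record.findtext(start_tag)
    end = record.findtext(end_tag)
    if not start or not end:
        return None
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60.0


def tim_intervals(xml_text):
    """Map each record id to its dispatch, response, and on-scene intervals."""
    result = {}
    for record in ET.fromstring(xml_text).iter("Record"):
        result[record.get("id")] = {
            "dispatch_delay_min": minutes_between(record, "Notified", "Dispatched"),
            "response_time_min": minutes_between(record, "Dispatched", "ArrivedOnScene"),
            "on_scene_time_min": minutes_between(record, "ArrivedOnScene", "DepartedScene"),
        }
    return result
```

Returning `None` for a missing timestamp, rather than raising an error, keeps incomplete records visible so that gaps in the data can be counted instead of silently dropped.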

Table A-29. Motor Carrier Management Information System (MCMIS).

Description of Data: The Motor Carrier Management Information System (MCMIS) is a computerized system whereby the FMCSA maintains a comprehensive record of the safety performance of the commercial motor carriers who are subject to the Federal Motor Carrier Safety Regulations (FMCSR) or Hazardous Materials Regulations (HMR).[1] Records are maintained in four broad categories:
• Registration: Contains FMCSA registration data for all motor carriers (U.S. DOT#, company name, address, contacts, number of vehicles, number of drivers, and other registration information).
• Crash: Contains data for each commercial motor vehicle involved in a crash (U.S. DOT#, report number, crash date, severity of the crash [tow-away, injury, fatal], vehicle data, etc.).
• Inspection: Contains data on roadside inspections conducted on motor carriers (U.S. DOT#, report number, inspection date, state, vehicle and equipment information, violations-related data, etc.).
• Review: Contains information on reviews/investigations conducted on motor carriers and other entities (U.S. DOT#, review date, review type, safety rating, and so forth).

Who Collects, Maintains, and Owns the Data: State DOTs, state law enforcement, FMCSA.

How the Data Are Collected: Manual, electronic.

Data Structure: Structured.

Data Size, Storage, and Management: Terabytes, national data store.

Data Accessibility: Web service, web data files download, and requests via FMCSA data dissemination program.

Data Sensitivity: Contains PII.

Data Costs: Some dataset downloads are free via https://ai.fmcsa.dot.gov/SMS/Tools/Downloads.aspx. Customized extracts and reports via the data dissemination program incur fees (e.g., crash file extract $36, personalized crash report $33, inspection file extract $70 per calendar year, and company safety profiles $27.50 each with discounts for more profiles purchased).[2]

Data Openness: Data is shared as reports; data are not open.

Data Challenges: Data is not available in raw format, only through specific reports.

[1] https://ask.fmcsa.dot.gov/app/mcmiscatalog/mcmishome.
[2] https://ask.fmcsa.dot.gov/app/mcmiscatalog/c_chap3#crfe.

Table A-30. HERE data.

Description of Data: HERE Technologies aggregates and analyzes road transportation data from more than 80,000 data sources covering more than 180 countries, including “the world’s largest compilation of both commercial and consumer probe data, the world’s largest fixed proprietary sensor network, publicly available event-based data and billions of historical traffic records,” weather, events data as well as panoramic imagery and LIDAR data from its own vehicle fleet.[1] HERE also relies on local source data and input from map users to generate constant daily map updates, such as real-time traffic, turn-by-turn directions, public transportation routes and information about local business and attractions. HERE combines “20 billion real-time GPS probe points a month with historical information and search queries to learn where people are traveling and what the conditions are like,” and per HERE, almost half of all the data is under one minute old and more than three-quarters is under five minutes old.[1] Data relevant to TIM includes incident location (road segment), criticality, incident description, real-time traffic condition, three-dimensional (3D) visualization of incident surroundings (including roadway details), start/end times of incidents, construction data, venue data, weather data as well as estimated travel time to incident location and estimated traffic condition created by incident.

Who Collects, Maintains, and Owns the Data: HERE Technologies.

How the Data Are Collected: Cell phones, connected navigation systems, fixed proprietary sensors, Twitter, state and local DOT data, email alerts, HERE map application, HERE 3D footprint vehicle fleet, as well as local businesses and attractions.

Data Structure: Structured and semi-structured.

Data Size, Storage, and Management: Terabytes to petabytes. HERE data is stored and processed using various combinations of on-the-premises and cloud-hosted relational databases, NoSQL databases, file storage and compute clusters. Most of the HERE datasets are real-time datasets designed to support real-time decision-making. Some of the HERE datasets are archived indefinitely to support HERE services such as its mapping, visualization, and predictive services.

Data Accessibility: HERE data is accessible through multiple web services, ranging from mapping and visualization services, traffic analysis, traffic prediction, and APIs to mobile application software development kits and toolkits. Web services are accessible to the public and businesses for a monthly fee.

Data Sensitivity: No. Data is anonymized.

Data Costs: HERE data is available through a monthly subscription plan to both the public and businesses. The HERE plan cost varies from free (under 15,000 transactions per month) to $500/month (150,000 transactions per month). Custom data plans are available for businesses requiring more transactions and services.

Data Openness: Limited openness. Accessed through web services.

Data Challenges: The primary drawback is that HERE data cannot be accessed as a whole (in raw format), but only through HERE web services.

[1] Here 360, How to Really Outsmart Traffic (July 9, 2013). Online: http://360.here.com/2013/07/09/how-to-really-outsmart-traffic/ (accessed June 2017).

Table A-31. INRIX data.

Description of Data: INRIX collects massive amounts of information about roadway speeds and vehicle counts from more than 300 million real-time anonymous mobile phones, connected cars, trucks, delivery vans, and other fleet vehicles equipped with GPS locator devices. This data is enriched with event data (e.g., traffic incidents, weather forecasts, special events, school schedules, parking occupancy, road construction) to provide software-as-a-service (SaaS) and data-as-a-service (DaaS) solutions.

Who Collects, Maintains, and Owns the Data: INRIX Traffic.

How the Data Are Collected: Combination of a connected network of anonymized road sensors, devices, cars and drivers from more than 300 million sources, including commercial fleets, delivery and taxis, cameras as well as consumer vehicle data, parking data. This highly granular floating vehicle data is combined with traditional real-time traffic flow information as well as hundreds of market-specific criteria that affect traffic (e.g., construction and road closures, real-time incidents, sporting and entertainment events, weather forecasts, and school schedules).

Data Structure: Big Data infrastructure.

Data Size, Storage, and Management: 500 TB of data analyzed daily.[1] Cloud infrastructure for storage and management.

Data Accessibility: Raw data generally is not available. Access to the data is obtained through a variety of ways, including traffic tiles, a monitoring site, flexible APIs, and the Transport Protocol Experts Group (TPEG) Connect, which delivers traffic and travel information to connected vehicles and mobile devices over the Internet.[2] Provides a comprehensive collection of historic speed and travel time data available in archival or profile formats. Available through a series of on-demand, cloud-based analytics suites that leverage INRIX data.

Data Sensitivity: The data is reportedly anonymized, so PII may be low or limited.

Data Costs: Unknown. Must contact INRIX for various pricing structures.

Data Openness: Not open. Proprietary.

Data Challenges: N/A

[1] http://inrix.com/resources/inrix-traffic-brochure/.
[2] http://inrix.com/products/traffic/.

"Big data" is not new, but applications in the field of transportation are more recent, having occurred within the past few years, and include applications in the areas of planning, parking, trucking, public transportation, operations, ITS, and other more niche areas. A significant gap exists between the current state of the practice in big data analytics (such as image recognition and graph analytics) and the state of DOT applications of data for traffic incident management (TIM) (such as the manual use of Waze data for incident detection).

The term big data represents a fundamental change in what data is collected and how it is collected, analyzed, and used to uncover trends and relationships. The ability to merge multiple, diverse, and comprehensive datasets and then mine the data to uncover or derive useful information on heretofore unknown or unanticipated trends and relationships could provide significant opportunities to advance the state of the practice in TIM policies, strategies, practices, and resource management.

NCHRP (National Cooperative Highway Research Program) Report 904: Leveraging Big Data to Improve Traffic Incident Management illuminates big data concepts, applications, and analyses; describes current and emerging sources of data that could improve TIM; describes potential opportunities for TIM agencies to leverage big data; identifies potential challenges associated with the use of big data; and develops guidelines to help advance the state of the practice for TIM agencies.
