Practices on Acquiring Proprietary Data for Transportation Applications

CHAPTER 2
Overview of Proprietary Data

This chapter briefly reviews proprietary data sources and services, their uses, and past procurement experiences. The focus is on proprietary data that have emerged in recent years from technologies such as GPS and crowdsourcing. The content of this chapter is based primarily on the literature review and includes, but is not limited to, the following topics:
• Data sources and their uses,
• Past practices and studies on data procurement, and
• Legal issues relevant to proprietary data procurement.

Data Sources and Their Uses
This section reviews the proprietary data sources that are available for transportation applications. Although agencies have acquired many types of data sets in the past, such as freight commodity flow data, this study focused on data products derived from data passively generated by increasingly prevalent tracking devices. For example, in-vehicle navigation systems generate vehicle location data using GPS technology. Smartphones also generate location data when they use location-based services (LBS), such as when searching for nearby restaurants. Data vendors acquire raw GPS traces or LBS data and process, integrate, and develop them into products such as travel speeds, routes, O-D, and volume.

The data items reviewed are listed below. The review is based on information provided by survey respondents. Because the study primarily focused on travel data enabled by new technologies, items such as socioeconomic data and digital maps are included but in less detail than the other data types.
• Speed or travel-time data;
• O-D data;
• Freight- and truck-specific data;
• Crowdsourced incident data;
• Crowdsourced non-motorized travel data; and
• Other data, such as socioeconomic data and street network maps.

Speed or Travel-Time Data
Speed or travel-time data are the central and most common pieces of information required for operational and planning applications. Traditional methods of collecting these data include the installation of instruments such as loops, radar detectors, Bluetooth readers, and license plate readers. On corridors with electronic toll collection systems, travel time can be estimated using toll tag readings as vehicles travel through the corridor. However, these methods require agencies to direct significant resources toward the deployment, operation, and maintenance of these sensors. Because of limited funding and federal requirements pertaining to the distribution of travel information, major highways such as urban freeways are often well instrumented, while urban arterials and rural roads are less likely to have adequate coverage (Crowson and Deeter 2012).

The ubiquity of GPS tracking technologies in commercial fleets, passenger automobiles, and smartphones has created new opportunities for businesses to leverage the data generated by these devices. Companies such as INRIX, HERE (previously NAVTEQ), and TomTom have contracted with fleet operators and consumer automakers to access the GPS location data generated by their vehicles. With the increasing market penetration of GPS-enabled devices, the availability and quantity of speed or travel-time data have rapidly increased, and technological advances have continuously improved data quality as well. GPS data coverage was initially greatest on heavily traveled urban arterials; however, as the market penetration of GPS devices has expanded, data are becoming widely available on collectors and local roads and even in rural areas.

In this study, vehicles with GPS-enabled devices that contribute to real-time collection of speed or travel-time data are referred to as probe vehicles, and the data collected from them are referred to as probe-vehicle data or probe data. Other terms used by agencies with essentially the same meaning, such as third-party data or private-sector data, are used interchangeably with probe-vehicle data in this report. Agencies use probe-vehicle data in a variety of ways, which were summarized in a recent study (Athey Creek Consultants 2017).
Data are commonly used to inform the public of travel times and traffic incidents on dynamic message signs (DMSs), to display traffic condition information on traveler information websites, to populate highway operating-condition displays at traffic management centers, and to confirm whether an incident has occurred. As such, they can be valuable for traffic and incident monitoring. The study also discussed additional uses in which speed data are analyzed with specialized tools, such as probe data analytics suites, for applications like congestion identification dashboards, work zone queue warning systems, and slowdown and delay warnings (Athey Creek Consultants 2017). Archived historical speed or travel-time data are often used for corridor studies, system performance measurement, and travel-model validation.

Several examples illustrate the uses that state DOTs have found for probe-speed data. A typical use of real-time speed data is the display of traffic conditions as a color-coded layer on a 511 traveler information web page. Figure 2 shows a screenshot of the live traffic map hosted by the I-95 Corridor Coalition, a consortium of "transportation agencies, toll authorities, and related organizations, including public safety, from the State of Maine to the State of Florida, with affiliate members in Canada" (I-95 Corridor Coalition 2018). Measured speeds are shown on the map in ranges from the lowest (less than 15 mph) to the highest (greater than 50 mph). Users can also choose to display other measures, such as the level of congestion, defined as the ratio of the measured speed to the unrestricted speed.

Another use of real-time data is the detection of non-recurring congestion. Indiana DOT cooperated with Purdue University to develop a real-time queue monitoring system using INRIX 1-minute speed data at the Traffic Message Channel (TMC) level (Li et al. 2015).
The system continuously records the speeds on all links along a route and then computes the speed difference (i.e., delta speed) between any two adjacent links. If any delta speed falls below a predefined threshold, the system triggers an alarm notifying the traffic control manager. This information is disseminated to travelers to warn them of slowdowns, as well as to patrol officers so that they can respond to non-recurring congestion more quickly. Figure 3 illustrates queue development over time caused by two crashes on a section of I-65 northbound in Indiana. The initial crash occurred at around 8:30 a.m. near Mile Point 215; the southernmost large circles indicate the back end of the queue. Another crash occurred at 10:16 a.m. at the back end of the queue, just as the queue from the first crash was about to clear. As a result, a longer queue formed that lasted about 2 hours and extended nearly 10 miles upstream.

Figure 2. A screenshot of I-95 Corridor Coalition live traffic map (I-95 Corridor Coalition 2018).
Figure 3. Queue development over time (Mekker et al. 2016).

Historical or archived speed data, on the other hand, are frequently used for performance measurement. Because of the large amount of data collected, it is possible to compute numerous performance measures, including travel-time reliability metrics, which normally require at least 1 year of data. When multiple years of data are available, agencies can derive performance measures annually and track performance over time. Figure 4 illustrates levels of travel-time reliability based on 2016 data for Maryland's state freeway and expressway system (Mahapatra et al. 2017). Travel-time reliability is a measure of the consistency of day-to-day travel time for the same trip (e.g., a daily commute). The planning-time index (PTI) is the ratio of the 95th-percentile travel time to the free-flow travel time. For example, a PTI value of 1.5 indicates that a traveler needs to plan 45 minutes (1.5 × 30) for a trip that would normally take 30 minutes under uncongested conditions. The higher the PTI value, the lower the travel-time reliability.

Figure 4. Maryland highway reliability map (Mahapatra et al. 2017).
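Both the adjacent-link check behind the Indiana queue-monitoring system and the PTI definition above are simple enough to sketch. The Python below is a minimal illustration under stated assumptions, not the Indiana DOT/Purdue system or any agency's method: link speeds are assumed to be ordered from upstream to downstream, delta speed is assumed to be downstream speed minus upstream speed (so an abrupt slowdown yields a large negative value), the alarm threshold is a hypothetical parameter, and the 95th percentile is computed by linear interpolation, which is only one of several percentile conventions.

```python
from statistics import quantiles  # method="inclusive" needs Python 3.8+


def delta_speeds(link_speeds_mph):
    """Speed difference between each pair of adjacent links.

    Assumption: speeds are ordered upstream to downstream and
    delta = downstream speed - upstream speed.
    """
    return [down - up for up, down in zip(link_speeds_mph, link_speeds_mph[1:])]


def queue_alarms(link_speeds_mph, threshold_mph):
    """Indices of adjacent-link pairs whose delta speed falls below the threshold.

    threshold_mph is a hypothetical, negative tuning parameter (e.g., -20).
    """
    return [i for i, d in enumerate(delta_speeds(link_speeds_mph)) if d < threshold_mph]


def planning_time_index(travel_times_min, free_flow_min):
    """PTI = 95th-percentile travel time / free-flow travel time.

    The 95th percentile interpolates linearly between sorted observations
    (statistics.quantiles with method="inclusive"); needs at least two values.
    """
    p95 = quantiles(travel_times_min, n=100, method="inclusive")[94]
    return p95 / free_flow_min


# A ~32 mph drop between the second and third links triggers one alarm (pair index 1).
print(queue_alarms([62, 60, 28, 27], threshold_mph=-20))  # [1]

# Hypothetical daily travel times of 26..46 minutes on a trip with a 30-minute
# free-flow time: the 95th percentile is 45 minutes, so PTI = 1.5.
print(planning_time_index(list(range(26, 47)), free_flow_min=30))  # 1.5
```

Stating the sign convention explicitly matters: if delta speed were computed as upstream minus downstream, the same slowdown would appear as a large positive value and the threshold comparison would have to flip.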
The Kentucky Transportation Cabinet (KYTC) acquires historical probe-vehicle speed data to support its congestion management program and to evaluate roadway performance (Chen and Zhang 2017). The data have been used to develop travel-time-based performance measures and to validate the methodology used for systemwide network screening and project selection. They also supplement the National Performance Management Research Data Set (NPMRDS) by providing speeds on many roadways outside the National Highway System (NHS).

Another important application of speed data is the validation and calibration of travel-demand models (Cambridge Systematics, Inc. et al. 2012). Validation and calibration are integral parts of developing travel-demand models and help ensure that model outcomes are reasonable. Speed data can serve several purposes in this process. First, free-flow speed is an important model input; it is traditionally estimated using the Highway Capacity Manual and similar methods that rely on geometric attributes, but it can also be derived from probe-speed data. Second, volume-delay functions (e.g., Bureau of Public Roads functions and Akçelik functions) can be calibrated against observed speeds to achieve a better goodness of fit between speed and congestion level. For example, the Wisconsin DOT has calibrated speed-flow curves using INRIX speed data for cost-benefit analysis. As NCHRP Report 716: Travel Demand Forecasting: Parameters and Techniques points out, agencies have increasingly looked at link speeds and travel times, in addition to volume- and trip-length-based measures, for travel-demand forecasting validation (Cambridge Systematics, Inc. et al. 2012). Probe-speed data can provide such information directly.

Despite the immense amount of information conveyed by GPS data, these data also have limitations. For example, the sample size of vehicles from which probe-vehicle data are collected varies significantly across road types.
On less-traveled roads, especially in rural areas, probe-vehicle data can be sparse. Although congestion is less likely to be an issue on these roads, the utility of probe-vehicle data for systematically tracking their performance can be limited. Validation of probe-speed data is often required. Further issues and caveats of speed or travel-time data are discussed in detail in the Use Experience section of Chapter 3.

Origin-Destination Data
O-D data are critical inputs for many transportation applications, including the calibration and validation of travel-demand models, detour planning, and corridor studies. Traditionally, O-D data have been collected through household travel surveys, license plate matching, and roadside surveys. However, obtaining sufficiently large samples with these methods is very costly, especially over large geographic areas. Another drawback of these methods is their limited ability to capture long-distance travel. For example, license plate matching documents only trips that pass locations where license plate readers are stationed.

Cellular and GPS location data offer insights into the origins and destinations of probe vehicles, as well as the routes they take to complete their trips. Because of privacy concerns, data providers often need to anonymize data to remove any personally identifiable information. Consumer smartphones with location services enabled are another major source of O-D data. For example, when a passenger in a car searches for a restaurant at the next highway exit, the user's locations become part of the data generated by the LBS. This type of data has been used in business intelligence applications, such as helping billboard owners determine how much to charge for advertisements based on the number of people passing each billboard daily, or helping merchants better estimate the exposure of their advertisements.
LBS data have been integrated with probe-vehicle data for transportation applications. Data providers have established cloud-based platforms that let users query trips between user-specified origins and destinations. These data offer agencies a number of opportunities to gain insight into travelers' route choices. The O-D data are collected continuously from smartphones and navigation devices, whereas traditional surveys provide only a snapshot of travel behavior at the time of the survey. In densely populated urban areas, these data tend to be more abundant than in rural areas because of the higher market penetration of GPS-enabled devices. The data provide both O-D and travel-time matrices in a form that is more appropriate for validation and integration with travel-demand models (Kressner 2017). Such data can cover large geographic areas and can be flexibly aggregated to various spatial and temporal levels.

The Idaho DOT has developed trip matrices using cell phone-based O-D data for its statewide travel-demand model (Stabler 2014). The agency procured O-D data for average weekday resident home-based work and home-based other trips, as well as non-home-based trips for both residents and visitors. To reduce cost, the data were obtained as super zone matrices and then disaggregated to smaller model zones based on each model zone's share of population and employment in the super zone. The cell phone O-D data were used to synthesize travel demand and to estimate external travel. Reasonable goodness of fit was found between the trip-length distributions from the cell phone O-D data and the Boise MPO travel survey data. However, significant differences were found for non-home-based trips, especially short-distance trips, possibly because the MPO survey and the vendor data used different definitions.
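The super-zone disaggregation step in the Idaho example can be sketched as proportional allocation. The Python below is an illustrative assumption, not the Idaho DOT procedure: each model zone receives a share of its super zone equal to its weight divided by the super zone's total weight, using a single hypothetical weight per zone (for instance, population plus employment), and each super-zone O-D cell is split across zone pairs by multiplying the origin and destination shares. The actual study may have used trip-purpose-specific weights, such as population at home ends and employment at work ends.

```python
from collections import defaultdict


def disaggregate_od(super_od, zone_to_super, zone_weight):
    """Split super-zone O-D trips into model-zone O-D trips (illustrative sketch).

    super_od:      {(origin_super, dest_super): trips}
    zone_to_super: {model_zone: super_zone}
    zone_weight:   {model_zone: weight}; a single combined weight such as
                   population + employment is an assumption, not the study's rule.
    Every super zone must have a positive total weight.
    """
    # Total weight per super zone and the model zones it contains.
    totals = defaultdict(float)
    members = defaultdict(list)
    for zone, sup in zone_to_super.items():
        totals[sup] += zone_weight[zone]
        members[sup].append(zone)

    # Each model zone's share of its own super zone (shares sum to 1 per super zone).
    share = {zone: zone_weight[zone] / totals[sup] for zone, sup in zone_to_super.items()}

    # Allocate each super-zone cell by origin share x destination share,
    # which preserves the super-zone trip totals.
    zone_od = {}
    for (o_sup, d_sup), trips in super_od.items():
        for o in members[o_sup]:
            for d in members[d_sup]:
                zone_od[(o, d)] = trips * share[o] * share[d]
    return zone_od


# Hypothetical example: super zone A holds zones a1 (weight 300) and a2 (weight 100);
# super zone B holds the single zone b1.
od = disaggregate_od({("A", "B"): 1000},
                     {"a1": "A", "a2": "A", "b1": "B"},
                     {"a1": 300, "a2": 100, "b1": 50})
print(od)  # {('a1', 'b1'): 750.0, ('a2', 'b1'): 250.0}
```

Because the shares within each super zone sum to 1, the disaggregated cells always add back up to the purchased super-zone totals; only the spatial distribution within each super zone depends on the weighting assumption.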
The data were licensed only for the travel-demand model project, but derivative tables and reports can be used for other applications.

Recently, O-D and probe sample data have been used to estimate average annual daily traffic and average annual hourly volume. A study conducted by the Texas A&M Transportation Institute for the Minnesota DOT evaluated volume estimates from StreetLight and found that the errors of many estimates, especially on lower-volume roads, were too high to be acceptable, indicating the need for further research (Turner 2017).

Freight- and Truck-Specific Data
Many state DOTs and MPOs need freight movement data for truck travel models. Traditionally, these data have been gathered through commodity flow surveys. FHWA's Freight Analysis Framework (FAF) commodity flow database includes freight O-D flows by commodity type and estimates of flows on major routes and segments. A major issue with this data set is its very limited spatial resolution. For example, even though FAF data are available at the county level for internal use within the U.S. Department of Transportation (U.S. DOT), the released version is much coarser (Donnelly and Moeckel 2017). State DOTs and MPOs have also acquired the commodity flow database from Transearch. Survey respondents (see Chapter 3) noted that their primary purpose in acquiring freight commodity flow data is to develop or update statewide or regional travel-demand models. Agencies have been purchasing access to these databases for many years.

GPS data generated by commercial fleets are disaggregate, providing much finer-grained information on truck movements. One disadvantage of these data is that they lack details on cargo and truck behavior, which can be obtained through commercial travel surveys. A 2014 Florida DOT study used truck GPS data obtained from the American Transportation Research Institute (ATRI) for a number of applications (Pinjari et al. 2014).
Data were collected through a joint effort between ATRI and FHWA to measure freight performance on the nation's major highways. Evaluation of the GPS data indicated that they capture approximately 10% of the heavy truck volume observed on Florida's highways. Using these data, Florida DOT developed truck travel speeds by time of day for the state's Strategic Intermodal System highway network. Truck trips and their characteristics, such as trip duration, trip length, and speed, were also derived, and truck O-D tables were developed for a statewide travel model.

Crowdsourced Incident Data
With more consumers than ever owning smartphones, applications that crowdsource traffic incidents and delays are on the rise. One example is Waze, a smartphone app that allows users to post alerts when they observe or encounter incidents such as crashes, hazards, and traffic jams. Through its Connected Citizens Program, Waze partners with state and local transportation agencies to exchange data. Waze gives agencies access to the incident and traffic jam feeds generated by its users, and in return agencies provide Waze with information on construction, road closures, special events, evacuation routes, and other alerts they generate.

The most frequently cited use of Waze data among agencies is providing incident awareness to traffic operations centers. Some agencies (e.g., KYTC) post Waze-generated incident alerts on their traveler information websites as a separate layer marked as Waze alerts. Others use Waze alerts as a secondary source of information for traffic incident management. Waze alerts provide a data source for highway traffic monitoring, especially in rural areas where state DOTs are less likely to have good sensor coverage. A study comparing 1 year of Waze incidents in Iowa with those recorded by the state's advanced traffic management system (ATMS) found that Waze offered a 12% increase in coverage over the existing ATMS (Amin-Naseri et al. 2018). An ATMS relies on sensors or probe-speed data to detect incidents and usually detects an incident when a significant speed reduction is observed.
Relying on speed observations alone can delay incident detection. Additionally, on lightly trafficked roadways and during off-peak periods, incidents may not always result in speed reductions. In many cases, Waze data provide faster incident detection in areas without cameras.

However, Waze data raise additional considerations. Because users generate incident reports, the same incident may be reported multiple times by different users. Redundancies, inaccuracies, and mismatches in Waze reports may require an agency to dedicate a significant amount of time to processing and validating the data before using them. Although Waze coverage of rural roads is typically more extensive than that of state DOT sensors, traffic incidents are still likely to be underreported, depending on the time of day, traffic levels, and the number of drivers using the Waze app. Freeways and heavily traveled arterials tend to be well represented throughout the day.

Crowdsourced Non-Motorized Travel Data
Evolving technology also benefits data collection for non-motorized travel. Crowdsourced smartphone applications let runners and cyclists track their activities, and state DOTs and MPOs can potentially use these data to learn where such trips occur. One study used Strava cycling data to estimate bicycle trip volumes for Miami-Dade County (Hochmair et al. 2017). Data on user rides across time periods were aggregated and attached to street network segments. Commuter trips were identified using a commute status indicator, duration and distance constraints on cycling trips, or predetermined rules created by Strava. These data were used to derive bicycle ride counts at the segment level, as well as bicycle kilometers traveled at the census block group or higher level, as shown in Figure 5. The study observed that the data have high spatial and temporal resolution and good coverage. They were used to develop regression models of bike ridership for commuter and non-commuter trips on weekdays and weekends.

Figure 5. Bicycle trip counts and bicycle kilometers traveled in Miami-Dade County (Hochmair et al. 2017).

Watkins et al. (2016) documented a study that collected cyclists' data from crowdsourced smartphone apps in Atlanta, Georgia; developed an open-source procedural standard for data cleaning and map matching; examined how the sociodemographics of cyclists affect their route preferences; and developed a route choice model for planners. They found that cyclists' self-reported demographic characteristics are useful for classifying them as particular types of riders. Further, the findings can be used to identify the most impactful routes and links, which, if improved, would increase cycling rates.

Other Data
Socioeconomic data
Nine of the states responding to this study's survey reported that they have acquired proprietary data on population, employment, household information, economic conditions and forecasts, commodity costs, and revenue forecasts, among others. These data are used in demographic analyses; market studies; identification of employment centers and employment numbers; state highway user revenue forecasts; statewide, regional, and MPO travel models; assessments of the economic impacts of roadway projects; and evaluation of policies outlined in planning documents.
Digital map and aerial imagery
Six states have acquired street maps and imagery from private vendors. Agencies obtained these data because doing so is more cost-efficient than collecting and maintaining image collections in-house. These data have been used mainly for spatial analysis, routing vehicles during snow and ice removal operations, analyzing highway networks to create more efficient routes, adding geographic information system (GIS) layers to web and desktop applications, sharing information, and outreach.

Analytic Tools
Large data sets create challenges for state DOTs and MPOs because the data standards of these new data often differ significantly from those of their legacy systems. Real-time and historical speed data and crowdsourced data are often referred to as "big data." Big data require immense storage capacity, specialized computing resources, and technical expertise, which many agencies lack. When agencies issue RFPs to license these data, they often also require analytic tools from the vendor or a third party that provide the necessary computing power and a user-friendly interface with built-in functions that meet their business needs. Several agencies that responded to the survey (see Chapter 3) indicated that they specified the right to access the Regional Integrated Transportation Information System (RITIS) and Iteris' Performance Management System (iPeMS) tools in their data acquisitions. These tools and their applications are briefly described below.

RITIS
RITIS offers a wide range of tools to process, visualize, and share data, with the goal of enabling "a wide range of capabilities and insights, reducing the cost of planning activities and conducting research; and breaking down the barriers within and between agencies for information sharing, collaboration, and coordination" (RITIS 2018).
For example, the Maryland DOT has used RITIS's Massive Data Downloader, Bottleneck Ranking, and Major Corridor Summary Reports to meet the congestion monitoring needs of its annual State Highway Mobility Report. The South Carolina DOT used RITIS tools to conduct an after-action review of the massive congestion associated with the solar eclipse on August 21, 2017 (RITIS 2018).

iPeMS
Some agencies have been using this transportation network management tool to ingest and process third-party traffic data and to generate reports and visualizations (iPeMS 2017). The Oregon DOT recently used iPeMS to study congestion associated with the 2017 solar eclipse. Using the tool's real-time and historical traffic data visualization, the agency pinpointed areas where bottlenecks occurred and compared them with historical travel-time data. The tool gave the Oregon DOT the ability to warn residents of areas where heavy traffic is uncommon about impending events likely to produce significant congestion. These warnings help reduce the severity of traffic jams in real time. After the eclipse, the Oregon DOT analyzed its traffic effects; based on this analysis, the agency will be better prepared for the next solar eclipse in 2024.

Practices on Data Procurement
Federal, regional, and state entities have acquired proprietary data in recent years. This section reviews their procurement practices and documents strategies recommended by several studies. The literature review focuses on the following practices and studies:
• Two editions of the NPMRDS acquisition,
• I-95 Corridor Coalition Vehicle Probe Project,
• FHWA report, Private Sector Data for Performance Management,
• NCHRP Project 70-01: Private-Sector Provision of Congestion Data,
• California Partners for Advanced Transportation Technology (PATH) pilot procurement of third-party data, and
• FHWA report, Applying Archived Operations Data in Transportation Planning: A Primer.

National Performance Management Research Data Set
The Moving Ahead for Progress in the 21st Century (MAP-21) Act set performance management requirements for state DOTs and MPOs. To help them meet these requirements, FHWA obtained the NPMRDS, which contains probe GPS travel-time data. The first NPMRDS contract was awarded to HERE. The data were delivered in two parts: a TMC static file, and travel times for passenger vehicles, freight trucks, and combined passenger and freight vehicles georeferenced to TMCs. The RFP for this acquisition required that actual data be provided without historical data substitutions. The RFP also specified that the data cover the entire NHS, as defined by MAP-21; present travel times in seconds, in 5-minute increments, 24 hours a day, 7 days a week; be delivered monthly; and be available from 2011 onward (HERE 2013).

In 2017, the second NPMRDS contract was awarded to the teams at the University of Maryland and INRIX. The RFP for this acquisition outlined data requirements and defined restrictions on data use (FHWA 2016). It required that the probe-speed data be observed from probe vehicles and include average travel times and average speeds in 5-minute increments for all vehicles and for freight trucks. Imputed, predicted, or historical data could not be included. A path-based processing approach was allowed, but proposers had to explain in detail how the approach would work and when it would be used.
Specifications for geographic coverage were provided in a detailed list. FHWA stated that it would use absolute speed error to evaluate probe data accuracy, and supplied data had to meet the frequency and accuracy requirements.

FHWA clearly defined data use and sharing in the RFP. It stated that "FHWA, public transportation agencies, and officially designated representatives shall have the right to use the vehicle probe data provided under this contract for transportation planning and operational analyses, service and data quality validation analyses, and all other internal organization applications." FHWA also required a perpetual license to access the data set and that the same access be granted to "state DOTs, MPOs, other operating administrations at U.S. DOT, and Federal partners involved in transportation analyses," as well as contractors performing work on their behalf. It further stipulated that authorized users would retain the right to share aggregated results from the data set with the public. Vendors were asked to specify any additional restrictions needed to protect the commercial value of their data.

To ensure that data quality was maintained throughout the contract period, FHWA required submission of a data validation plan detailing the vendor's approach to implementing a viable data quality assurance methodology. The requirements were as follows:
• The plan shall be consistent with the data requirements set forth in the RFP, such as 5-minute frequency, data accuracy measured by absolute speed error and error bias, and temporal and spatial coverage.
• The contractor shall be responsible for developing and implementing a valid and reliable data validation methodology.
• The contractor shall perform data validation and assessment and provide quarterly reports summarizing the validation results and what actions should be taken, and how, to meet the performance requirements set by FHWA.
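The two accuracy measures named in these requirements, absolute speed error and error bias, are commonly computed as the mean absolute difference and the mean signed difference between probe speeds and reference (ground-truth) speeds. The sketch below assumes the two series have already been matched by segment and 5-minute interval. It does not reproduce the validation procedure of FHWA or any vendor, which may, for example, measure errors relative to an uncertainty band around the ground truth or report them separately by speed range; the function name is hypothetical.

```python
from statistics import fmean


def speed_error_measures(probe_mph, reference_mph):
    """Return (average absolute speed error, speed error bias) in mph.

    Inputs are assumed to be matched pairwise (same segment, same interval).
    A negative bias means the probe data tend to under-report speed.
    """
    if not probe_mph or len(probe_mph) != len(reference_mph):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [p - r for p, r in zip(probe_mph, reference_mph)]
    aase = fmean(abs(d) for d in diffs)  # size of the error, ignoring direction
    bias = fmean(diffs)                  # systematic direction of the error
    return aase, bias


# Hypothetical matched observations for four 5-minute intervals.
aase, bias = speed_error_measures([50, 62, 40, 58], [55, 60, 40, 61])
print(aase, bias)  # 2.5 -1.5
```

Reporting both measures matters because errors of +5 mph and -5 mph average to zero bias while still representing a 5 mph absolute error.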
Proposals were to be judged on three criteria: technical competency, evidence of past performance, and cost/price, with each factor weighted differently. Technical competency was more critical than evidence of past performance, and the combined weight of these two criteria approximately equaled that of price.

Numerous studies and analyses have been conducted for various applications since the NPMRDS became available. For example, FHWA has been applying NPMRDS data to produce the quarterly Urban Congestion Report, which profiles the most recent congestion and reliability trends at the national and city levels (FHWA 2017).

Practice at I-95 Corridor Coalition
The I-95 Corridor Coalition Vehicle Probe Project (VPP) began in 2006 with the goal of providing real-time traffic monitoring along the entire corridor rather than collecting data along discrete segments of it. The initial RFP did not specify what technology should be used but required that the selected technology meet the requirements of systems such as ATIS and ATMS. INRIX was awarded the VPP contract in late 2007, with the initial launch in 2008. In 2009, data validation was finalized to ensure the quality of the VPP data being used (Young 2007).

The RFP specified detailed data quality requirements, including accuracy, availability, latency, and granularity (University of Maryland College Park 2013). Bluetooth data were to be used as the ground truth for many of the data quality metrics. Specifically, accuracy was reported as the average absolute speed error and the speed error bias between the probe speed and the Bluetooth speed. Data availability was defined as the percentage of time the data service was up, excluding scheduled system maintenance, and was required to be at least 99%.
Latency was defined as the time difference between the onset of a slowdown according to the Bluetooth data and its onset according to the probe data. A slowdown was identified when traffic speed dropped by 20 mph within 10 minutes and the condition lasted at least 15 minutes. The average data latency was calculated by averaging the latencies of the individual slowdowns identified in the validation data set. A maximum data latency of 8 minutes or less on freeways was mandatory, and a maximum data latency of 5 minutes or less on freeways and 8 minutes or less on arterials was highly desirable. Finally, the required spatial granularity was 0.3 miles on urban roadways and 1 mile on rural freeways, with a required temporal granularity of at least 5 minutes.

The RFP stipulated that data ownership would remain with the contractor, with the VPP retaining a perpetual right to use the data for internal applications and to archive all the data. All data licensees were to sign a data use agreement. The RFP also specified that data licensees would work with the vendor to prevent unauthorized use of the data. Licensees would agree to prevent the alteration of restricted-use notices, to label data properly as proprietary information, and to store and transfer data using media that "provide reasonable protection against their unlawful copying and unauthorized access and use." Licensees would not be permitted to sell or transfer the data to any party without making the vendor aware of the transaction, which would give the vendor an opportunity to prevent disclosure of proprietary data (I-95 Corridor Coalition 2015).

To ensure the quality, timeliness, and consistency of travel-time and speed data, VPP implemented an approach similar to a service-level agreement, under which payment was tied to data quality and accuracy validation results.
If the provided data met all the minimum quality requirements specified by VPP, full payment under the fixed-price task order would be made to the contractor. However, if the data failed to meet the requirements, payment would be reduced according to the Coalition's policy. The Coalition anticipates using a similar method for arterials, but research is still ongoing.
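The slowdown, latency, and payment rules described above can be sketched as follows. This is a minimal Python illustration that assumes 1-minute speed samples; the way the 10-minute drop window is applied, the 30-minute window used to match ground-truth and probe slowdowns, and every function and field name are assumptions. It also reads the temporal granularity requirement as a reporting interval of 5 minutes or less. Accuracy thresholds are omitted because they are not stated here, and the size of any payment reduction is left out because it depends on the Coalition's policy.

```python
from dataclasses import dataclass
from typing import List


def slowdown_onsets(speeds: List[float], drop_mph: float = 20.0,
                    window_min: int = 10, persist_min: int = 15) -> List[int]:
    """Return the minutes at which slowdowns begin.

    speeds[m] is the average speed (mph) in minute m. A slowdown starts at
    minute m when speed is at least drop_mph below the highest speed of the
    preceding window_min minutes and stays at or below that level for
    persist_min minutes.
    """
    onsets = []
    m = window_min
    while m + persist_min <= len(speeds):
        threshold = max(speeds[m - window_min:m]) - drop_mph
        if all(s <= threshold for s in speeds[m:m + persist_min]):
            onsets.append(m)
            m += persist_min  # skip past the slowdown's minimum duration
        else:
            m += 1
    return onsets


def latencies(truth_onsets: List[int], probe_onsets: List[int],
              max_gap_min: int = 30) -> List[int]:
    """Probe onset minus ground-truth onset for each matched slowdown.

    Each ground-truth onset is matched to the nearest probe onset within
    max_gap_min minutes (an assumed rule); unmatched onsets are skipped.
    """
    result = []
    for t in truth_onsets:
        nearby = [p for p in probe_onsets if abs(p - t) <= max_gap_min]
        if nearby:
            result.append(min(nearby, key=lambda p: abs(p - t)) - t)
    return result


@dataclass
class ValidationSummary:
    availability_pct: float
    max_freeway_latency_min: float
    urban_segment_mi: float
    rural_freeway_segment_mi: float
    reporting_interval_min: float


def failed_requirements(s: ValidationSummary) -> List[str]:
    """List the mandatory requirements that a validation period missed."""
    failures = []
    if s.availability_pct < 99.0:
        failures.append("availability")
    if s.max_freeway_latency_min > 8.0:
        failures.append("freeway latency")
    if s.urban_segment_mi > 0.3:
        failures.append("urban spatial granularity")
    if s.rural_freeway_segment_mi > 1.0:
        failures.append("rural spatial granularity")
    if s.reporting_interval_min > 5.0:
        failures.append("temporal granularity")
    return failures
```

In this sketch, an empty list from `failed_requirements` corresponds to full payment under the task order; any other result would trigger whatever reduction the Coalition's policy specifies.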
20 Practices on Acquiring Proprietary Data for Transportation Applications

Extensive validation work has been performed by VPP. The 2009 validation summary report discussed the process for validating probe data (Haghani et al. 2009). First, the standard error of the mean was calculated for the ground truth data to create an uncertainty band around observations. Next, two measures, the average absolute speed error and the speed error bias, were developed to compare VPP data to the ground truth data. From 2009 to 2014, only INRIX data were used for monitoring. In 2014, efforts to validate the HERE and TomTom data sets began; at the same time, the validation of HERE data began in Pennsylvania. It was the first reported validation of this kind.

Past Studies Involving Data Procurement

Private Sector Data for Performance Management

Turner et al. (2011) conducted a synthesis study for FHWA that focused on the technical and institutional issues associated with using private-sector travel time and speed data for performance management. Important topics covered in the study that pertain to data procurement are summarized as follows:

• Essential data elements that should be included in private-sector data for performance measurement were identified, including date and time stamp, roadway link identifier, roadway link length, and travel time and speed. To ensure consistency, the authors recommended that a standard definition of time be used to eliminate confusion across time zones and during daylight saving time and that an attribute table be provided for link location references.
• Metadata providing supplementary information about primary data elements could be particularly useful in understanding how private-sector data are collected and processed. Example metadata elements included probe-vehicle sample size, travel time or speed standard deviation, confidence indicator, and gap-filling indicator.
• Data products and services offered by major private-sector vendors at the time of the study were compiled and presented. Identified vendors providing travel time and speed products included AirSage, ATRI, INRIX, HERE (previously NAVTEQ), TomTom, and TrafficCast.
• Although not directly intended for data procurement, the study outlined three data quality assurance methods that can provide helpful guidance on sample data validation during the procurement process. The first approach relies principally on statistical analysis of the data and metadata to develop an overall understanding of the data and to identify suspicious data points. This approach is the least costly option but often does not yield a definitive accuracy assessment. The second approach compares private-sector data to trusted public-sector data (e.g., fixed-location sensor data). However, differences in data collection mechanisms and spatial segmentation may introduce some uncertainty. The third and most costly approach is to install monitoring devices, such as Bluetooth readers, to collect high-accuracy benchmark data. The study suggested using this approach at locations most prone to uncertainty in data accuracy and where no other data source is available.
• Various data rights and legal issues, including but not limited to data licensing, pricing, open records requests, and privacy, were also discussed. Similar issues are discussed in the next section of this chapter, with frequent reference to this study.

NCHRP Project 70-01: Private-Sector Provision of Congestion Data

NCHRP Project 70-01 recommended that agencies seeking third-party travel-time data adopt a competitive demonstration approach to procurement (Smith et al. 2007). The goal of this strategy is to foster competition among data service providers to minimize risks to agencies as
they gradually shift to licensing data and services from a third party. This approach consists of the following four steps:

• Step 1. Issue RFP. The RFP should include a comprehensive, detailed set of requirements describing the data and services that the transportation agency intends to purchase.
• Step 2. Develop short list. Agencies should consider the following criteria when evaluating proposals: public-sector references; proposed cost structure; demonstrated ability to meet requirements; and demonstrated ability to provide a long-term, stable service. Vendors meeting an agency's criteria should be placed on a short list for further consideration.
• Step 3. Request competitive demonstration. Procuring agencies should ask short-listed vendors to provide sample traffic data as specified in the RFP and to demonstrate their ability to meet the spatial, temporal, and quality requirements.
• Step 4. Negotiate agreement. Agencies should reference other examples to determine whether costs proposed by private vendors are fair and reasonable. Negotiations should endeavor to obtain a fair price for the data service.

Pilot Procurement of Third-Party Traffic Data

This study, conducted by California PATH, sought to acquire disaggregated GPS probe-vehicle data (Bayen et al. 2013). The final report summarized past practice for procuring third-party data and includes the documents used for PATH's data acquisition. The study recommended that, before circulating an RFP, an agency first issue an RFI to solicit information from the industry. The responses can then be used to refine the data request and develop targeted data specifications (e.g., spatial, temporal, quality, volume, and method of delivery). RFIs give agencies the opportunity to ask questions about the methods vendors use to collect and process data, as well as their strategies for removing outliers from the data.
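Both the competitive demonstration in Step 3 and RFI questions about outlier removal come down to checking a vendor's sample data. The following minimal Python sketch combines the essential data elements and metadata recommended by Turner et al. (2011) with the first, statistical quality assurance approach: it flags records whose timestamps are not in UTC, whose link is missing from the attribute table, or whose speed and travel time are implausible or mutually inconsistent. Using UTC as the standard time definition, the `ProbeRecord` and `screen` names, the 100 mph ceiling, and the 5% tolerance are all hypothetical choices, not values from the cited studies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class ProbeRecord:
    """One vendor record: essential elements first, then optional metadata."""
    timestamp: datetime                      # assumed convention: aware UTC
    link_id: str                             # roadway link identifier
    link_length_mi: float
    travel_time_s: float
    speed_mph: float
    sample_size: Optional[int] = None        # probe-vehicle sample size
    speed_std_mph: Optional[float] = None    # speed standard deviation
    confidence: Optional[float] = None       # vendor confidence indicator
    gap_filled: bool = False                 # gap-filling indicator


def screen(record: ProbeRecord, link_table: Dict[str, float],
           max_speed_mph: float = 100.0, tolerance: float = 0.05) -> List[str]:
    """Return reasons a record looks suspicious (empty list if none).

    link_table maps link_id -> link length in miles (the attribute table).
    max_speed_mph and tolerance are illustrative thresholds only.
    """
    issues = []
    # Naive datetimes return None from utcoffset(), so they are flagged too.
    if record.timestamp.utcoffset() != timedelta(0):
        issues.append("timestamp not in UTC")
    if record.link_id not in link_table:
        issues.append("unknown link")
    elif abs(link_table[record.link_id] - record.link_length_mi) > tolerance * link_table[record.link_id]:
        issues.append("link length disagrees with attribute table")
    if not 0 < record.speed_mph <= max_speed_mph:
        issues.append("implausible speed")
    else:
        # Travel time implied by the reported length and speed, in seconds.
        expected_s = record.link_length_mi / record.speed_mph * 3600
        if abs(record.travel_time_s - expected_s) > tolerance * expected_s:
            issues.append("travel time inconsistent with speed and length")
    if record.gap_filled:
        issues.append("gap-filled value")
    return issues


links = {"L1": 0.5}
ok = ProbeRecord(datetime(2013, 5, 1, 12, 0, tzinfo=timezone.utc), "L1", 0.5, 30.0, 60.0, sample_size=8)
print(screen(ok, links))  # []
```

Returning a list of reasons rather than a single pass/fail flag lets an agency tally which kinds of problems dominate a vendor's sample, which is the "overall understanding of data" the statistical approach aims for.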
Since the PATH research team intended to obtain disaggregated speed data, it was specifically concerned with the possibility that an individual vehicle's path could be identified from the data and with the measures vendors take to protect such information. An RFP should be carefully designed to strike a balance between scientific rigor and simplicity. It should be specific enough for procuring agencies to get the data they need but not so complex that it discourages vendors from submitting proposals. When the PATH team issued its RFP, there was no pricing model available for disaggregated probe data. As a result, the team defined its own pricing model in the RFP based on the cost per highway segment for the duration of the contract. The criteria used for vendor selection included (1) management (previous experience and ability to meet the requirements); (2) data information (quality, coverage, and amount); and (3) technical aspects, such as data collection experience, data procurement and validation expertise, and knowledge of traffic information systems.

The goal of contract negotiation is to align the legal, regulatory, business, and technical requirements of the vendor and the public agency. The agreement PATH struck gave it perpetual use rights for the data and the ability to combine the data with data from other sources. PATH agreed not to reveal the vendor's data to its competitors. Based on the PATH team's experience, the report recommended that technical users of the data meet with procurement staff early in the process so that they have sufficient time to become familiar with the ongoing data services. Contracts should clearly articulate an exit strategy, explicitly stating the procedure for terminating the agreement if the data or services received do not match what was requested in the RFP.
Applying Archived Operations Data in Transportation Planning: A Primer

While this study, conducted for FHWA, principally sought to outline guidance for transportation agencies on the use of operations data for planning applications, the authors provided
data to anyone and for any purpose" (Turner et al. 2011). These rights also extend to technical data (e.g., computer databases or software documentation). Conversely, the federal government holds only limited rights to data that embody trade secrets or that are commercial or financial and confidential or privileged, to the extent that the data were developed with private funding (Turner et al. 2011). When the federal government possesses limited rights, it cannot legally release the data to parties outside the government unless it reaches an agreement with the vendor. Public agencies, regardless of level of government, typically neither own nor can freely distribute data unless they pay the full cost of data collection. Accordingly, the report recommended establishing clear terms in procurement documents regarding the circumstances and conditions under which data can be released to and used by third parties.

Open Records Laws

Federal, state, and local public agencies seeking to license data from a private vendor must also grapple with questions about their data usage rights and about whether the data they acquire would be subject to open records laws or Freedom of Information Act requests. Many private vendors prefer to restrict access to the data they have collected, processed, and organized. However, depending on the jurisdiction, laws may dictate that proprietary data, even if their collection was not entirely paid for with government funds, can be inspected and used by the public. The study team reviewed the implications of open records laws for agencies seeking to license data from private vendors, focusing on a 2011 FHWA report (Turner et al. 2011) and on the open records laws in Arizona, Georgia, Kentucky, Ohio, and Wisconsin, the states discussed in Chapter 4. The following summary of state and federal laws is not exhaustive.
While the information presented here is broadly representative of how proprietary data are treated under federal, state, and local laws, agencies preparing to license data should thoroughly and independently assess the ramifications of state and local statutes and, if necessary, seek legal counsel to determine whether data would be subject to release under open records laws.

Turner et al. (2011) observed considerable variation in how federal and state governments address open records laws and Freedom of Information Act requests. The report noted that some agencies have licensed proprietary data sets and that proprietary data may be protected by law from public disclosure. To avoid wrestling directly with open records laws, some agencies have used outside contractors to aggregate data and produce summary reports for public distribution. The justification for this arrangement is that, because the agency never possesses or controls the proprietary data, it may be shielded from public requests. Some states are changing their laws so that this justification may no longer apply. Florida has recently enacted a law that makes all data processed and handled by subcontractors working for state agencies subject to open records requests.

To present a fuller picture of how various states deal with open records laws when licensing proprietary data, the study team compiled information on public records laws in the states that are the focus of case examples in Chapter 4 (Arnold 2010). Most states have laws that grant the public the right to inspect public records. However, most have also affirmed, through explicit statutes or judicial opinion, that data classified as proprietary or as a trade secret can be exempted from public records laws.

In Arizona, public records are open to inspection by any person (Arizona Revised Statutes § 39-121).
If the custodian of records determines that records should be withheld for any reason, the agency provides the requester with an index of the records or categories of records that have been withheld, along with the reasons for withholding them [Arizona Revised Statutes § 39-121.01(D)(1)]. A person who is denied access to records may appeal the decision through
a special action in superior court. While the public can inspect bids and proposals after a contract has been awarded, a vendor has the option of designating material in its bid documents as trade secrets or as proprietary. If the vendor makes this designation, the State of Arizona preserves the confidentiality of the bid documents and related material. Arizona's public records laws do not explicitly exempt trade secrets and proprietary information from public inspection, but state courts have granted exceptions, for example, when records are confidential or their release would be counter to the state's best interest. Trade secrets have also been protected by the confidentiality exception to disclosure. A 2009 opinion by the Arizona Supreme Court (Lake v. City of Phoenix) held that "if a public entity maintains a public record in an electronic format, then the electronic version, including any embedded metadata, is subject to disclosure under [Arizona's] public records laws."

Georgia's Open Records Act (Official Code of Georgia Annotated § 50-18-70), which covers all state agencies and political subdivisions, holds that "there is a strong presumption that public records should be made available for public inspection without delay." Public records include documents, papers, letters, maps, books, tapes, photographs, computer-based or -generated information, and similar material that is prepared and maintained or received in the operation of a public office or agency [Official Code of Georgia Annotated § 50-18-70(1)]. Bids and proposals likely fall under the Open Records Act, although trade secret protections may apply. There are several circumstances under which the Open Records Act does not apply. Most pertinent to the release of proprietary data, the law exempts any trade secrets the state obtains from a person or business.
If a vendor wishes to keep data protected under the trade secrets clause, the vendor must submit and attach to the records an affidavit affirming that the records constitute trade secrets. If an agency determines that the information does not qualify as a trade secret, however, it will notify the party that submitted the affidavit of its intent to disclose the information. In turn, that party may petition a superior court to block the disclosure.

Kentucky's Open Records Act (Kentucky Revised Statutes § 61.870–61.884) states that all public records are open to public inspection unless otherwise specified in the statute. Bid and proposal information may be viewed after bids have been opened, although Kentucky law does not take a position on whether bids and proposals under other procurement methods constitute public records. Kentucky Revised Statutes § 61.878 specifies that records excluded from the application of Kentucky Revised Statutes § 61.870–61.884, including records classified as confidential or proprietary, can be inspected if a court of competent jurisdiction grants access.

Ohio's public records can be inspected by any person (Ohio Revised Code § 149.43). State law does not explicitly address whether competitive bids or proposals are subject to open records laws. However, competitive bids are opened in public. Documents may be exempted from this requirement if they include trade secrets.
Likewise, Ohio's Uniform Trade Secrets Act (Ohio Revised Code § 1333.61–§ 1333.69) prohibits the unauthorized disclosure of trade secrets, which encompass "information, including the whole or any portion or phase of any scientific or technical information, design, process, procedure, formula, pattern, compilation, program, device, method, technique, or improvement, or any business information or plans, financial information, or listing of names, addresses, or telephone numbers" that both derives independent economic value from not being publicly available and is the subject of efforts that are reasonable under the circumstances to maintain its secrecy.

Wisconsin's Public Records Law (Wisconsin Statutes § 19.31–19.39) establishes complete public access to government records and documents as the standard. The state defines records as any material on which information is recorded or preserved, regardless of physical form or characteristics, that has been created or is preserved by an authority. State agencies deny access to records only in rare, exceptional circumstances. Bids and proposals likely qualify as records, although restrictions are placed on the release of trade secrets. The Wisconsin Supreme
Court has observed that, along with explicit provisions that limit the release of public records, state courts have recognized other limitations on disclosure, "including the requirement that the harm to the public should be balanced against the benefit of disclosure to the public" (State v. Beaver Dam Area Development Corporation). Furthermore, Wisconsin Statutes § 19.36(5) states that authorities may withhold a record, or any portion of it, that contains material qualifying as a trade secret under the Uniform Trade Secrets Act. The Wisconsin DOT worked through the legal implications of declaring proprietary vendor data a trade secret when it licensed freight performance measurement data (see Chapter 4).

Privacy Concerns

When licensing proprietary data, privacy is a final matter to consider. If the traffic data that a public agency intends to purchase, license, use, or disclose have been anonymized, then the proposed transaction is unlikely to trigger legally recognized privacy rights under current law (Turner et al. 2011). Conversely, if an agency plans to license or purchase personally identifiable information, then constitutional, statutory, and common law privacy rights designed to guard against the intrusion created by the unwarranted and unauthorized distribution of personal information are implicated.