Practices on Acquiring Proprietary Data for Transportation Applications

CHAPTER 3
Practices on Proprietary Data Acquisition

This chapter discusses the practices transportation agencies use to acquire proprietary data. The findings are drawn primarily from survey responses submitted by representatives from state DOTs and MPOs. Findings from follow-up interviews and literature reviews are integrated to facilitate the discussion. The topics covered in this chapter include:

• Acquisition decisions pertaining to motivations, obstacles, budget, and data business plans (DBPs);
• Procurement process involving the development and issuance of RFPs;
• Use agreement pertaining to data-use restrictions, sharing policy, and legal concerns;
• Use experience with regard to a wide variety of data types discussed in Chapter 2; and
• Peer advice with regard to data acquisition and use.

Acquisition Decision

This section focuses on agencies' motivations and concerns with regard to proprietary data acquisition, as well as the roles DBPs and budgets play in acquiring data.

Driving Factors

As the previous chapter highlighted, the availability of proprietary data affords agencies unprecedented opportunities to meet their data and business needs. Thirty-two agencies responded when asked about the main driving factors behind the decision to acquire proprietary data. The responses, broken down by factor, are summarized in Figure 6.

The top two driving factors were unmet data needs and new insights provided by the data. Each of these justifications had 17 responses (53.1%). For example, transportation agencies generally need to monitor transportation system performance but may lack the resources to collect traffic data on roadways other than major corridors, areas in which proprietary data prove very valuable. Similarly, agencies have had difficulty collecting truck data; this problem has been mitigated (and, in some cases, resolved) through the availability of proprietary trucking data.

One example of the new insights agencies have generated through proprietary data is the use of crowdsourced cycling data to better understand bicycle use on the system and inform infrastructure improvement decisions. Meeting new program needs, achieving cost-effectiveness, and complying with new legal requirements were also common motivations for licensing proprietary data. The MAP-21 requirements appear to be a primary driver for some agencies to look to proprietary data for performance monitoring and reporting. Other factors, such as the rich information provided by the proprietary data, knowledge of data suppliers, active engagement from vendors, and positive experience from other agencies, can also play an important role in spurring some agencies' data acquisition decisions.
Figure 6. Driving factors behind agencies' decision to acquire proprietary data. Survey question: "What are the main driving factors behind the decision to acquire the proprietary data?" Number of responses: unmet needs for data, 17; new insight offered by data, 17; meets new program needs, 6; meets new legal requirements, 3; cost effectiveness of proprietary data, 2; detailed information provided by data, 2; active engagement from vendor, 1; knowledge of suppliers and analysts, 1; positive experience of other agencies, 1.

Main Concerns

Twenty-nine responses were received when agencies were asked which aspects of proprietary data they find most concerning. Figure 7 summarizes the responses by concern.

Figure 7. Concerns agencies have with regard to proprietary data. Survey question: "What concerns does your agency have about acquiring proprietary data?" Number of responses: data quality concerns, 23; limited funding, 22; staff expertise, 18; finding the right product to meet the needs, 17; legal and privacy concerns, 14; requirement on IT infrastructure, 9; conditions of contracts, 2; staff changes, 1; staff time, 1; longevity of firms offering products, 1; ability to share data outside agency, 1.

Data quality was the most frequently cited concern, with 23 responses (79.3%). This is understandable considering that agencies have only just begun to use proprietary data, and many of them are still working to verify and validate data. Another contributing factor is the lack of transparency in proprietary data with regard to the data sources and processing methodology used by vendors (Lemp 2017A–2017C). Also, because proprietary data are often crowdsourced, probe penetration rate plays a critical role in data quality. Some agencies expressed concerns about low sample sizes on roadways in lower functional classes.

Funding the acquisition of proprietary data imposes a financial burden on agencies, and 22 respondents (75.9%) cited this as a main concern. Data licensing is a relatively new practice for many agencies, and it is critical to justify data purchases by demonstrating the data's value and to gain internal support. Agencies have found data licensing to be particularly onerous because of funding restrictions. For instance, large purchases must go through legislative budget requests in Florida.

At least 18 states (62.1%) mentioned that staff expertise was a main concern. Because proprietary data use different data definitions and are often massive in size, data integration and analysis may be challenging. For example, one respondent performs GIS integration with vendor-provided street maps each year. When local staff change, however, familiarizing everyone with the integration process and determining who should be responsible for handling new acquisitions and archiving data can be cumbersome.

Other sources of concern included the challenge of selecting data products appropriately matched to agency needs (17 respondents, or 58.6%), legal and privacy issues (14 respondents, or 48.3%), and the added IT infrastructure required to handle the immense data storage and analysis needs (9 respondents, or 31.0%). Less frequently noted concerns included the conditions imposed by contracts, the longevity of vendors, and the ability to share data with parties outside the agency.

Budget

Twenty-nine states responded when asked whether they have an annual budget for data acquisition. Of those, eight states (27.6%) responded "Yes." Two respondents specifically indicated that their funding comes from the ITS or Operations budget.
Nineteen states (65.5%) reported that they do not regularly allocate a specific portion of their budgets to the acquisition of proprietary data. The reasons vary among agencies. One respondent noted that their state places strict restrictions on data purchases; often, purchases must be approved through the legislative budget process. Another explanation is that many agencies make only a few purchases, so they see little purpose in allocating a set amount for data. Several other respondents commented that data are acquired as needed and covered by specific project funding. One respondent indicated that their DOT pools resources across units and divisions for data purchases and has established data-sharing agreements so that all of them can access the data. Two states are currently developing annual budgets to maintain data purchases or subscriptions.

Data Business Plan

A DBP is an institutional plan that links agency business needs, programs, and processes to data products, services, and management systems. It helps an agency to identify its current and future data needs and to prioritize data investments accordingly. DBPs play an important role in coordinating available resources; facilitating data collection, processing, analysis, and sharing; and transitioning to more data-driven, transparent decision-making models.

The study team asked agencies whether they have a DBP in place and received 38 responses. As shown in Figure 8, six responding agencies (15.8%) have DBPs: Alabama, California, Florida, Georgia, Minnesota, and Oregon. Fourteen agencies (36.8%) indicated that they are in the process of developing DBPs, while 18 agencies (47.4%) do not have DBPs or have no plans in place to prepare them.
The Oregon and Minnesota DOTs provided the study team with their respective DBPs. Both plans underscored proprietary data as an alternative to collecting and managing data in-house. The Oregon DBP acknowledges that "advances in technology and emerging private-sector products provide new opportunities" (Oregon Department of Transportation 2016). Minnesota's DBP suggests establishing "processes to regularly assess what data are needed, what data can be eliminated, what data can be provided internally and what data can be obtained from other public and private sector sources" (Minnesota Department of Transportation 2011). Both agencies have begun working with proprietary data to gain new insights into travel patterns and conditions. Evaluations are still needed to identify the best uses of proprietary data.

The DBPs contained no specific guidelines or instructions on the process of acquiring proprietary data. Nonetheless, the fact that these plans identified proprietary data as an alternative way to fill data gaps or meet business needs represents recognition and acceptance of proprietary data at the agency level.

Procurement Method

The procurement process plays an essential role in successfully acquiring the right proprietary data to meet an agency's business needs. This section provides details on procurement methods adopted by transportation agencies for acquiring proprietary data.

Issuing RFPs

The survey asked respondents to comment on which parties are involved with data acquisition. Responses with regard to 53 data sets were received. Thirty-seven data acquisitions (70%) were handled directly by state DOTs. Ten data acquisitions (19%) were completed through consultants (including universities). Both the state DOT and a consultant were involved in acquisition of the remaining data sets.
The responses also indicate that at least 28 RFPs were issued by agencies, while at least seven RFPs were issued by consultants. Some of the data sets were acquired without issuing an RFP.

Figure 8. Agencies with DBPs. Survey question: "Does your agency have a data business plan?" Yes, 16%; in development, 37%; no, 47%.
Table 2. (Continued).

Columns: Requirement; Utah; Missouri; Ohio; Michigan; Wisconsin; Kentucky; Arizona.

Coverage
• Statewide
• Select roadways
• Statewide
• Freeways
• Roadways and routes surrounding the I-39 construction area
• Statewide
• Statewide

Data Manipulation
• Not be a blend of live and historical data nor historical data only
• No estimates or projections based on historical data; vendor may integrate DOT sensor data to supplement existing data feeds
• Allowed
• Allowed, but reflected by the status flag
• Must not be modeled to fill the gaps when and where no probe data exist
• Predictive traffic data along with the historical data and assurance of the prediction quality

Data Quality
• Accuracy: At least 90% accurate or have a maximum error rate of 10%; Availability: At least 90% available between 6 a.m. and 10 p.m.; Quality: fitness of data for all purposes that require such data
• Accuracy: Required for vehicle flows exceeding 500 vph (c); Availability: At least 99%; Reliability: At least 95% of all segments at all required time-reporting intervals
• Accuracy: Plus or minus 4 mph or more accurate; Availability: Must be available for a minimum of 99% of the time for a billing period. In a given day, at least 97% of data must be provided.
• Accuracy: Absolute average speed error and average speed error for each speed bucket; Availability: At least 99.5%; Completeness: At least 95% of all segments at all required time-reporting intervals
• Availability: Based on specified resolution (may separate by day and night, car and truck); System coverage: Based on roadway functional classification

Data Archive
• Shall be allowed to archive real-time data by DOT
• DOT has the right to indefinitely archive data; vendor provides a web-based archiving service
• DOT reserves the right to archive speed data
• All data provided shall be available for archiving to be used in future DOT uses
• DOT retains the right to archive and use the data perpetually for analysis and research purposes
• Jurisdictions should have the ability to archive all real-time traffic data for future use

Integration
• Not required, but desirable for vendors to provide information with regard to their past data integration services, including timeliness and level of effort for these services
• The offeror shall describe the typical manner in which data may be integrated into ATMS and DOT-sanctioned websites
• The contractor is required to integrate real-time data from DOT MVDS (d) in existing formats or schemas into the contractor's real-time data feed
• The vendor shall provide services to assist with integrating real-time traffic data into the current Statewide Traffic Operations Center TransSuite ATMS for DOT and its members
• The offeror shall describe data standards and how data can be integrated with Arizona DOT; Maricopa County DOT; and local applications, such as travel-time dissemination

Notes:
(a) XML = Extensible Markup Language.
(b) CSV = comma-separated values.
(c) vph = vehicles per hour.
(d) MVDS = microwave vehicle-detection system.
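Data-quality requirements like those in Table 2 can be turned into a simple acceptance check. The following is a minimal sketch only: the record layout, the function names, and the choice to read "plus or minus 4 mph" as a mean absolute error against an agency reference speed are all assumptions made for illustration, not procedures taken from any of the RFPs.

```python
from dataclasses import dataclass


@dataclass
class SpeedRecord:
    """One expected reporting interval on one segment (hypothetical layout)."""
    reported: bool               # True if the vendor delivered a value
    vendor_mph: float = 0.0      # vendor-reported speed
    reference_mph: float = 0.0   # agency reference speed (e.g., from a sensor)


def availability(records):
    """Share of expected intervals for which data were delivered."""
    if not records:
        return 0.0
    return sum(r.reported for r in records) / len(records)


def mean_abs_error(records):
    """Mean absolute speed error over delivered intervals, or None if none."""
    delivered = [r for r in records if r.reported]
    if not delivered:
        return None
    total = sum(abs(r.vendor_mph - r.reference_mph) for r in delivered)
    return total / len(delivered)


def meets_requirements(records, min_availability=0.99, max_error_mph=4.0):
    """Check assumed thresholds: 99% availability and a 4 mph error bound."""
    error = mean_abs_error(records)
    if error is None:
        return False
    return availability(records) >= min_availability and error <= max_error_mph
```

An agency that applies different thresholds (for example, 99.5% availability) would pass them through the `min_availability` and `max_error_mph` parameters rather than changing the functions.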
as well as data delivery format. The data were to be acquired for the whole state and needed to be in 1-minute intervals with 1- and 3-minute update frequencies during the 5 a.m.–9 p.m. and 9 p.m.–5 a.m. periods, respectively. A copy of the RFP can be found in Appendix D2. The Arizona DOT took a somewhat different approach: its RFP enumerated all the data services the agency was interested in and asked vendors to indicate whether and how they planned to offer those services. A complete copy of the Arizona RFP can be found in Appendix D4.

Third-party web-based applications for data analytics and storage have received increasing interest among transportation agencies. Five of the seven states sought such applications in their RFPs. The two remaining states leveraged resources at universities.

Agencies had different requirements on whether and how gaps in data sets are to be filled. Three agencies specifically asked that imputed, modeled, or historical data not be included in the provided data. Three other agencies allowed data imputation but held that it should be done only through careful interpretation or verification.

Specifying requirements on data integration is another important aspect of RFPs for proprietary data acquisition. Private-sector firms often use their own data and network definitions, which are very different from those used by transportation agencies. Agencies therefore need to integrate proprietary data with an agency-maintained data system. Yet many agencies find it difficult to perform data integration, which usually requires intensive effort and staff resources. To deal with this issue, two states explicitly required vendor assistance with data integration, while three other states asked vendors to provide instructions on data integration and a narrative of the time and resource commitments needed to accomplish it.
Data validation and service evaluation. The most common concern with regard to proprietary data among state DOTs is data quality. As such, RFPs often mandate data validation as part of the contract. The Utah DOT asked vendors to provide documentation on the processes and tools used to validate data. The Michigan DOT's RFP specified that the agency retained the right to select which data would be used for validation and that the vendor must cooperate with data validation, whether it was performed by the DOT or by an independent contractor working on behalf of the agency. Furthermore, similar to the service-level agreement concept used in the I-95 Corridor Coalition RFP, the Missouri and Ohio DOTs specified requirements for routine data inspections and evaluations. If the data inspections and evaluations produced unsatisfactory results, the agencies reserved the right to reduce their payments or even terminate the contracts.

Agencies also required vendor presentations and on-site demonstrations to clarify proposals, evaluate whether the vendor has the requisite technical capabilities to deliver the proposed data services, and determine whether the proposed data services or solutions meet agency business needs.

Cost proposal. There is no universal pricing model for proprietary data acquisition. Models vary across vendors, and agencies also have different preferences. Private Sector Data for Performance Management identified four pricing models, based on mileage covered, population, number of users, or a percentage of analysis cost (Turner et al. 2011). Among the RFPs reviewed by the study team, the Arizona DOT required vendors to submit a detailed cost model for statewide, regional, and corridor data, while KYTC asked vendors to provide a fixed price but also encouraged them to submit alternative financial proposals.
Given that the cost proposals of different vendors can be structured in dissimilar ways, some agencies prefer to define specific requirements for cost proposals up front in RFPs so that they have a common basis for comparing proposals. The Utah DOT requested a cost proposal with separate costs for each item, including real-time data, historical data, analytics tool, and professional service. Real-time data were further partitioned into two groups: (1) interstates, other freeways and expressways, and other principal arterials, and (2) minor arterials and minor collectors.

The Missouri DOT required cost breakdowns by start-up cost and recurring subscription, which is similar to the practice used by the I-95 Corridor Coalition. Vendors received a pricing table (Table 3) to structure their cost proposals. The pricing per centerline mile model (Table 4) was also used in case the agency decided to purchase only portions of a data set given the budget available.

To avoid repeating the acquisition process every year or every few years, state agencies often reserve the option to extend negotiated contracts into future years. To ensure transparency and eliminate any disagreement on pricing, agencies require submission of cost proposals for future years. For example, in addition to an itemized budget for the first year, the Missouri DOT requested that vendors submit estimated costs for each subsequent year, out to the fifth year. The agency instructed vendors to specify the maximum cost increase or decrease (as a percentage) for the renewal periods based on the original contract period prices. The Wisconsin DOT provided vendors a table (Table 5) so they could include price estimates for multiple items for the five optional annual renewals.

Table 3. Missouri DOT template by start-up and recurring cost.
Table 4. Missouri DOT template by cost per mile.

Pricing per Centerline Mile

Total Centerline Miles Subscribe To
Miles                    0–100                       100–200                     200–300                     300+
Start-Up (one-time)      $_____ per mile             $_____ per mile             $_____ per mile             $_____ per mile
Recurring Subscription   $_____ per mile per month   $_____ per mile per month   $_____ per mile per month   $_____ per mile per month
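To illustrate how a completed per-mile template might be used to compare partial purchases, the sketch below computes a contract cost from tiered rates. The dollar rates are invented placeholders (the template leaves them blank for vendors), and two interpretive choices are assumptions rather than anything stated in the RFP: the whole subscription is priced at the rate of the tier its total mileage falls into, and a boundary value such as exactly 100 miles belongs to the lower tier.

```python
# Hypothetical rates; the RFP template leaves these blank for vendors to fill in.
# Each entry: (upper bound of tier in centerline miles,
#              start-up $ per mile, recurring $ per mile per month)
TIERS = [
    (100, 50.0, 10.0),
    (200, 45.0, 9.0),
    (300, 40.0, 8.0),
    (float("inf"), 35.0, 7.0),
]


def rate_for(miles, tiers=TIERS):
    """Return (start-up rate, recurring rate) for the tier containing `miles`."""
    for upper, startup_rate, recurring_rate in tiers:
        if miles <= upper:
            return startup_rate, recurring_rate
    raise ValueError("no tier covers the requested mileage")


def contract_cost(miles, months, tiers=TIERS):
    """One-time start-up cost plus the recurring subscription over `months`."""
    if miles <= 0 or months < 0:
        raise ValueError("miles must be positive and months non-negative")
    startup_rate, recurring_rate = rate_for(miles, tiers)
    return miles * startup_rate + miles * recurring_rate * months


if __name__ == "__main__":
    # Compare a 150-mile and a 250-mile subscription over one year.
    for miles in (150, 250):
        print(miles, contract_cost(miles, months=12))
```

If a vendor instead prices each block of miles at its own tier rate (marginal pricing), `contract_cost` would need to sum over the tiers; the template alone does not say which reading applies.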
Table 5. Wisconsin DOT cost proposal including future renewals.
(a) STOC = Statewide Traffic Operations Center.

Vendor and product evaluation. Twenty-seven states responded to the question of whether the agency has formal guidelines to evaluate data products and vendors. At least 19 states (70.4%) have formal guidelines (Figure 9). One agency expressed interest in developing guidance for evaluating proprietary data contracts and data-sharing agreements and for determining whether the time and effort required to pursue a pilot with new firms can be justified.

Figure 9. Product and vendor evaluation guidelines. Survey question: "Does your agency have formal guidelines to evaluate vendors and products?" Yes, 70%; no, 30%.

The evaluation criteria used by agencies have many commonalities. Typical criteria include data quality, vendor experience and qualifications, use restrictions, and cost. Table 6 summarizes information from RFPs on the criteria that seven agencies used for vendor and product evaluation. The percentage or points contained in parentheses for the first five states indicate
License Agreement

This section summarizes the survey responses on handling licensing and legal issues, including use and sharing restrictions and open records and privacy concerns.

Use Restriction

Most respondents stated that their agencies have no restrictions imposed on data use for the applications specified in agreements. In a handful of states, the agreements restrict uses to particular applications and projects. For instance, one response indicated that the agency's speed data were solely for the purpose of highway performance management. Another response pointed out that truck data could be used only for monitoring and assessing truck travel patterns and for truck trip modeling.

Similarly, most respondents could not cite any applications for which their agencies wanted to use proprietary data but were prohibited from doing so by licensing restrictions. This issue goes hand in hand with the use restrictions specified in the agreement; many states agreed to terms that impose no restrictions on their use. The only exception was Florida, where the agency is not allowed to use acquired digital data for map visualizations, network development, or federal submissions. All agencies but one had obtained perpetual data licenses; the remaining state indicated that its licenses are not perpetual.

Most states did not face any restriction on the number of users who can access data. Restrictions are, however, often placed on access to analytics tools or cloud applications. Even so, the number of authorized users in these scenarios is sufficiently high that the restrictions have not constituted a burden.

Data-Sharing Policy

Sharing agreements for raw data are usually more restrictive than those for derivative works. The amount of raw data that can be shared, and with whom, varies by agreement.
In most cases, raw data can be shared with public agencies, contractors, and universities, as long as they are part of the agreement or sign a user agreement indicating that they will abide by the contract. However, raw data typically cannot be shared with the general public. Among the responding agencies, the Atlanta Regional Commission is the only one that has signed a contract that allows it to share raw data with the public (see Chapter 4).

Some agreements mandated that raw data cannot be shared with groups or people not affiliated with the licensing agency; a few agreements restricted access to individuals working on projects specified in the user agreement. For example, one respondent indicated that their agency could not share data with other MPOs or local government agencies, which greatly limited the data's utility. This example underscores how important it is for agencies to negotiate the most favorable terms possible in agreements with private vendors.

Typically, there are few restrictions on sharing derivative works or aggregated results. Most respondents indicated these data can be shared within the agency, with groups outside the agency, and with the general public.

Open Records Laws and Privacy Concerns

Half of the respondents said that their agencies have not had any experience with open records requests for proprietary data. However, a few respondents described how their agency would process this type of request. Two agencies placed explicit terms in the contract related to open records requests. Under the terms of these contracts, the agencies are to notify the vendor when open records requests are received, and the vendor is responsible for taking action, such as defending its right in state court to preserve the confidentiality of its data.

Three states indicated that they would not maintain data records that require non-disclosure agreements. Instead, they would acquire data through third parties (e.g., universities and contractors). One respondent mentioned the passage of state legislation that places subcontractors under the open records law, which effectively ended the practice of acquiring data through a third party. Another respondent indicated that their agency does not collect, receive, or maintain the raw data from vendors and is able to fulfill the requirements of open records laws by sharing aggregated reports or project analyses. Two respondents said their agencies would refer open records requests to their legal offices, which handle any possible conflicts between non-disclosure agreements with data vendors and the requirements of state open records laws.

Sixty-five percent of the respondents indicated that privacy issues have not been a source of concern because there is no personally identifiable or confidential information in their data. Two respondents said their agencies would refer privacy cases to their legal departments were they to arise. The remaining respondents said that they would not disclose data with identifying information or that access to and management of such data would be restricted to very few agency personnel to preclude disclosure.

Use Experience and Caveats

This section discusses the experiences of agencies using proprietary data. More specifically, it focuses on reported data uses and applications, caveats with regard to data, and overall satisfaction with data.
Table 7 provides a summary of the data vendors being used and typical data uses.

Speed or Travel-Time Data

Speed or travel-time data are the most common data item purchased by agencies, as they are used in a wide range of applications. At least 20 states have acquired speed data, with four states purchasing data from more than one vendor. Uses vary and are contingent on whether the data are real time or historical. Table 7 shows various ways in which speed or travel-time data are currently being put to use by transportation agencies. Respondents gave the speed or travel-time data an average satisfaction rating of 8.2 out of 10, indicating great enhancements to existing applications.

The Virginia DOT represents a typical practice in its use of real-time and historical data. It acquires TMC-based real-time data and INRIX high-definition network-based real-time data. The high-definition network has finer spatial granularity, at a shorter link level, and broader coverage than the TMC-based network. Both data feeds refresh every minute. These data are archived and aggregated for further analysis, using the RITIS tool for TMC data and the iPeMS tool for high-definition data. Speed data have been incorporated into numerous facets of the Virginia DOT's operations, including posting travel times to DMS, populating web maps on 511Virginia.org, conducting before-and-after studies to assess project impacts, and generating performance measures. Data use has also expanded to the Smart Scale initiative, a project-rating process adopted by the Virginia DOT for project selection, in which speed data contribute to congestion and travel-time reliability metrics.

Survey responses and follow-up phone interviews mentioned several caveats with regard to speed or travel-time data. Data coverage is generally sparser on arterials, collectors, and local roads because fewer probe vehicles are sampled there than on freeways. Some agencies may choose a gap-filling option when observed speeds are unavailable, but using imputed speeds may cause unexpected results. For instance, during a road closure, when actual speeds are not available, the imputed speeds may not reflect actual traffic conditions.

Latency issues with probe data have been reported as well. In 2014, Kim and Coifman evaluated 2 months of probe data from a private vendor on an interstate corridor against loop detector data and found that the probe data tended to lag the loop detector data by almost 6 minutes. Sharma et al. (2017) reported that the average latency of probe data was about 5 minutes compared to fixed-location sensor data, with latencies varying by corridor. Those findings have very important implications for time-sensitive applications, such as traffic-responsive ramp metering or queue warnings.

Probe-vehicle data are aggregated for each highway segment, not for individual lanes. For urban freeways with high-occupancy vehicle (HOV) lanes, these data cannot distinguish speeds on HOV lanes from those on general-purpose lanes. As such, probe-vehicle data have limited application for the evaluation of HOV operations and other managed-lanes strategies.

The integration of proprietary data, which are attached to a proprietary network whose segmentation differs from the networks maintained by state and local agencies, is often cited as a significant challenge.
Although vendor networks contain a wide range of information, many critical attributes needed to generate performance measures, such as volume, are not available from them and must be obtained through the Highway Performance Monitoring System (HPMS) or other agency-maintained databases. Therefore, when proprietary data are licensed, the vendor network needs to be integrated with the existing state network so that the speed data, volume, and other inventory data can be combined and made available across the network.

Table 7. Data, vendors, and typical uses.

• Speed or Travel Time Data
  Reported vendors: HERE, INRIX, TomTom, RITIS, Iteris
  Typical uses: Real time: traveler information system, including DMS and 511; queue detection and warning; variable speed limit. Historical: performance measures, demand model calibration and validation, corridor study, work zone analysis, project prioritization, traffic incident management, speed zoning
• O-D Data
  Reported vendors: AirSage, INRIX, StreetLight
  Typical uses: O-D analysis, demand model calibration and validation, turning movement analysis, special event travel behavior, detour planning
• Freight and Truck Data
  Reported vendors: ATRI, Polk, Transearch, Waybill
  Typical uses: Development and validation of freight models, corridor study, performance measurement, fleet breakdown analysis
• Crowdsourced Incident Data
  Reported vendors: Waze
  Typical uses: Traffic incident notification, slow speed notification, 511 system, traffic incident management, hurricane evacuation
• Non-Motorized Travel Data
  Reported vendors: Strava
  Typical uses: Identification of optimal bike counter locations, bike use on the system, countywide planning and programming, safety risk-factor analysis
• Digital Maps and Aerial Imagery
  Reported vendors: ESRI, FleetRoute, Google, Maponics, NAVTEQ, Onterra
  Typical uses: Spatial analysis, information sharing, vehicle routing, preliminary engineering, project delivery, outreach
• Socioeconomic Data
  Reported vendors: Chainstore Guide, Dun & Bradstreet, IHS Global Insight, Infogroup, InfoUSA, TREDIS, Woods & Poole
  Typical uses: Development of travel models, long-range statewide planning, demographic analysis, revenue forecasting models

Four disparities between vendor and agency networks can hinder conflation: differences in linear reference systems, segmentation definitions, coverage levels, and geometries. The Kentucky Transportation Center (KTC), in a project conducted for KYTC, developed a procedure for network conflation (Green et al. 2013). Quality assurance and quality control were performed to identify and correct any mismatched segments. Daneshgar et al. (2018) devised a workflow to conflate the HPMS network used by the Maryland State Highway Administration with a private vendor's TMC network. They used an iterative procedure to identify, for each HPMS segment, the overlapping segments from the vendor's network and to determine the percentage of the HPMS segment each one covers. Manual checks were also needed for segments flagged by predefined criteria as possibly erroneous.

Additional concerns make network integration an onerous task. If a vendor makes periodic updates to its network, the conflated network will need to be updated as well. Different vendors may have different standards and practices for metadata and network segmentation; thus, an agency that wants to switch vendors will have to dedicate time to understanding the new data and updating or redoing the integration. Because integration requires extensive GIS knowledge and tools, this task proves especially challenging for agencies, cities, and municipalities with limited GIS resources and assets.
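The overlap step in such a conflation workflow can be sketched in a few lines. This is a simplified illustration, not the procedure published by Green et al. or Daneshgar et al.: it assumes both networks have already been placed on a common linear reference (route name plus milepoints), which the text above notes is often not the case in practice, and the segment fields, function names, and 5% tolerance are hypothetical.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    seg_id: str
    route: str
    begin: float  # begin milepoint on the shared linear reference
    end: float    # end milepoint on the shared linear reference


def overlap_shares(agency_seg, vendor_segs):
    """Map each overlapping vendor segment ID to the fraction of the
    agency segment's length that it covers."""
    length = agency_seg.end - agency_seg.begin
    if length <= 0:
        raise ValueError("agency segment must have positive length")
    shares = {}
    for v in vendor_segs:
        if v.route != agency_seg.route:
            continue
        overlap = min(agency_seg.end, v.end) - max(agency_seg.begin, v.begin)
        if overlap > 0:
            shares[v.seg_id] = overlap / length
    return shares


def needs_manual_check(shares, tolerance=0.05):
    """Flag segments whose vendor coverage does not sum to roughly 100%
    (gaps or double coverage), using an assumed tolerance."""
    return abs(sum(shares.values()) - 1.0) > tolerance


if __name__ == "__main__":
    hpms = Segment("H1", "I-64", 10.0, 12.0)
    vendor = [
        Segment("T1", "I-64", 9.0, 10.5),
        Segment("T2", "I-64", 10.5, 12.5),
        Segment("T3", "I-75", 10.0, 12.0),  # different route, ignored
    ]
    shares = overlap_shares(hpms, vendor)
    print(shares, needs_manual_check(shares))
```

The resulting shares could then serve as weights when vendor speeds are combined with agency attributes such as volume. Where the two networks do not share a linear reference, the milepoint comparison would have to be replaced by geometric matching in a GIS.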
Three survey respondents indicated that their agencies perform integration in-house, whereas the remaining agencies sought assistance from consultants, universities, or vendors.

Origin-Destination Data

Proprietary O-D data are increasingly used by transportation agencies for traffic movement analysis, as well as for the development, calibration, and validation of travel-demand models. According to the survey results, at least 15 states have acquired this type of O-D data from major vendors, such as AirSage and StreetLight. Despite the limitations of crowdsourced O-D data (discussed later), such data generate many insights into travel choices that travel surveys and traditional data collection methods cannot provide. Respondents gave the O-D data an average rating of 8.3 out of 10, indicating considerable enhancements to existing applications. The respondents also observed that their agencies have been mostly satisfied or very satisfied with proprietary O-D data, based on cost-benefit analysis. One respondent expressed a neutral attitude toward proprietary O-D data, commenting that the agency's sole use has been for travel-demand model development.

Because many agencies are in the early stages of using proprietary O-D data, research has focused on developing a better understanding of them. Venkatanarayana and Fontaine (2018) compared the quality of StreetLight O-D data to benchmark data collected through Bluetooth and automated license plate readers at four study sites. Various performance indicators, such as percentage difference, percentage of missing data, and trends in factoring ratio, were used to assess the data quality. Because of concerns over the comparability of the benchmark data, the study's observations on accuracy were not conclusive. Ohio State University investigated whether proprietary O-D data can effectively replace or complement traditional cordon surveys (Miller et al. 2016).
Researchers evaluated AirSage and StreetLight data against a benchmark O-D table for Allen County, Ohio. The overall goodness of fit using absolute error measures was relatively poor for external-external flows, while relative error measures suggested a better fit, implying that the patterns of external-external flows from proprietary data and Ohio DOT data were similar. For external-internal and internal-external flows, both absolute and relative measures suggested a poor fit, implying that proprietary O-D data did not confirm the trip distribution pattern manifested in Ohio DOT data. The Ohio DOT data tended to locate a high percentage of traffic in the urban area, whereas proprietary data showed a high concentration of traffic along major highways.

One limitation of crowdsourced O-D data is the absence of traveler and trip characteristics. The data also lack information on trip purpose, vehicle type, and vehicle occupancy. Relying entirely on these data is also problematic because navigational GPS data come from in-vehicle navigation systems, which are more likely to be installed in newer, more expensive vehicles. As a result, navigational GPS data are demographically biased toward travelers who drive these vehicles. Figure 10 illustrates this bias. It compares O-D data from the 2015 Ohio statewide model to 2016 location-based services (LBS) data and GPS data. The bias of GPS-based O-D information toward high-income areas is evident. Additionally, these data are difficult to validate because there is no ground truth against which to verify their accuracy or representativeness.

Several other caveats were also noted. First, data must be processed carefully to remove biases and must be properly expanded before being applied to a travel-demand model. Second, depending on the data source, O-D data may be unable to differentiate route choice if parallel roads exist. One respondent also indicated that current data are not able to provide pedestrian and cyclist travel information.

In-vehicle GPS devices are the main source of probe data. Although these systems are precise, they are mostly installed in either newer or higher-end vehicles, skewing data on passenger trips because of the overrepresentation of high-income drivers.
Likewise, larger commercial vehicle operations use more GPS-equipped semi-trucks than smaller local carriers do. Accordingly, interstates are overrepresented, and local roads are underrepresented. Furthermore, data are collected only from carriers that supply data to the data vendor (in this case, INRIX).

Location-based services from GPS-enabled phones also carry limitations. They only transmit data when the phone's GPS function is enabled and in use. Although the resulting data are spatially precise, coverage is sometimes sparse, and it is not possible to distinguish cars from commercial vehicles. Nevertheless, there would be less demographic bias because of the widespread market penetration of smartphone devices, and it is possible to infer home and work locations because of long-term device persistence.

A final challenge is trip-length bias. With no scientific or experimental design underpinning data collection, longer trips are potentially overrepresented in the data sets.

Figure 10. Ohio DOT study showing potential income bias (TAZ = travel analysis zone) (Giaimo 2017A).

Freight and Truck Data

At least 12 agencies have purchased freight data from private vendors. Overall satisfaction was relatively high, with respondents commenting that the data enhanced their agency's applications. Freight data received an average rating of 7.6. Since truck data are generally hard to acquire, proprietary data offer a good alternative to in-house data collection. The data have been used to develop and validate freight-demand models, to develop freight plans, to conduct freight bottleneck studies, to determine the impact of roadway projects and closures on freight, and for other applications. Depending on the use restrictions negotiated with vendors, the data acquired by some states are restricted to certain applications, thus reducing the data's utility. For instance, the Arkansas and Tennessee DOTs cannot access raw data and receive only post-processed data from their consultants.

As with O-D data crowdsourced from personal vehicle navigation systems, the quality of freight- and truck-specific data acquired through GPS devices depends on the sample size and on whether there is any bias toward certain carriers and source data providers. Analysis carried out by the Ohio DOT indicated that the trucks of interstate carriers tend to be overrepresented because they are more likely to be equipped with GPS devices. Trucks operated by smaller, more local firms, which use mostly non-freeway routes, are likely to be underrepresented. In addition, not all trucking firms carry devices that are part of the original source for such data products.
Figure 11 compares truck trips in the Ohio statewide model to truck GPS data. The comparison clearly shows the absence of UPS trucks in the data set.

Respondents noted other issues, including sharing policies that restrict the extent to which data can be shared outside an agency. One respondent indicated that their agency cannot store data for future uses. Data quality is also a concern, with some agencies finding results that were inconsistent with the Freight Analysis Framework (FAF) and a few noticeable commodity errors that required correction. Some agencies also expressed concern over the cost of proprietary freight data.

Figure 11. Ohio DOT study showing potential bias with trucks (Giaimo 2017A).
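One rough way to screen a truck GPS sample for the carrier and route bias described above is to compare the share of sample trips in each group with the corresponding share in a reference source, such as a statewide model. The Python sketch below is illustrative only: the function name, the grouping by facility type, and all trip counts are hypothetical and are not drawn from the Ohio DOT analysis.

```python
def representation_index(reference_trips, sample_trips):
    """Return each group's sample share divided by its reference share.

    Groups are taken from reference_trips; a group missing from
    sample_trips is treated as having zero sample trips.
    """
    total_ref = sum(reference_trips.values())
    total_sample = sum(sample_trips.get(g, 0) for g in reference_trips)
    if total_ref <= 0 or total_sample <= 0:
        raise ValueError("both sources need a positive trip total")

    index = {}
    for group, ref in reference_trips.items():
        if ref <= 0:
            continue  # no reference trips: the ratio is undefined
        ref_share = ref / total_ref
        sample_share = sample_trips.get(group, 0) / total_sample
        index[group] = sample_share / ref_share
    return index


# Hypothetical daily truck trips by facility type
model_trips = {"interstate": 4000, "non-freeway": 6000}
gps_trips = {"interstate": 300, "non-freeway": 100}

index = representation_index(model_trips, gps_trips)
for group, value in index.items():
    print(f"{group}: {value:.2f}")
```

An index above 1 suggests that a group is overrepresented in the sample relative to the reference source, and an index below 1 suggests underrepresentation. The index says nothing about which source is closer to the truth, so it is a screening aid rather than a validation method.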
Crowdsourced Incident Data

At least nine states have acquired and used crowdsourced incident data to complement real-time traveler information. The utility of incident data was rated 7.5, indicating that they offer a good enhancement to existing applications. Agency experiences with crowdsourced incident data have not been uniformly positive, however. One respondent said that their agency was dissatisfied with the data because they were very localized and concentrated in metro areas. Another respondent commented that the data are of limited use because of sparse coverage on arterials and inconsistencies with other validation data.

Other agencies have been very satisfied with the data. The Iowa DOT integrated crowdsourced data into its traffic management center (TMC) through email notifications and made them available internally for various GIS applications. The availability of such data has enabled the Iowa DOT to respond more quickly to incidents in locations without camera coverage. About 12% of its initial notifications in the TMC have come from Waze alerts. Pennsylvania has also integrated Waze data into its traffic management center to help make decisions about the dispatch of safety patrols.

Crowdsourced Non-Motorized Travel Data

While the availability of pedestrian and bicycle data has been limited historically, agencies need such data to make informed decisions about infrastructure investments. Crowdsourced data help agencies better understand the location of popular routes on the network, O-D patterns, and trip durations, as well as factors that influence cyclists' decision making. At least four states reported acquiring cycling data from Strava, a technology company whose mobile application allows users to track and upload their cycling, running, and swimming activities.
The data have been used by agencies to develop safety risk factors for bicycles, document bicycle use on the system, identify optimal locations for bike counters, perform corridor studies, and assist in countywide planning and programming. Respondents rated the utility of these data as a 7 based on two responses, suggesting that the data offer enhancements to agency applications but may have some limitations. One response indicated that in some cases, sample sizes may not be sufficiently large to draw valid conclusions. Bicycle count data are heavily skewed toward men and younger individuals and are typically more concentrated in cities and higher-income neighborhoods. Because the data are processed using GIS, they could pose a challenge for agencies with limited GIS resources. One respondent indicated that their agency is having difficulty finding applications for the data.

Other Data

Socioeconomic data. At least nine agencies have purchased socioeconomic data from private vendors. Overall, they have been satisfied with the data, giving them an average rating of 8.1. Respondents said that the data have performed up to expectations in applications such as travel and econometric models. Socioeconomic data agreements carry fewer use restrictions than those for other data types, according to survey responses. Accordingly, agencies are using the data in multiple applications throughout their states. One agency has used the data to supplement population and employment data in rural areas and to develop control totals for future land-use decisions. One caveat associated with socioeconomic data is that they are developed by third parties without significant oversight, and vendors are not transparent about how they generate the data. This makes it challenging to validate the data if questions arise.

Digital maps and aerial imagery. At least six agencies have procured street maps and imagery data from private vendors. Respondents were inclined to endorse these data, giving them an average utility rating of 8.1. Licensing or purchasing these data from a vendor eliminates the burden on agencies to collect and maintain data in-house. One potential drawback of these data is the additional effort or cost needed to convert imagery projections to achieve consistency with agency standards. Before purchasing or licensing imagery, an agency should also inspect the imagery quality to ensure it meets its needs. In addition, licensing agreements for digital maps may not allow agencies to integrate maps with their own inventory network and submit the integrated product to a federal program. Agencies should be aware of this restriction.

Peer Advice

The study team asked respondents to draw on their previous experiences acquiring and using proprietary data and to offer advice to peer agencies interested in procuring data from vendors. This section summarizes their responses, with recommendations split into four categories: Legislative and Institutional Support, Staffing, Procurement, and Data Uses.

Legislative and Institutional Support

• One respondent remarked that states would benefit if legislatures revisited and revised existing laws so that agencies could take better advantage of emerging data sources and more easily navigate issues with intellectual property rights. For example, before the adoption of H.B. 369 on May 10, 2016, the Utah DOT could not gather or use crowdsourced data that were collected based on personally identifiable information. Previously, the Florida DOT outsourced the procurement of third-party data to contractors because they would not be subject to the state's open records laws. However, new laws make Florida DOT subcontractors subject to open records laws. As a result, this practice is no longer used widely.
• Agencies will benefit from establishing procedures to facilitate the acquisition and application of proprietary data.
The Oregon DOT plans to develop guidance on proprietary data contracting, data-sharing agreements, and assessments of whether the agency should conduct pilot projects with new firms, given the time and effort required.

Staffing

• Having staff with expertise in the types of data being acquired and their potential applications is invaluable. One respondent noted that staffing changes prevented their agency from rapidly executing a contract to acquire data.
• Agencies can benefit from having experts in data analytics on staff because such knowledge is critical for validating and processing data, which is particularly important during product and vendor selection.

Procurement

• One respondent urged agencies to thoroughly prepare for procurement by determining their data needs and identifying funding sources, including federal sources, to license or purchase data.
• It is important for agency procurement departments to circulate RFPs as widely as possible to reach a broad audience and to ensure a competitive bidding process.
• Two respondents commented on the importance of carefully attending to the terms and conditions laid out in contracts and user agreements. Contractual language is often complex and difficult to follow, which can foster misinterpretations by involved parties. To reduce the likelihood of misinterpretations, agencies will benefit from involving their legal departments in the contracting process.
• Agencies should allocate sufficient time for solicitation, contract negotiations, and data integration, as delays can occur during any stage of the procurement process.
• If city agencies and MPOs want to acquire data but lack the staff or resources to manage procurement, they will benefit from working with a state DOT. The state DOT could assume responsibility for issuing RFPs and negotiating contracts, making sure to include the city agencies and MPOs in the final user agreement. This arrangement would relieve smaller entities of the logistical and managerial burdens inherent to procurement. But successful collaborations demand coordination and robust communication among all of the involved stakeholders.
• Before committing to a licensing or purchasing agreement, agencies should request sample data sets from vendors selected as finalists. These samples can be used for data suitability analysis and quality checks.
• If an agency decides to purchase an analytical tool, it should arrange for training to familiarize prospective users with its capabilities. One respondent said that while their agency's first RFP emphasized data at the expense of analytical tools, future RFPs would place a greater priority on those tools. Agencies should invest time to understand the proper balance of data and analytical tools necessary to meet their requirements before developing RFPs.

Data Uses

• Agencies should not underestimate the amount of time and resources they need to perform data integration. It is a very time-consuming task.
• Several respondents advised that agencies should ask vendors whether they can modify their products to reduce the amount of preparatory data cleaning that is required before data can be used by the agency.
• It is important to do as much outreach as possible with internal partners to build additional use cases.

Summary

This chapter summarized the findings of a survey that asked agencies to comment on their experiences licensing or purchasing proprietary data, the acquisition process, and data use. Forty-two state agencies and three MPOs took part in the survey. Most agencies have found a wide variety of uses and applications for proprietary data. Agencies use similar procedures for developing and issuing RFPs, evaluating proposals and vendor quality, validating data, and negotiating contracts. Overall, experiences with proprietary data have been positive and encouraging, although a small number of respondents said that their agencies were discouraged either by the data quality or by use agreements that were too restrictive. Reflections of agency respondents on their past procurements were summarized and presented as peer advice. The next chapter builds on the high-level summary data presented here by offering five detailed agency case examples.