Appendix C – Telephone Interview Findings

The purpose of the telephone interviews was to enable the research team to learn more about a specific technology project, the data collected, and the way in which the data were being managed by the project team or supporting consultants/vendors. In all, 11 telephone interviews were conducted with individuals from the following agencies:

• City of Las Vegas
• City of Los Angeles Department of Transportation (LADOT)
• City of Portland Bureau of Transportation (PBOT)
• Delaware Department of Transportation (DelDOT)
• District of Columbia Department of Transportation (DDOT)
• Georgia Department of Transportation (GDOT)
• Indiana Department of Transportation (INDOT)
• Kentucky Transportation Cabinet (KYTC)
• Los Angeles County Metropolitan Transportation Authority (LA Metro)
• Texas Department of Transportation (TxDOT)
• Utah Department of Transportation (UDOT)

A summary of each individual interview is provided herein.

C.4 City of Las Vegas

The city of Las Vegas has five separate projects that have all been deployed and are in the operational phase. These projects include the Las Vegas Connected Vehicle Pilot, the AAA Driverless Shuttle, LiDAR Pedestrian Safety Analytics, Stationary Traffic Volume Count Sensors, and a set of traffic and environmental sensors for the Connected Corridors project. To the greatest extent possible, all five projects share a common data structure and fall under the same data management leadership structure. These five projects operate under a citywide data management plan and open data policy. This policy is revisited and updated annually based on feedback from a steering committee. In accordance with this open data policy, the city of Las Vegas maintains an open data portal containing over 135 data sets that are freely available to the public.
While the open data sets can be accessed anonymously with no login required, the data portal also hosts a number of private data sets that can only be accessed by logging in as a registered user who has been granted specific access to that file by the data's owner.

One important factor in why the city of Las Vegas has been so successful in executing its open data policy is strong support from executive leadership. The previous city manager was a big proponent of open data who directed each department director to appoint an Open Data Coordinator (ODC). The ODC was responsible for organizing the department's efforts to publish as much of its data to the public as possible and coordinated directly with IT to anonymize and secure each dataset. Because of this top-down support structure, the interviewees report that there were no major administrative hurdles to implementing the city's open data policy.

Las Vegas currently operates a hybrid system architecture, storing some data locally and some data in the cloud. Because the city works with multiple vendors, a wide variety of data types are stored to accommodate the vendors' systems and software. The data workflow ranges from small Excel files sent
to the system each week by the automated shuttle vendor to very large amounts of streaming data from LiDAR and other sensors. One improvement in progress for the city is a move towards edge computing to better handle the large volume of data and improve data streaming speeds.

Two challenges arise out of the sheer quantity of data that Las Vegas handles: organizing the data and ensuring its quality. The city tackles the organization challenge by requiring each dataset provider to include useful metadata and descriptive information that conforms to a specific format. This metadata not only makes the data more usable but is also posted on the open data platform itself to make the data searchable and identifiable to the various users accessing the site. Similarly, ensuring the quality of the data is the responsibility of the data provider. This approach sacrifices control and active monitoring of data quality and should be avoided, especially if decisions are being made based on data of unknown quality.

C.10 City of Los Angeles Department of Transportation (LADOT)

LADOT is a large organization with a history of data innovation in the field of transportation. A primary goal of its data team is to create a strong data foundation that enables decision-makers to see the full picture of what is happening on city streets. Recent efforts towards this goal include building a culture of data awareness, pursuing transportation technology initiatives, and growing a newly created Bureau of Transportation Technology. The department maintains detailed, publicly available online documentation of its technology action plan, data protection principles, and other procedures. One key part of these efforts is building the right team, both through hiring new personnel and retraining existing personnel.
To improve hiring, they are looking at revising job and talent classifications to accelerate the hiring process and to ensure they are bringing in modern skillsets. When third-party contractors are brought on board a project, LADOT seeks opportunities for knowledge sharing between the contractors and its own data science team. They have also grown the team internally, moving six positions from the IT group to the data science group during a recent re-organization.

LADOT's data team serves many individual groups that deal with data at varying levels of technological maturity. For example, their scooter data are accessible in a cloud-based structure hosted by Amazon and managed by a database team in the IT Department, while their taxi group stores and distributes its data reports on CD-ROMs. They are planning to move much of their transportation data to an open data portal but first want to fully understand and implement tested procedures for aggregating and anonymizing the data.

LADOT carefully monitors its third-party data providers to ensure technical and operational compliance with their service level agreements (SLAs). For technical compliance they monitor the uptime of the API, the functionality of the data feed, and the reasonableness of the data coming from that feed. Operational compliance is manually tested in the field to see if the data match real-world conditions. For example, to validate that a third-party scooter company's data feed is functional, they will use that company's app to take a scooter ride around the block and then check whether that ride is reflected in the streaming API data. They have also crowdsourced some of the verification process by allowing citizens to file scooter-related service requests in the city's 311 mobile app. These service reports are then used to verify the data coming from the API.
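A minimal sketch of a technical-compliance polling cycle like the one described above might look as follows. The feed URL, response schema, and plausibility thresholds are all illustrative assumptions, not LADOT's actual implementation:

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

# Hypothetical provider endpoint -- the real URL and schema are set by the SLA.
FEED_URL = "https://provider.example.com/trips"

def is_plausible(trip):
    """Reasonableness screen: a trip should have a positive duration and an
    average speed (m/s) a scooter could actually reach. Thresholds illustrative."""
    duration_s = trip.get("duration", 0)
    distance_m = trip.get("distance", 0)
    return duration_s > 0 and 0 < distance_m / duration_s < 15

def check_feed(url=FEED_URL):
    """One polling cycle covering the three technical-compliance checks:
    API uptime, feed functionality, and data reasonableness."""
    report = {"reachable": False, "parseable": False, "plausible": 0, "total": 0}
    try:
        with urlopen(url, timeout=10) as resp:
            report["reachable"] = resp.status == 200
            payload = json.load(resp)       # feed functionality: valid JSON
            report["parseable"] = True
    except (URLError, ValueError, OSError):
        return report                       # API down or feed malformed
    trips = payload.get("trips", [])
    report["total"] = len(trips)
    report["plausible"] = sum(is_plausible(t) for t in trips)
    return report
```

A scheduler would run `check_feed` on each polling interval and alert when reachability or the plausible-record ratio drops below the contracted threshold.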
Like many transportation agencies, LADOT has taken a particular focus on protecting privacy in the data they collect and analyze. This can involve a balancing act: they need to clearly communicate to the public what data are collected, how they are collected, and whether or not they are openly accessible, while not revealing too much about the nature of the sensitive data. The department pursues rich data that include origin and destination information, which it fully utilizes internally in a protected and confidential manner. They would like to share as much of these data as they can but must be very methodical in determining what information is appropriate to share. They are currently exploring options involving some level of obfuscation of the data, after noting the success that other cities have had in openly publishing anonymized scooter data.

One important contribution made by LADOT is the creation of the Mobility Data Specification (MDS). Now an open source project maintained by the Open Mobility Foundation, the MDS provides a way for regulatory agencies to ingest data from mobility service providers and express regulation in machine-readable formats. This is accomplished via a set of API endpoints that can be used in concert with each other or individually, according to the needs of the agency (Mobility Data Specification Project, n.d.).

C.6 City of Portland Bureau of Transportation (PBOT)

Building on a Smart City Challenge grant, the city of Portland is pursuing several projects in various stages of development. These projects range from a recently implemented network of 200 roadside sensors on high-crash corridors to an automated vehicles project that is still in the early stages of development.
In developing these projects, the city has found that its traditional implementation of local SQL servers is insufficient to meet evolving data needs, and it is working towards building a cloud-based data lake to support it moving forward. The city has also faced non-technical challenges such as contract negotiation and enforcement, organization-wide data knowledge, and balancing data usability against privacy risk.

Supporting their SQL server RDBMS implementation proved costly as their data needs grew. They reported having to constantly obtain new servers just to retain the data that they already had. This led to a sense that they would need to upgrade their storage technology before they could react to the new technologies that were emerging each week. To obtain approval to build a new cloud-based data lake, they put great effort into defining the needs, demonstrating a solid plan for how it would be used and implemented, and clearly outlining the cost savings of a cloud-based approach.

Portland works with several different vendors to collect and analyze data, and those relationships have not always been successful. For example, one partner sent data sporadically in a difficult-to-use spreadsheet format that often suffered from data quality issues. To avoid these situations in the future, Portland now focuses on strictly enforcing data availability in its contracts, requiring partners to provide API feeds to the data and issuing fines to companies for every 24 hours of data unavailability through those platforms.

Another roadblock identified was the lack of data knowledge among employees in the organization. Because only a handful of people understand data, many needs go unaddressed, and the implementation of new tools is delayed by red tape. One example of this is when the team requested an upgrade from Tableau Desktop to Tableau Online; this one change took over a year to be approved. Within the organization as a whole, there is a widespread desire for data, but there is no widespread knowledge of what it takes to obtain, process, and use data.
When faced with the decision of gathering more raw data for analytics use or restricting the amount of data obtained to avoid PII, the city of Portland has firmly chosen to avoid PII risk. To accomplish this, video data gathered from sensors are deleted within one second of recording, and vendors are requested to aggregate the data they send to the city to keep it from being identifiable. This approach dramatically reduces the city's exposure to handling PII; however, because the video footage is deleted, they now have no way of reliably verifying that their video sensors are performing as intended. They remain concerned that this approach is inadequate because they cannot truly benefit from their data in its current form.

C.9 Delaware Department of Transportation (DelDOT)

DelDOT, recognizing the critical importance of data to its transportation management program, has deployed sensors to collect a wide variety of data including traffic volume, occupancy, and weather data. They focus especially on the collection of real-time data and have been adding both roadside units (RSUs) and onboard units (OBUs) specifically for their connected vehicle efforts. They soon realized that these sensors were generating more data than any human being could keep up with, so with the help of an Advanced Transportation and Congestion Management Technologies Deployment (ATCMTD) grant they are making a big push towards using machine learning techniques to analyze the large amounts of data streaming into their systems.

DelDOT has found that not only do they need advanced techniques to analyze this torrent of data, they need modern techniques to store it as well. To this end, they are moving their data from local data silos into a cloud-based data lake environment built on the Google Cloud Platform (GCP).
They intend to leverage this platform to merge their disparate data sources into a single repository where end users can gather all necessary data, as opposed to the current state, where data must often be manually pulled from multiple sources and joined before being useful. That being said, DelDOT will still keep some mission-critical data stored locally in an effort to preserve data availability. That way, if a crisis arises, they are not reliant on connecting to a cloud provider; they have critical data closely held, readily accessible, and with locally stored redundancies. Efforts are even being made to build an integrated fiber network with the Pennsylvania and New Jersey Departments of Transportation so that the DelDOT team could relocate to a different state and still perform critical functions in an emergency.

DelDOT's data development efforts straddle the line between the performance of the cloud and the emergency availability of local storage. On the one hand, they are taking advantage of GCP's available tool sets, which not only have a much lower maintenance burden than custom-built local apps but are also expected to be richer and quicker than on-premises tools. On the other hand, they also plan to continue development on a core set of local tools that are sufficient to maintain critical operations in an emergency setting.

An overall goal of their data development is to create tools that help the business teams answer their own questions directly from the data. As a culture of data awareness grows within DelDOT, these business teams are making more and more requests of the data team. Some of these requests are for raw data, some are for finished visualizations or analysis, but they all take time away from developing software. The quantity of data itself can be very intimidating, so it is key to create tools that are both powerful enough to work with that data and accessible enough for non-technical users.
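The cloud-first, local-fallback posture DelDOT describes can be sketched as a read path that prefers the cloud copy but falls back to a locally held mirror when the cloud is unreachable. The paths, file format, and function names here are hypothetical:

```python
import json
import pathlib

# Illustrative location of the locally held redundant copies of
# mission-critical data -- DelDOT's actual mirrors will differ.
LOCAL_MIRROR = pathlib.Path("/data/critical")

def read_dataset(name, fetch_from_cloud, local_dir=LOCAL_MIRROR):
    """Prefer the cloud copy; fall back to the local mirror so that
    mission-critical data remain available during a cloud outage."""
    try:
        return fetch_from_cloud(name)
    except OSError:
        # Cloud unreachable: serve the locally stored redundancy instead.
        return json.loads((local_dir / f"{name}.json").read_text())
```

The key design choice is that callers never know which copy served them; availability is preserved transparently, at the cost of the local mirror possibly lagging the cloud copy.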
DelDOT participates in an "open data council," and like many other agencies is working through the process of deciding what exactly its long-term open data policy will be. In the meantime, they provide their data streams for free to partner agencies and other researchers who ask for them.

To manage data quality, DelDOT has set a minimum level of reliability and will flag data that fall outside of valid ranges. They have found that the third-party data providers they have partnered with tend not to perform any quality analysis themselves, so they are inherently skeptical of incoming data until they can validate the quality themselves. Because they work with such massive data sets, they are able to discard data that are overly noisy or problematic without losing predictive power.

DelDOT has learned that maintaining a good working relationship with its centralized IT department is critical to building a mature big data practice. While centralizing IT personnel can create cost savings and other efficiencies, it can also create additional layers of bureaucracy. The rapid evolution of technology and the agile development priorities of the data team do not always align with the priorities of the IT department, for example. Only after developing strong communication and securing buy-in from the right individuals was DelDOT's data team able to make the progress they have made.

C.11 District of Columbia Department of Transportation (DDOT)

DDOT is managing several emerging technology projects, including the District Mobility Project, a dockless bike and scooter program, and some work with connected vehicle projects. The District Mobility Project collects performance measures across all modes of transportation to build a visual representation of congestion and user experience for drivers, pedestrians, and bus riders.
For its dockless bike and scooter program, DDOT partners with SharedStreets to collect and store the vast amount of tracking data that it requires from vendors such as Uber, Lyft, and Via. Although these data come from several different vendors, they are standardized to a common geographic reference, allowing visualizations to be built and analyses to be performed across all vendors at once (District Mobility: Multimodal Transportation in the District, n.d.). DDOT's connected vehicle efforts include data from 60 adaptive signals that are made available to partners via an API.

While the use of the SharedStreets data is a success story, DDOT has encountered challenges working with some third-party datasets. The department purchased 2 years of INRIX Trips data that had the potential to add vehicle trajectories to its transportation reporting; however, the department did not know how to make the best use of the more than one billion waypoints. Furthermore, the sheer size of the dataset made it difficult for their SQL architecture to handle, leading them to conclude that the next solution will probably need to be cloud-based. DDOT is currently building a centralized data inventory that will catalog both the data they already have and new data that are becoming available.

DDOT maintains high standards and enforces strict requirements on all data it collects. Data must be fully anonymized, enriched with a standard set of metadata features, and have some way to validate their quality. The department frequently rejects new third-party data sources due to an inability to verify data quality.

DDOT makes a significant effort to offer access to its data to those who ask for it. While they do not have a separate open data policy document, their general Data Policy (Mayor's Order 2017-115) is available
online and includes detailed policy on data access and sharing. Their goal is to eventually implement a subscription-based model for sharing their data.

While some other agencies are primarily pursuing real-time data, DDOT has found most of its success in historical data analysis. This analysis ranges from general performance-measure visualizations that support high-level decision-making to investigations as specific as curbside regulations for shared mobility services. DDOT primarily uses GIS, Tableau, and Python for this historical data analysis.

DDOT reports that one of the biggest challenges it faces is building the technology to support big data. Spending time developing their processes and having a clear approach to long-term decision-making have both proven useful to DDOT as they build their own big data architecture. While they currently rely on local servers and shared drives, they are working to establish a data lake. Another reported challenge is successfully negotiating favorable terms with third-party vendors. There is a natural inclination for elected officials or other leaders to engage in conversations and agreements with new "innovative" companies. DDOT has found that including the data team early in the negotiating process helps keep the engagement focused on specific business needs. Creating a list of questions that the organization wants answered has also helped DDOT ensure that new data sets are acquired for their worth to relevant analysis and not just because they are new.

C.1 Georgia Department of Transportation (GDOT)

GDOT first put its connected vehicle project into operation in June 2018, with DSRC equipment installed at 54 intersections interacting with 50 different smart vehicles. They plan to continue growing this project over time, with a goal of installing 1,700 roadside units by June 2020.
GDOT already sees a direct benefit from this project in the form of data dashboards and hopes to become a reliable provider of traffic data for universities and private corporations.

Due to the highly technical nature of the project, GDOT is implementing it with the assistance of a contractor. In this relationship GDOT owns all data collected; however, the collection and management of the data are handled entirely by the contractor. The first time GDOT interacts with the data is through a visual dashboard created by making calls to an API provided by the contractor. This API provides access to connected vehicle data that have already been fully cleaned and processed.

During data collection and processing, the contractor stores the data in cloud-hosted NoSQL databases. Any data that GDOT decides to keep after retrieving it from the provided API is stored on-premises in a traditional RDBMS schema on a SQL server. One challenge they have encountered with this local storage is that, at the time of the interview, the connected vehicle data had not yet been integrated with the DOT's data backup plans and procedures. This led to a situation where the SQL server hosting the data crashed, and it took 2 weeks to recover the database.

As the project is still growing, there are several areas that GDOT is targeting for improvement, both on their end and on the contractor's end. For the DOT's part, they would like to enhance their data sharing technology beyond the simple FTP-style data export they have now, as they do not see it as sustainable for the amount of data they will be working with. They also would like to set up a big data environment; however, the DOT's IT Department wishes to avoid cloud storage due to security concerns, and any on-premises architecture will require a great deal of effort to implement.
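The pull-and-persist workflow described above (retrieve cleaned connected vehicle records from the contractor's API, then keep selected records in a local relational store) might be sketched as follows. The record fields and table schema are invented for illustration, and sqlite3 stands in for GDOT's on-premises SQL server:

```python
import sqlite3

def store_cv_records(records, conn):
    """Persist cleaned connected-vehicle records, as might be returned by a
    contractor's API, into a local relational table. Field names are
    hypothetical; the real schema is defined by the vendor's API."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cv_messages (
               intersection_id TEXT,
               vehicle_id      TEXT,
               event_time      TEXT,
               speed_mph       REAL
           )"""
    )
    conn.executemany(
        "INSERT INTO cv_messages VALUES (?, ?, ?, ?)",
        [(r["intersection"], r["vehicle"], r["time"], r["speed"])
         for r in records],
    )
    conn.commit()

# Usage: records fetched from the contractor's API would be passed in, e.g.
# store_cv_records(api_response["records"], sqlite3.connect("cv_local.db"))
```

Keeping only post-processed records locally mirrors the arrangement described in the interview, where raw collection and cleaning remain entirely on the contractor's side.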
GDOT would like to see improvements in data quality capabilities, as there are unresolved data quality issues with the GPS data, yet no data quality metrics or dashboards were available at the time of the interview. GDOT is also highly concerned with securing the data against attack and safeguarding PII during the collection process. Both areas are considered by GDOT to be the sole responsibility of the contractor.

In the interview GDOT identified two ongoing challenges that they face as they grow their connected vehicle program: legal team concerns over data sharing and on-premises IT security. The legal team has advocated extreme caution in pursuing their open data objectives, as the legal repercussions of inadvertently broadcasting PII or suffering a data breach can be severe. There is also some concern over the size of the DOT's IT Security team, who must safeguard any data stored locally on the DOT SQL server against attack in addition to their existing workload.

C.2 Indiana Department of Transportation (INDOT)

INDOT is considered a leader in transportation data management and decision support systems. They have created several data dashboards, based on data gathered from a wide variety of sources, that are actively used both inside and outside of INDOT. Data sources include INRIX data and crowdsourced data, and will soon include Waze data once contract negotiations between INDOT and Waze are complete.

One key factor in INDOT's success is that they retain sufficient engineering resources in-house to be involved in developing, assessing, and improving nearly all of their own software. When they encounter a new problem, they will regularly engage a local university to partner on an applied research project to solve it. At the end of the project, they are left with a useful piece of software or technique that they then provide as an open source solution to other interested agencies.
Over time this approach has resulted in a library of useful code and expertise that benefits not just INDOT but its partners as well. In addition to this sharing of developed software, INDOT also sees itself as a provider and distributor of data. Once a day they upload their data to a state-wide database hosted by the Indiana Office of Technology, where it is widely shared and accessed. In addition, INDOT has built several APIs whereby authorized users can pull data hosted in INDOT's own data warehouse. These APIs do not directly access the database itself, but rather provide limited, purpose-based access to a view of the data. This prevents rogue queries from impacting performance, preserving the availability of the database.

Their internal data warehouse is built on a hybrid architecture with a preference for hosting as much data in the cloud as possible. Everything in the data warehouse is open source, and the data are stored in a PostgreSQL RDBMS schema. By investing in a cluster of high-speed solid-state drives to store the data, INDOT was able to realize improved read/write performance on this relational database system. These data are backed up directly by INDOT, and most are also uploaded daily to a second database managed by the Indiana Office of Technology, which also performs regular backups. At this time INDOT has not deleted any data from the warehouse, and there are plans regarding what data to delete and what data to move to long-term archival storage when the time comes.

INDOT maintains full control over all data that it works with, although some data sources do have limitations on how INDOT is able to distribute the data to other stakeholders. Access to the data is made easy through the availability of purpose-built APIs, though to protect the speed and integrity of the database, nearly all access is limited to a particular view of the data rather than the entire database itself.

Though INDOT's data management is perhaps more advanced than most, there are still common challenges as they maintain and expand their capabilities. Cybersecurity is an ongoing concern, and although they could not share details, they are actively working on improving their security posture and mitigating known vulnerabilities. Contracting with partners must also be done with care for INDOT to maintain full ownership and use of the raw data while keeping their data products open source and as publicly shareable as possible. In the future, privacy will also become a greater concern. INDOT has been able to avoid gathering or storing any PII thus far; however, as data sources continue to expand, they anticipate that they will be unable to avoid such information forever and are already making plans for how to anonymize and protect it.

C.3 Kentucky Transportation Cabinet (KYTC)

KYTC's approach to emerging transportation technology data has been to understand the data before deploying the technologies. Therefore, the focus of the interview with KYTC was on the challenges and successes experienced in building a big data system and practice. Their journey to a big data system began in 2014 after becoming part of the Waze Connected Citizen Program (CCP). In the years since, KYTC has gone from having no understanding of data from an IT perspective to successfully building a data pipeline that combines 14 data sources, produces 12 automated outputs, and creates actionable insights. Continuously improving, integrating, and tweaking the incoming data remains an ongoing effort.
KYTC overcame various technical and administrative challenges by having a clear data goal in mind, using the right tools and the right partnerships, and demonstrating value to data stakeholders. KYTC had been gathering snow and ice data from automatic vehicle location (AVL) devices, which included useful fields (e.g., location, heading, speed, and plow status), and they wanted to combine the AVL data with Waze data. One significant barrier to doing so was that the AVL location data were expressed in terms of road and mile markers, while the Waze data used latitude/longitude coordinates. Initially they attempted to address this issue by geo-shaping line segments with start and end mile markers, but that approach led to mismatched locations when the data were entered into a dashboard. Converting the values into a range solved that data mismatch issue; however, the data visualization tool didn't support ranges, so they now keep track of location data as standardized arrays. It is important to note that in addition to these arrays, they still keep the location range, start mile marker, and end mile marker values in the data for compatibility purposes. This approach allows data users to employ a variety of tools without worrying about mismatched formats.

The business requirement for the data along Kentucky's 27,000 miles of roadway is to be able to view the data for any 1/10th-mile roadway segment for any 2-minute period and to provide that data openly to the public as much as possible. Accomplishing this involves some feature engineering: they keep the original timestamp but generate an additional field that rounds the Coordinated Universal Time (UTC) timestamp to the nearest 2-minute interval.

The data are currently accessed in one of two data stores. The Elastic data store supports the use of the data visualization tool, Kibana, and serves as an entry point for users to explore and play with the data.
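The feature engineering just described (keeping the original UTC timestamp while adding a field rounded to the nearest 2-minute interval, alongside the standardized mile-marker array) can be sketched as follows; the field names are illustrative, not KYTC's actual schema:

```python
from datetime import datetime, timezone

def add_derived_fields(record):
    """Add the derived fields described in the text: a copy of the UTC
    timestamp rounded to the nearest 2-minute (120 s) boundary, and a
    standardized [start, end] mile-marker array kept alongside the
    original marker values for compatibility."""
    ts = datetime.fromisoformat(record["timestamp"])
    # Round to the nearest 120-second boundary in UTC.
    bucket_epoch = round(ts.timestamp() / 120) * 120
    bucket = datetime.fromtimestamp(bucket_epoch, tz=timezone.utc)
    record["timestamp_2min"] = bucket.isoformat()
    # Standardized array form; start/end markers remain in the record.
    record["mile_range"] = [record["start_mile"], record["end_mile"]]
    return record
```

Grouping records by `timestamp_2min` and `mile_range` then yields the required view of any 1/10th-mile segment over any 2-minute period.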
The second is a Hadoop Distributed File System (HDFS) data store that supports more specialized Hadoop-related tools and was built to support the most advanced data users.

As part of their efforts to decentralize access to data, they hope to install a free version of Microsoft's Power BI Desktop on the work machines of all 3,900 potential users at KYTC. These installations would connect to the existing Elastic data store by way of a newly implemented Open Database Connectivity (ODBC) driver and would allow everyday users easy access to the data at little or no cost.

Perhaps the biggest barrier to getting their data stores built was administrative in nature. Very few people understood the technology, and even the IT department did not want an HDFS data store built, due to the open source nature of Hadoop and a preference for vendor-supported hardware implementations (to reduce troubleshooting and keep people producing). This has been a constant battle; only 6 months after the Hadoop cluster was successfully implemented, the IT department lobbied to get rid of it, and only the agency's reliance on the vital snow and ice data on the cluster saved it. Today the HDFS cluster has grown to 25 nodes, storing a total of 9 TB of data (3 TB of data replicated three times to guard against data loss).

A popular data product enabled by the cluster is automated email alerting, sent every 60 minutes, associated with the monitoring of snow and ice conditions in 120 counties. Each email alert includes a hyperlink to a Kibana dashboard that is pre-filtered to focus on the county that triggered the alert. Data products such as these not only help protect the infrastructure, they drive growth as other groups within the organization gain real use from, and therefore understand the value of, the data. Building trust took time, but now multiple colleagues have expressed excitement over the data products, especially when those products include analysis that was impossible to achieve within their own data silos.

Hiring for the project was another roadblock that had to be overcome.
The data team was originally a three-person "shadow IT department" within the organization, focused on copying data from other departments' Oracle and SQL servers onto the growing HDFS data store. Once the team included an experienced big data engineer, they were able to make swift progress, building the bulk of their data lake in only 6 months. When this employee moved to the private sector, they were replaced with a project manager and a business analyst with different skillsets and little understanding of data. It then took 18 months before the team was able to hire new developers and recreate what they already had. This highlights the value of hiring the appropriate skillsets for big data work and retaining that talent long-term wherever possible.

As of now, KYTC collects both processed and unprocessed data from 14 incoming data feeds: AVL, dynamic message signs, HERE, iCones, Mesonet, road weather information systems, TriMarc, Twitter, Waze alerts, Waze jams, traffic operations center reports, county weather, internal crowdsourcing, and incident detection. At this point, all data are stored on-premise, although KYTC is considering migrating to cloud storage now that the consolidated IT department has approved it. The IT department collects significant monthly fees for server rental and storage, and the delay in approving cloud storage has been partly attributed to a fear of the IT department losing that revenue. While there are additional data sources of interest given more financial resources, KYTC has been very successful with the data they currently have, leading to 12 automated outputs in use today.

Working with data vendors, KYTC has had valuable learning experiences that have changed how they approach contracts. For example, KYTC receives a consolidated data feed every 10 seconds from an AVL vendor. For the first 3 years of this relationship, KYTC was using the vendor-provided dashboard to view
the data and performance measures. More recently, KYTC obtained access to an API that allowed them to pull the raw data into their data lake. When they were able to see the data firsthand (not simply the aggregated data in a dashboard), they became aware of data quality issues. As a result, KYTC has resolved not to enter into a contract with a data provider unless they can negotiate direct access to the raw data. For instance, as KYTC begins to deploy dedicated short-range communication (DSRC) units in the field, the contract negotiated with the vendor stipulates that KYTC will have direct access to the raw data coming from the units. While challenges will continue as the data management process grows and matures, KYTC recommends first building an understanding of the data. Once an organization understands how big data works, new data sources become far easier to plug into the existing system.

C.5 Los Angeles County Metropolitan Transportation Authority (LA Metro)

LA Metro worked with Via to provide a Mobility on Demand (MOD) service offering first mile/last mile transportation starting in January 2019. Under this program, Via provides shared rides to and from bus rapid transit (BRT), heavy rail, and light rail stations within a 6-square-mile service area. Riders pay for the service using a credit/debit card and a rideshare app, although special accommodations are made for riders without cell phones and riders with disabilities. Via collects, cleans, aggregates, and processes the raw data before using them to update an online dashboard. LA Metro has full access to the dashboard and processed data but does not have access to the raw, unaggregated data; Via does not share raw data access out of a concern over the disclosure of trade secrets. LA Metro plans to store the aggregated data for at least five years to remain compliant with the department's data retention policies. Pending a final agreement, Via will handle the storage, backup, security, and quality of the data.
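Via's exact pipeline is proprietary, but the kind of aggregation that lets a vendor share usable figures while withholding raw trip records can be sketched as follows. The record fields, station names, and daily granularity are assumptions for illustration only.

```python
from collections import Counter

def aggregate_trips(trips):
    """Reduce raw trip records (which the vendor keeps private) to the
    station/day ride counts that might appear on a shared dashboard.
    The record shape here is hypothetical."""
    counts = Counter((t["station"], t["date"]) for t in trips)
    return [
        {"station": station, "date": date, "rides": n}
        for (station, date), n in sorted(counts.items())
    ]

sample = [
    {"station": "Station A", "date": "2019-01-07", "rider_id": "u1"},
    {"station": "Station A", "date": "2019-01-07", "rider_id": "u2"},
    {"station": "Station B", "date": "2019-01-07", "rider_id": "u3"},
]
print(aggregate_trips(sample))
```

Note that rider identifiers are dropped entirely in the output, which is one reason aggregated feeds carry less privacy and trade-secret risk than raw ones.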
LA Metro is working towards building a data lake, but that data lake is not specifically tied to this project, nor is it expected to host the applicable data. Researchers and LA Metro employees will have password-restricted access to the Via-provided database, and those data will already be scrubbed of all PII. Via will be obligated to provide performance-related key performance indicators (KPIs), but the contract will contain no explicitly outlined agreements to provide data quality KPIs, dashboards, or other data quality monitoring capabilities.

One major challenge that LA Metro encountered was negotiating a contract. Very few employees at the department had any experience with highly technical IT contracts, so it took time working with their legal counsel to navigate all the details. The indemnification of LA Metro regarding legal liability for the data was a particularly difficult sticking point, and it took time to clearly communicate to their partner how important it was for LA Metro not to be held legally liable for security breaches or privacy violations on database servers they had no control over. LA Metro noted that while both sides wanted ownership and control of the data, neither wanted to assume the associated risks of ownership. Under the agreed-upon contract, user log-in data and credit card information are stored only on Via's servers; LA Metro itself will not have access to this PII. This reduces liability for LA Metro at the cost of accepting limited data access. Via also monitors data quality per its service level agreement (SLA) and must meet or exceed explicitly outlined KPIs to fulfill its contractual obligations.

The base-level analysis for which the system was designed is performed by Via and incorporated directly into the dashboards provided. Any additional data analysis requires copying the data from Via's servers using the provided API before work can be done on it. The dashboard data are updated once a week via
a batch process, making streaming analysis of the data impossible. The software used to provide the data is proprietary to Via. Obtaining access to the data for additional users is a time-intensive process that typically involves agreeing to a separate nondisclosure agreement with Via.

Soliciting cooperation from internal data owners and other stakeholders to support this new initiative proved challenging. Because LA Metro did not have any internal policy or technology to handle data sharing, they had to coordinate individual tasks with each data silo owner, many of whom had concerns about sharing their data with an outside company. Having an institutional data lake architecture or unified data management procedures would have greatly expedited this process.

C.7 Texas Department of Transportation (TxDOT)

TxDOT has several emerging transportation projects underway, including the Connected Freight Corridor Project, which is in the planning phase, and the I-35 Connected Work Zone Project, which was being implemented at the time of the interview. The Connected Freight Corridor Project is a collaborative effort between TxDOT and the freight industry to deploy connected vehicle technology to over 1,000 commercial vehicles to improve traveler information, asset condition management, and system performance. The I-35 Connected Work Zone Project expands upon the Texas component of the USDOT's Freight Advanced Traveler Information System (FRATIS) to improve in-vehicle messaging for commercial vehicles via cellular and DSRC technology. TxDOT collects a wide variety of data for these projects, including INRIX data, cellular data, AVL probe data, Bluetooth data, and data from basic safety messages (BSMs). For the I-35 Connected Work Zone Project, there are nine roadside units (RSUs) installed in construction areas on I-35 between Austin and Dallas. Currently these data are managed by the various districts, though the eventual goal is to centrally collect and archive these RSU data.
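One way to reduce what each RSU sends downstream is to roll raw messages up into fixed time windows before they reach the traffic management center, forwarding only summaries. A minimal sketch of 30-second windowing follows; the message shape (epoch seconds plus a vehicle identifier) and field names are assumptions for illustration.

```python
from collections import defaultdict

WINDOW_S = 30  # fixed aggregation interval, in seconds

def window_counts(messages):
    """Roll raw (epoch_seconds, vehicle_id) messages into per-window
    unique-vehicle counts, so only summaries travel downstream.
    The message shape is an illustrative assumption."""
    windows = defaultdict(set)
    for ts, vehicle_id in messages:
        # Bucket each message into the 30-second window containing it.
        windows[(ts // WINDOW_S) * WINDOW_S].add(vehicle_id)
    return sorted((start, len(ids)) for start, ids in windows.items())

msgs = [(0, "veh-a"), (5, "veh-b"), (29, "veh-a"), (31, "veh-c")]
print(window_counts(msgs))  # -> [(0, 2), (30, 1)]
```

The trade-off is that raw per-message detail is lost at the edge, so any analysis needing individual messages must be planned before aggregation is enabled.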
One improvement option they are considering is whether to make use of "smart RSUs" that could aggregate their data across 30-second intervals before sending them on to the traffic management center (TMC), thus reducing the processing load downstream.

One challenge that TxDOT discovered when collaborating with private industry stakeholders is managing the sharing of information between partners. Every freight company has its own data in its own format, which it considers proprietary and does not wish to disclose, especially to competing firms. TxDOT is obliged by the Public Information Act to account for this business-sensitive data to some extent and has a Security Credential Management System (SCMS) in place to ensure privacy.

To support these projects, TxDOT is moving from on-premise storage towards secure cloud storage. Currently, data are stored in data silos hosted by various departments within the DOT. These data are stored on SQL servers and are backed up once a day. The goal is to de-silo and migrate these data to a cloud-based data lake within the next 4 years. This data lake will include a metadata catalog and may use a structure more appropriate for geospatial data; however, they are still in the early planning phases. TxDOT has measures in place to ensure the quality and security of the data; however, as the unified data lake architecture and related procedures are still in the planning phases, the final list of policies and procedures is not yet fully developed. For example, TxDOT ensures that media access control (MAC) addresses for Bluetooth devices are anonymized in the data they collect; at this time, however, different vendors use different anonymizing algorithms rather than following a single governed process. For data
quality concerns, there is already a screening process in place to identify and remove poor quality data, but it is still being improved and expanded as the projects mature and develop. These examples in data quality and data security are specific elements of a broader challenge facing TxDOT as they move forward: establishing and formalizing a set of data standards. There are many separate projects and initiatives being pursued simultaneously within TxDOT, each with its own data needs and existing data structure. Unifying these projects and tools takes a significant effort, both from a technology standpoint and from an institutional standpoint.

C.8 Utah Department of Transportation (UDOT)

UDOT has successfully run a fully operational connected vehicle program for signal timing and pre-emption since November 2017. This program has been applied in at least three areas: a standard bus route, a BRT system, and several snowplow routes. UDOT also works with researchers from Brigham Young University (BYU) to test and evaluate different signal change request thresholds related to their connected vehicle applications. They currently have data archives and dashboards available and are entering into a potential 5-year contract with Panasonic to further develop a fully featured cloud-based connected vehicle analytics platform.

For both the standard bus routes and the BRT routes, the primary objective in using the connected vehicle system is to maximize schedule reliability while minimizing split failures. When a bus that is far enough behind schedule approaches an intersection, it will automatically request signal priority. This process is fully automated and entirely transparent to the bus driver. At a threshold of 5 minutes, that is, when buses only request priority once they are 5 or more minutes behind schedule, UDOT found that equipped buses were requesting priority 15-20% of the time.
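The baseline rule, request priority only when a bus is at least 5 minutes behind schedule, reduces to a simple comparison. The sketch below also includes an optional passenger-load check of the kind UDOT is evaluating; the way the two conditions combine is an assumption for illustration, as the exact rule under test is not described.

```python
def should_request_priority(minutes_late, passengers=None,
                            late_threshold=5.0, load_threshold=9):
    """Return True if a bus should request signal priority.

    Baseline: lateness only. If a passenger count is supplied, the load
    threshold must also be met; combining the two conditions with AND
    is an assumption, not UDOT's documented rule.
    """
    if minutes_late < late_threshold:
        return False
    if passengers is not None and passengers < load_threshold:
        return False
    return True

print(should_request_priority(6.0))                 # late enough -> True
print(should_request_priority(6.0, passengers=3))   # lightly loaded -> False
```

Tightening the lateness threshold (e.g., 3 minutes instead of 5) increases how often priority is requested, which is exactly the trade-off against split failures that the BYU experiments explore.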
This approach yielded a 6% increase in schedule reliability during peak hours while causing an increase in split failures only once every 2-3 days. UDOT is collaborating with BYU to test alternative threshold settings to find the optimal configuration for their routes. These thresholds are both timing based, such as requesting priority when a bus is only 3 minutes late rather than 5 minutes late, and passenger-load based, such as only requesting priority when a bus has at least nine passengers. One helpful approach employed in this research is the use of mobile dedicated short-range communication (DSRC) units. Rather than installing their four test DSRC units into the buses themselves, the researchers created four suitcase-size mobile DSRC units. These mobile units can be placed on any bus at the start of the day and retrieved at the end of the day, making it easy to gather data from a variety of routes.

These thresholds are only being tested and applied to buses; they are not used at all for snowplows. The way that snowplows interact with equipped intersections is much simpler: if a vehicle is actively plowing the road and approaches an intersection, it will not simply request priority; it will preempt the signal and force a green light. Because snowplows are much more effective when able to plow uninterrupted, and because roads need to be cleared as fast as possible, UDOT sees no need to try to balance snowplow signal priority against cross-traffic impact.

UDOT maintains general documentation online as easily accessible PDF files. UDOT preserves all source data collected and contracts with a third party to apply data quality filters when the data are used for analysis. The creation of these filters is a manual process that does not include the use of
automated data quality monitoring or visualization. The data are secured against unauthorized or anonymous use; however, granting access to the data is a fast and painless process. UDOT works with at least three specialized vendors and collaborates with other states to develop new software and analytical processes for their data. These processes never destroy or irrevocably alter the source data. UDOT's new cloud data platform will include an open development environment when it is completed.

UDOT's connected vehicle (CV) projects do not currently collect any sensitive data or PII, so no methods have yet been developed to obscure or separate such data. They do have plans to implement secure data collection processes as part of the rollout of their new cloud data platform, but those processes are yet to be implemented. UDOT incorporates data analytics that run against the data without the need for large-scale copying to a separate system. Their analysis handles some streaming data, but most data are stored in traditional relational database management systems (RDBMSs). As of this writing, UDOT's CV program data are stored and processed locally. They have so far been able to archive everything, and the data they work with are naturally anonymous, so no complex data privacy concerns have been encountered. Over the next 5 years, UDOT plans to move to a cloud-based solution that is being built in partnership with Panasonic. This cloud-based system will store, manage, and analyze UDOT's connected vehicle data in near real time so that actionable results from these analyses can be used in operational dashboards and by the vehicles themselves.