National Academies Press: OpenBook

Data Sharing Guidance for Public Transit Agencies—Now and in the Future (2020)

Chapter: Chapter 3 - Factors Impacting Transit Agency Decisions about Data Sharing

Suggested Citation:"Chapter 3 - Factors Impacting Transit Agency Decisions about Data Sharing." National Academies of Sciences, Engineering, and Medicine. 2020. Data Sharing Guidance for Public Transit Agencies—Now and in the Future. Washington, DC: The National Academies Press. doi: 10.17226/25696.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

CHAPTER 3

Factors Impacting Transit Agency Decisions about Data Sharing

As described in Chapter 1, transit agencies share data with researchers, private companies, other public agencies, and the broader public for a variety of reasons. The interviews conducted and information gathered in this research effort revealed that, when transit agencies decide whether to share their data, whom to share it with, and which model to use to share it, they consider several factors. Public transit agencies are motivated to share their data by diverse expected benefits, such as transparency and innovation, but they also evaluate risks and consider the costs of preparing data to be shared. Legislation around data sharing and data privacy underlies these decisions.

3.1 Benefits

The benefits associated with sharing data are wide-ranging and can be difficult to quantify. Transit agency interviewees frequently commented on the need for methods to assess the value of data and particularly the value of sharing data.

Transparency and Increased Awareness of Transit Services

According to the transit agency interviewees, the general public expects that public agencies publish data in free and open formats. Recent years have seen an emphasis on transparency in government and public agencies, precipitated from the federal level: a 2009 Office of Management and Budget memo encouraged transparency and prompted local governments to develop open data portals. Publishing data helps transit agencies meet this transparency goal, which can positively impact public perception. Two of the transit agency interviewees identified transparency as a reason for sharing data.

In addition to transparency, data sharing can serve to publicize the transit agency, and may even encourage citizen engagement (Kassen 2013). Increased awareness of transit services was identified as one benefit of General Transit Feed Specification (GTFS) data sharing in a recent study (Schweiger 2015). One transit agency interviewee discussed how their agency’s open data spurred engaging online content and helped “build the agency brand.”

Sidebar: A Pilot to Increase Transit Visibility

In a recent pilot program, Denver Transit partnered with Uber to integrate transit service information for the City of Denver into the Uber app. Residents in Denver can use the ridesharing company’s app to plan trips on Uber and transit, and they can also buy transit tickets from within the app. The pilot program aimed to integrate multiple mobility service alternatives (including transit, bikes, scooters, and ridesharing) into one app to help reduce residents’ dependence on cars (Bosselman 2019). It also increases the visibility of the transit agency’s service to ridesharing riders.

Innovation and Research

In a report on the value of data, Abella et al. (2017) identify the ability of data to spur innovation as a key benefit of data sharing. The innovation impact of open data was pinpointed in the context of public transit route, schedule, and vehicle location data in TCRP Synthesis 115: Open Data: Challenges and Opportunities for Transit Agencies (Schweiger 2015) and also in a report on the value of Transport for London’s (TfL’s) open data (Deloitte 2017).

Across the United States and abroad, private developers have responded to open streams of public transit route, schedule, and vehicle data by developing travel apps that provide trip planning and vehicle arrival information to customers. Transit agency interviewees commented on the ability of external partners to innovate in quick-changing contexts, such as app development. Open route, schedule, and vehicle location data have also led to the development of additional open-source resources, including products such as OpenTransit Indicators, which calculates performance indicators from this data, and TransitWand, a tool for collecting route and schedule data in the field (Lawson 2016).

Sidebar: Value Generated by Open Data

The gross value added to the economy from companies that develop apps using TfL’s open data was estimated to be between £12 and £15 million ($13 to $18 million) and to directly support approximately 500 jobs (Deloitte 2017).

Standardization is a key factor that has increased the innovative impacts of route, schedule, and vehicle location data. The GTFS has been widely adopted. (Data in this format is available for more than 1,350 public transportation providers as of August 2019.) The standardized format means that innovative tools and products that utilize GTFS can easily be applied across transit agencies. This increases the potential return on investment for innovators and enables sharing of innovations across transit agencies.

In addition to innovative products, data sharing supports public transit research. Nearly all the transit agency interviewees discussed the benefit of external research conducted using their agencies’ data.
Interviewees noted that within their transit agency there often is not time to focus on research-oriented questions. They named specific examples of research conducted by external partners that benefited their transit agency. These include the following:

• A bus turnaround dashboard
• An origin–destination inference algorithm
• A passenger segmentation model
• An electrification study
• Optimization of dispatcher assignment of work

Sharing data can also spur innovation through the combination of public transit data with external data sets. One transit agency interviewee discussed this potential, noting that transit data might be combined with other data sets, such as health care or census data, to create new insights. For transit agencies that seek to pursue innovative, multimodal collaborations, some level of data sharing is often a necessity.

Cost Savings

Outsourcing data analysis and processing work through data sharing can save transit agencies money. For example, although some transit agency interviewees noted their agencies have developed their own transit planning and real-time information apps, several interviewees noted that developing their own apps in house would be time-consuming and inefficient, compared with allowing external partners to develop them.

One transit agency interviewee also noted that, by opening up data, external users of the data help the transit agency more quickly identify problems with the data sets. Increasing the data user pool saves the transit agency time spent looking for missing data and data anomalies. Clearly, this benefit must be weighed against the risk of releasing data that has not been fully vetted. However, for some data sets and partners, this may be a useful model.

Cost savings can also be accrued by releasing data publicly in batches, rather than repeatedly releasing data on a case-by-case basis through individual public records requests. Publishing frequently requested, nonsensitive data online saves transit agency staff time in the long term.

Revenue Generation

Several transit agency interviewees expressed concerns over the risk of negative perception of data sales. They felt the public would not support the idea of the transit agency profiting off of data on the individuals who use the public transit system. In addition, several transit agency interviewees mentioned their agencies could not sell data, because they are required to provide it to anyone who requests it with a public records request. Under many state laws, transit agencies can charge public records requesters for the effort required to fulfill their requests; however, the transit agency interviewees did not report on any instances in which their agency had charged requesters.

Outside of public transit, there are examples of public entities generating significant revenue from the data they collect. Most state departments of motor vehicles charge for the release of vehicle registration data. Many states charge as much as $5 per record. Although restrictions prevent data users from using the data to contact individuals directly, this data can be used in aggregate for market research and is frequently purchased both by vehicle manufacturers and data aggregators such as LexisNexis. The Florida Department of Highway Safety and Motor Vehicles was reported to have made $63 million in 2010 through fees on registration records (Local 10 News 2011). There have been examples of individuals and organizations successfully challenging unreasonably high public records request fees (Grube 2013).

Sidebar: Fees for Data?

The electric utility sector is facing similar questions about whether fees should be charged for data products and data-driven services. The industry working group, Green Button Alliance (2018), believes that if data is to be sold, the pricing should be uniform for all potential customers. That is, data cannot be free for some users and for a fee to other users. However, in practice, some state public utility commissions allow utilities to charge a fee for data and others mandate the data be provided for free. Some organizations have developed revenue streams to support an internal team of information technology specialists, and still others see competitive advantage in developing a platform that can be licensed to other utilities that want to offer data products and data-driven services.

Is There a Market for Transit Agency Data?

Currently, the market for transit agency data is limited compared with the market for vehicle registration data (a significant revenue generator for state departments of motor vehicles). The market for transit ridership data is limited because transit riders are generally only a small share of a region’s residents and alternate sources of mobility data (cellphone, GPS) cover a larger market. Route, schedule, and vehicle location data are widely used in private apps but have been made available for free. It is unclear whether or not these app developers would pay for this data. Other types of transit data are infrequently requested from transit agencies, based on the information received from the transit agency interviewees.

Representatives from Location-Based Services (LBS) companies that the research team interviewed for this study expressed interest in collaborating with transit agencies to find better use cases of their combined data sets. Examples from some geospatial mapping technology companies may also shed light on the possible use of transit data for the retail business.
One spatial location mapping company interviewee indicated their company had developed visualization and mapping products from the open U.S. Census data. Their data-derived analytical results generate additional value for their business partners.

The private mobility provider interviewees indicated that they use transit data that is publicly available. Responses as to whether they would pay for this data varied. Several expressed particular interest in station geometry data, including station entrance locations and parking facility locations, because this information enables more detailed maps for multimodal connections.

Given the expectation of free and open data, public agencies have seen pushback when they attempt to sell data. Dutch agencies attempted to release data to some partners for free while selling data to others, which led to conflicts (Conradie and Choenni 2014). The tone of news stories about TfL’s potential financial gain from Wi-Fi data similarly suggested that public agencies profiting off their data can provoke a negative response (Cheshire 2017).

Advertising

Although selling or charging fees for transit agency data is not a major consideration, several transit agency interviewees described the monetary benefits accrued by using their agency’s data to increase advertising revenue. At least five of the transit agency interviewees indicated their agencies had already used data to generate advertising revenue or were considering doing so. One transit agency interviewee noted their agency is in the process of estimating the value of advertising in their transit system, including data-driven, targeted advertising. Two others described how their agencies used ridership data to price space in their transit system.

According to TfL’s 2017–2018 Advertising Report, 20% of the UK’s outdoor advertising by value is owned by TfL. In 2017, TfL provided customer segmentation data to advertisers. This depersonalized and grouped data from smart ticketing was overlaid with demographic market segmentation data from a private marketing company (Experian) to help prove to advertisers that they are reaching their target audiences. According to the report, advertising revenues in the fiscal year 2017/2018 were £152.1 million ($185 million) (Transport for London 2018).

With ongoing research on location-based advertising and its increased prevalence, there may be increasing latitude for transit agencies to generate revenue by leveraging their data.
Customized Data

One transit agency interviewee noted that their transit agency had occasionally sold bespoke analysis to clients. This consisted of specially requested analysis that would otherwise not be performed by the transit agency. This type of model avoids the privacy risks of sharing data directly and may alleviate the public perception risk of profiting off data that is perceived as a public good.

Another potential revenue generator discussed in the transit agency interviews was the potential for transit agencies to sell the data infrastructure expertise they developed to share data, particularly expertise in the development of APIs that feed large volumes of real-time data to developers.

Customer Benefits

Perhaps the most significant benefit that transit agencies consider when sharing data is its potential to positively impact customers. Travel apps that help customers plan public transit trips and alert them to bus and train arrivals can save customers time. According to a study of open transit data, the primary reason transit agencies have cited for releasing route, schedule, and vehicle location data is to provide customers with more information (Schweiger 2015).

In London, 42% of residents use mobile phone apps that use information from TfL’s open data feed. These open data feeds provide customers with greater certainty about their journeys and potentially save passengers time. These benefits to customers from TfL data provided via apps were estimated at between £70 and £90 million ($85 to $109 million) per year in time savings (Deloitte 2017).

The innovative studies spurred by open data can impact customers as well. Research that helps transit agencies operate more efficiently or plan service better ultimately translates into benefits for public transit customers. The impacts of external research on customers may be significant, particularly in data sharing models where transit agencies are able to influence external research to target their needs.

Sidebar: Getting More Value from Third-Party Apps

Transit agencies may be able to generate more customer benefits from third-party apps in return for the data they share. In Tampa, Florida, a pilot program embedded Open311, a service that allows users to report issues to the local government and transit agencies, in the open-source OneBusAway app (Barbeau 2018B).

Transit agency interviewees also described two additional information types that could be presented in customer-facing apps: crowding information and fare information. As data and processing methods improve, the possibility to provide reliable crowding information in real time is increasing. Although some transit agencies are concerned that reporting on crowded trains and buses may discourage public transit use, others see this as an important way to increase customer knowledge and improve their experience.

Fare information is part of the GTFS standard but is not supplied by all transit agencies. Wang (2014) noted that the existing GTFS standard is insufficient in the way it describes fares and proposed an extension to GTFS to model complexities in fare structure, such as time-of-day variance, distance-based fares, and free transfers. At least one transit agency interviewee noted their agency was working to include fare information in its public information feeds and ultimately in transportation apps.

Facilitating Community Functions and Multimodal Mobility

Transit agencies also reported that data on passengers is requested by real estate developers, municipal planners, and law enforcement officers. Transit agencies attempt to support these community needs while also protecting private information on their customers.
In a recent example in Boston, Massachusetts Bay Transportation Authority (MBTA) video surveillance data and fare card data were used to locate a kidnapped woman (Flanigan 2019).

Public transit is just one part of a multimodal transportation system. In some cases, public transit agencies partner directly with TNCs or micromobility providers. Data sharing is often critical to building a well-functioning multimodal transportation network. Some argue that the integration of public and private mobility options, which generally requires data sharing, makes cities more attractive to investors with private capital, increasing the number of skilled jobs available and widening the city’s tax base (Hemerly 2013).

Benchmarking

Transit agencies share data for benchmarking, which helps them understand, track, and improve their performance. The National Transit Database (NTD) is a repository of transit agency information. Transit agencies that receive funding from FTA under the Urbanized Area Formula Program (§5307) or Other Than Urbanized Area (Rural) Formula Program (§5311) are required to submit data to the NTD. The data is frequently used by researchers to understand trends in public transit performance. The fact that it is standardized across agencies makes it easy to use for cross-agency studies.

Transit agencies can also pool data privately to benchmark performance. The American Bus Benchmarking Group is a consortium of bus agencies that share data and best practices. It aims to help its members understand their performance by making comparisons about practices and outcomes across agencies.

3.2 Costs and Effort

There are many steps required to prepare data for sharing. Figure 4 summarizes the common elements in preparing data for sharing. These steps require staff time and often also require contracting with external vendors. In many cases, these steps are required even for internal data use, a factor identified by several transit agency interviewees. Interviewees noted that good internal data management practices make data sharing easier. For example, a well-documented internal data repository helps transit agency staff make use of data and also reduces the additional steps required to distribute data. However, several transit agency interviewees noted that much of their agencies’ data was not collected with analysis in mind, was not stored in a centralized location, and was not documented for external use. As such, preparing data in response to data requests often requires significant effort. There is additional effort required to conduct privacy and other risk assessments and to develop any licensing agreements necessary.

Data Cleaning

Although data cleaning is important, it can require significant effort. If a planning goal depends on high-level metrics that aggregate data across months or years, an imperfectly cleaned data set may be sufficient. In contrast, when data is shared to provide customer information, errors in the data can be problematic. If the route, schedule, and vehicle location information that transit agencies share is inaccurate, it may dissuade customers from using transit services, and could even have implications for customers’ safety.
Transit agency interviewees emphasized the importance of vehicle arrival prediction quality and described issues such as “ghost buses,” in which bus arrivals are predicted but do not occur, that they are actively working to combat.

Figure 4. Process of preparing data for sharing.

This same level of data quality is not necessary for all data types. Two interviewees from the transit agencies that release the most data publicly both noted that data does not have to be perfect to be released. They see benefits from releasing data even if it has minor flaws. As long as the issues and caveats are described in the data documentation, releasing data promotes transparency and can spur research and innovation. In some cases, external data users can actually help the transit agency identify and fix problems with the data.

Data Merging

Often transit agencies merge multiple data sources to produce more useful data products. Common examples include merging automated passenger counter (APC) and farebox data, connecting data to GTFS identifiers, and assigning data to trips or vehicles. These processes make data easier to use and increase its value both internally and externally.

Adding Value to Data

Sometimes internal analytical effort by transit agencies can pay off by making data more useful and desirable to other users. Looking to the energy sector, utilities in New York and California are creating “interconnection maps” to show third-party service providers exactly where they can provide distributed energy resources, and the price that the utility is willing to pay for load at those network nodes. Creating these maps is one step in streamlining the procurement of third-party services by publicly sharing localized electricity needs and creating a standard process for third-party service integration. In California, the utilities work independently, posting their interconnection maps on their websites (California Public Utilities Commission 2008). In New York, the utilities are collaborating with each other as well as the regional transmission operator to standardize data collection, management, and load forecasting methods (Joint Utilities 2016).
Transit agencies may create similar data products that make their data more accessible and valuable to third parties, including private developers and private mobility providers. Transit agencies can evaluate the potential benefits of creating these products, which could include revenue generated from processing fees and transit-supportive development, against the effort required and the potential strategic risks of releasing these data products.

Other research questions require merging transit data with external sources, such as weather or census data. Transit agencies may opt to do this task internally, or they may share the data sets publicly or with a research partner that will complete the task.

Data Aggregation

Data aggregation refers to any process in which individual records are combined to produce summary data, for example, combining individual boarding or origin–destination data to provide estimates of average weekly ridership on a route. Transit agencies must make decisions about aggregation prior to sharing data. Transit agency interviewees reported that they aggregate data for a variety of reasons, including making data easier to use and understand (particularly for nontechnical audiences), minimizing data storage needs, and protecting individuals’ privacy.
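
The ridership example above can be illustrated with a minimal sketch. Nothing in it comes from the interviews: the record layout (a route identifier plus a service date per boarding), the function name, and the optional `min_count` suppression threshold are all assumptions made for illustration. The sketch averages only over weeks in which a route recorded at least one boarding.

```python
from collections import defaultdict
from datetime import date


def average_weekly_ridership(boardings, min_count=0):
    """Aggregate individual boardings to average weekly boardings per route.

    boardings: iterable of (route_id, service_date) pairs, one per boarding
               (a hypothetical record layout, not an agency format).
    min_count: hypothetical suppression threshold; a route whose weekly
               average falls below it is reported as None.
    """
    # route_id -> (ISO year, ISO week) -> number of boardings in that week
    weekly = defaultdict(lambda: defaultdict(int))
    for route_id, service_date in boardings:
        iso = service_date.isocalendar()
        weekly[route_id][(iso[0], iso[1])] += 1

    result = {}
    for route_id, weeks in weekly.items():
        # Average over the weeks that have at least one record for the route.
        average = sum(weeks.values()) / len(weeks)
        result[route_id] = average if average >= min_count else None
    return result


records = [
    ("10", date(2019, 8, 5)),
    ("10", date(2019, 8, 6)),
    ("10", date(2019, 8, 12)),
    ("22", date(2019, 8, 5)),
]
print(average_weekly_ridership(records))
```

Only the route-level averages leave the function; the individual boarding records do not. A suppression threshold such as `min_count` is one possible way to withhold very small values, and an agency would need to choose any such rule as part of its own privacy assessment.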

Aggregation is an important tool given the variety of audiences for transit data. One transit agency interviewee noted that different audiences are interested in different levels of aggregation. Although researchers typically prefer disaggregate data, journalists, advertisers, and real estate developers typically seek some level of aggregation so they can draw conclusions and make decisions about actions to be taken without having to perform a significant amount of analysis themselves. Providing aggregated statistics on things like ridership, on-time performance, and vehicle crowding can also prevent some types of data misuse, in which external users misunderstand aspects of the data and perform analysis that leads to incorrect conclusions. However, more detailed disaggregate data, when analyzed correctly, can spur research that generates new insights that can benefit the transit agency. Some transit agencies provide both disaggregate data for download and an interactive dashboard that allows the user to view aggregated information, with data grouped by time period and route.

The transit agency interviews also revealed many examples of data aggregation for privacy protection. This is described in Section 3.3.

Sidebar: Why Aggregate?

For internal use:
• Aggregation reduces data storage needs and protects against cyberattacks of individual records.

For external use:
• For some audiences, aggregation helps them understand the data and prevents misuse.
• Aggregation of individual records prior to sharing can protect individuals’ privacy.

Data Formatting

Transit agencies may format data to make it easier to use or to conform to data standards. Standardizing data prior to sharing can produce additional value, for example, by encouraging standardized, open-source tools, as has been the case with GTFS. However, standardizing data also requires additional effort.
A discussion of the advantages of data standards for data sharing and the challenges of developing and adopting data standards is included in Section 6.2.

Data Documentation

In general, some form of data documentation, typically including the development of a data dictionary, is required prior to sharing data. Although good, detailed documentation is critical when data is shared publicly, more basic documentation may be sufficient if data is shared with a partner under an ongoing collaborative relationship.

Data released without sufficient context and metadata (including information on assumptions inherent in the data and data dictionaries) is susceptible to misuse (Conradie and Choenni 2014). Data users need to know field definitions as well as any assumptions and caveats. Data field definitions and possible values are typically provided in a data dictionary. This process is especially important when data is shared externally. All the transit agency interviewees indicated their agencies provide some documentation with the data they share. Some expressed that this process can require significant effort and that it is sometimes a challenge to determine what level of detail of documentation is sufficient. Good documentation can help prevent misinterpretation and misuse of data but takes time to develop. The use of data standards can address this challenge, because transit agencies can rely on centralized documentation of data following the standard format.

Data Cataloging

Not all transit agencies have data catalogs, but they can be useful for data sharing. In fact, most of the transit agency interviewees noted that their agencies do not have a centralized data repository, and that data was stored in a variety of locations across the transit agency. The advantages of and need for a centralized data catalog were explained in several transit agency interviews. A centralized data catalog can serve both internal data analysis and data sharing. Parts of the data catalog may be made open to the public, with access to other parts granted to certain partners or limited to transit agency staff. The catalog can help internal staff find and use data collected across divisions and can also ease the data sharing process, saving the transit agency time responding to data requests.

One transit agency interviewee indicated their agency developed a public-facing dashboard where users can view and download many types of data. The interviewee noted that the dashboard saves time responding to internal data requests as well, because people from other divisions can “help themselves” to data.

Having such a catalog requires staff effort to maintain. Many transit agency interviewees noted that the lack of a staff member or group dedicated to such an effort was the reason their agency did not have a catalog.

3.3 Risks

The primary risks that may impact public transit data sharing decisions are privacy, security, data misuse, and strategic risks. Section 2.4 includes checklists and guidance to assist transit agencies in identifying and addressing these risks. This section provides context and examples to illustrate these risks based on the interviews conducted and the review of literature and information.

Privacy

There are several sources of privacy concerns with public transit data, including the following:

• Personal data collected, such as registration information associated with fare cards (names, addresses, etc.).
• Anonymized individual data that risks re-identification when combined with other data sets.
• Anonymized individual data that risks re-identification even without additional data sets (PII).
  As public transit agencies increasingly integrate their electronic fare systems with other modes and payment systems (such as credit cards), re-identification becomes increasingly possible.
• Facial recognition of video data.

Examples of re-identification of anonymized data occur across fields. For instance, in 2008, Netflix released data on movie ratings by individuals that they believed had been anonymized, but researchers at the University of Texas at Austin proved that they could identify individuals (National Academies of Sciences, Engineering, and Medicine 2018). Similarly, when the New York City Taxi and Limousine Commission released data on taxi rides in 2014, a data scientist was able to identify individual trip origins and destinations and amount paid by combining the data set with medallion numbers visible in celebrity photographs (Lubarsky 2017). Transit agency interviewees expressed the need for guidance and protocols to follow to assess and reduce privacy risks.

How Important Is Privacy?

In assessing privacy risks, transit agencies may consider how important privacy is to their customers and their customers’ willingness to provide personal information to public agencies in return for benefits. Studies have shown that people are willing to trade privacy for benefits. According to a 2012 Pew study, almost three-quarters of smartphone owners get location-based information on their phones. However, people also appear to be selective in which sources they provide information to. The study found that more than half of app users surveyed had uninstalled or chosen not to install an app because of privacy concerns (Brakewood and Paaswell 2017). In a focus group on transit agency apps, most users said they did not read app privacy policies, although 72% said they understood that their smartphone’s location could be identified. In a survey on the same subject, most respondents said that transportation apps should know their location (71%), and 60% said they were not concerned about this (Brakewood and Paaswell 2017).

However, people are concerned with their data being shared, especially if it is not for transportation planning purposes: 50% were “strongly concerned” about having data shared for marketing purposes, compared with only 13% “strongly concerned” about having it shared for transportation purposes. Brakewood and Paaswell (2017) also found that, although 35% of survey respondents were “strongly concerned” with data from transportation apps being shared with a private agency, only 18% were “strongly concerned” with this data being shared with a public agency.

Understanding these tradeoffs is important because there is a cost to maintaining data privacy. Erhardt (2016) argues that strong privacy restrictions, such as data obfuscation requirements, can limit the usefulness of smart card data. Lerner (2012) discusses data privacy regulations in the context of online advertising and suggests that they may inhibit innovation by posing obstacles to startups and thus favoring large established companies. Transit agencies may be similarly burdened by privacy regulations relative to private mobility providers and private mobility data collectors.

Opt-in Models and Standards for Data Privacy

Researchers often want to access individual records, which some transit agencies are hesitant to share due to privacy concerns. There may be potential to address this challenge with opt-in models, in which individuals agree to share their data (International Association of Public Transport 2018). One transit agency interviewee discussed this option for accessing users’ Wi-Fi and app usage data.
If transit agencies can show customers that they will use data to the customers’ benefit and establish trust with their customer bases, opt-in models can allow transit agencies to maintain sensitive data internally and use it for planning purposes. In some cases, transit customers may even opt in to sharing their data with external trusted partners, such as researchers and municipalities, if they are made aware of potential benefits.

A possible model for opt-in data sharing comes from the electric utility industry’s Green Button Alliance. The Green Button Alliance’s DataGuard enables informed consent for customers to opt in to data sharing and allows customers to decide how and when data is shared. The standard also includes secure maintenance and disposal of data, and self-enforcement or auditing to ensure security.

The Green Button Alliance has taken DataGuard a step further to develop two industry standards: (1) UtilityAPI and (2) Green Button. These standards require companies to inform customers of their data’s existence and to provide opt-in consent with options to set expiration dates on data sharing. Customers can either download their Extensible Markup Language (XML)-formatted data and send it to a third party, or they can use their utility website log-in credentials (as they would, for example, with a Facebook or Google account to log into multiple websites) to share their data with approved companies via Transport Layer Security (TLS) 1.2 encryption. The industry chose the XML format and TLS 1.2 encryption because these are software standards for open sharing.

Green Button data protects user privacy by splitting a user’s data into two parts: usage data and personal data. The usage data does not have personal identifying information, such as name, address, and geographic location. Personal data does not have any usage data.
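The two-part split described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the field names and the PERSONAL_FIELDS set are hypothetical and are not taken from the Green Button standard, which defines its own schema.

```python
# Hypothetical field names for illustration only; the Green Button
# standard defines its own schema, which is not reproduced here.
PERSONAL_FIELDS = {"name", "address", "geographic_location"}


def split_record(record):
    """Split one customer record into (usage, personal) parts.

    The usage part carries no personally identifying fields, and the
    personal part carries no usage fields.
    """
    personal = {k: v for k, v in record.items() if k in PERSONAL_FIELDS}
    usage = {k: v for k, v in record.items() if k not in PERSONAL_FIELDS}
    return usage, personal


# Made-up example record.
record = {
    "name": "A. Customer",
    "address": "1 Main St",
    "interval_start": "2020-01-01T00:00",
    "kwh": 1.2,
}
usage, personal = split_record(record)
```

A transit agency might apply the same idea to fare card data, for example by keeping registration details apart from trip records, although how the two parts are stored and linked would depend on the agency's own systems.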

How Much Aggregation Is Necessary to Protect Individual Data?

The majority of transit agency interviewees indicated that their agencies never shared individual records. Those that share individual records do so only with trusted partners who sign a nondisclosure agreement and undergo training in the handling of such data. Instead, transit agencies typically opt to aggregate individual records prior to sharing.

Transit agency interviewees revealed that the level of aggregation varies. One transit agency interviewee indicated that their agency never releases aggregated data containing fewer than 10 records within a given sample bin. For example, if a data requester asked for hourly boardings at a stop, and the stop had fewer than 10 boardings in a 1-hour period, they would not release data for that hour. Another transit agency interviewee described a similar rule, but their agency set their minimum at five records. A third transit agency interviewee indicated that their agency only releases data aggregated to a census tract or Transportation Analysis Zone (TAZ) level. Yet another one specified that their agency only supplies average daily boarding information, typically aggregated to an entire year. The interviewee noted that this choice was not only for privacy reasons but also due to lack of consistent data and data quality concerns.

Multiple transit agency interviewees commented that their agencies’ privacy policies felt arbitrary and that guidance on privacy protection would be appreciated.

Data Aggregation to Protect Privacy: Lessons from the Energy Sector

In the electric utility sector, publicly available data are also aggregated to protect individual users’ privacy. State regulation of customer electricity usage generally covers the release of individual customer data, wherein a customer can release their electricity usage alone, or with their personal identifying information, to third parties. States also address how utilities make customer electricity usage data available for planning purposes, whether to state energy agencies and regulators or to third-party service providers.

Balancing the need for temporal and geographic granularity, which provides insights into consumer demand for electricity at a given time at a specific location, and the need for privacy is a challenge for state regulators. To achieve this balance, regulators define acceptable levels of geographic granularity.
• Vermont allowed utilities to release aggregated customer data, without personal identifying information, at the municipal level (i.e., aggregated across an entire town or city).
• Colorado developed the “15/15 rule,” whereby utilities provide 2 months of customer usage data on a rolling basis (Colorado Public Utilities Commission 2015). It is aggregated across at least 15 customers of the same classification (e.g., large home, small home, small business) within the same ZIP+4 area, without PII. A single customer’s load must not comprise more than 15% of the customer group. If there are fewer than 15 customers in a ZIP+4 area, or a single customer’s load is more than 15% of the total data, the utilities expand the geographic area to ZIP+2. California (Lee and Zafar 2012) and Illinois (Illinois Commerce Commission 2014) also adopted the 15/15 rule.
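The threshold rules in this section, both the transit agencies' minimum bin sizes and the utilities' 15/15 rule, can be sketched as simple checks. The Python below is a minimal illustration rather than any agency's or regulator's actual procedure. The function names, the use of None to mark withheld values, and the handling of a group that fails at both ZIP+4 and ZIP+2 are assumptions made here, since the text above does not specify them.

```python
def suppress_small_bins(counts, minimum=10):
    """Withhold (return None for) any bin with fewer than `minimum` records."""
    return {
        bin_label: (count if count >= minimum else None)
        for bin_label, count in counts.items()
    }


def passes_15_15(loads, min_customers=15, max_share=0.15):
    """Check a customer group against a 15/15-style rule.

    `loads` holds one usage total per customer in the group.
    """
    if len(loads) < min_customers:
        return False
    total = sum(loads)
    if total <= 0:
        # Assumption: with no recorded load, no customer can dominate.
        return True
    return max(loads) / total <= max_share


def choose_release_area(zip4_loads, zip2_loads):
    """Pick the finest area that passes, widening from ZIP+4 to ZIP+2."""
    if passes_15_15(zip4_loads):
        return "ZIP+4"
    if passes_15_15(zip2_loads):
        return "ZIP+2"
    # Assumption: the text does not say what happens if ZIP+2 also fails,
    # so this sketch withholds the data.
    return None


# Hourly boardings at one stop (made-up numbers): the 08:00 bin is withheld.
hourly = suppress_small_bins({"07:00": 42, "08:00": 9, "09:00": 10})
```

Changing `minimum` to 5 would reproduce the second interviewee's rule. The same pattern extends to other groupings, such as census tracts or TAZs, by changing what each bin represents.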

Data Censoring

Data censoring may be required prior to sharing data that consists of written descriptions. For example, one transit agency interviewee noted that, although most of the information in their agency’s incident reports was likely not sensitive and could be released for transparency purposes, some reports may occasionally contain descriptions of individuals that present a privacy risk. As a result, sharing this data would require significant effort to review the data and scrub any sensitive information.

Strategies and Lessons Learned About Data Privacy

Transit agencies have developed a variety of techniques to address privacy concerns when they collect and share data. One transit agency interviewee discussed the importance of transparency, noting that their agency used several methods to inform customers of Wi-Fi data collection in their stations. Notices described that data would be used to benefit customers through improved service planning. Being upfront about data collection helps mitigate the risk of privacy concerns being raised after the fact. Establishing the benefit to customers may create buy-in. Another transit agency interviewee specified that their agency has a privacy officer who reviews data requests that have privacy concerns and who also conducted a privacy impact assessment. This organizational structure and proactive approach may also mitigate privacy risks.

Outside of transit, there are other frameworks for privacy that can guide transit agencies. One framework for assessing privacy risk categorizes data in three tiers: open data, restricted data, and highly restricted data that are collected under a pledge of confidentiality (National Academies of Sciences, Engineering, and Medicine 2018). Open data is data for which privacy concerns do not exist. Restricted data may have privacy concerns associated with it and should only be shared with appropriate provisions.
Highly restricted data generally should not be shared, and individuals should be informed of its collection and uses.

The National Center for Health Statistics follows a “Five Safes” framework to guide decisions about data access (National Academies of Sciences, Engineering, and Medicine 2018). The Five Safes are as follows:
• Safe projects, in which they consider the specific use of data and determine whether it is “appropriate, lawful, ethical, and sensible”;
• Safe people, in which they evaluate the researchers who will be analyzing data;
• Safe data, in which they look at the information contained in the data and evaluate any potential confidentiality breach;
• Safe settings, in which they consider the security of the facilities where data is stored and accessed; and
• Safe outputs, in which they consider what types of findings will be released based on the data analysis and evaluate risks, particularly re-identification risks.

There are also technical approaches to privacy protection. One transit agency interviewee described their agency’s process of encrypting data using a salt, which is an unknown character string that is added to a unique identifier prior to encryption. This serves as protection against decryption. The interviewee from this transit agency, which has a pilot to collect mobile phone data in collaboration with a private company, also noted that their agency had a process for automatically randomizing data relating to a sample of fewer than 10 individual devices. Chen et al. (2012) describe the potential of the differential privacy framework, a statistical process for protecting user privacy in data sets consisting of individual user data by adding noise to the data sets. Their case study for the Montreal Transportation System demonstrated that they could successfully apply the differential privacy framework to smart card data, producing a privacy-protected data set from which the transit agency could perform standard analysis tasks.

Security

In the context of public transit data sharing, physical security risks are defined as the risk of someone using transit data to inform an attack on transit infrastructure. As opposed to privacy issues, security concerns were not emphasized in the transit agency interviews; however, one transit agency interviewee noted that their agency was often prevented from releasing data (e.g., on stop-level boardings) that was deemed security sensitive. Another transit agency interviewee noted that their agency releases data if the requester can demonstrate a research or business need for the data. If not, the agency infers that the request may produce a security concern. Security was not mentioned in the other transit agency interviews with the exception of one interviewee who specifically noted that their transit agency does not perform a security risk assessment for data requests.

Cybersecurity is also a risk. Cyberattacks can compromise private data housed within a transit agency. When the transit agency shares data with an external partner, there is an additional risk that the partner is susceptible to a cyberattack. Cybersecurity risks were not raised in the transit agency interviews. However, this subject has been raised in forums on transit data sharing. For example, this was discussed at the Twin Cities Shared Mobility Data Workshop in July 2019. Additionally, the private company interviewees expressed concerns that transit agencies lack the capacity to guard or manage sensitive information that their companies share with the transit agency.

Misuse

Although security was discussed only occasionally in transit agency interviews, the risk of data misuse was raised in almost every interview.
Misuse may be deliberate or accidental, with most transit agencies more concerned with accidental misuse, which they perceive as much more likely. One interviewee noted that data users often do not have the full picture. Because they see only part of the data, they may draw incorrect conclusions. Another interviewee noted that they were concerned that users would select the wrong data source or use old, stale data to drive their analysis. One transit agency interviewee described an example in which a third-party app misrepresented the data the transit agency had published, leading to complaints to the transit agency from their customers. Information about transit agencies that is relayed to customers through apps or published on websites and in newspapers can significantly impact the way customers view transit agencies.

Although transit agencies cannot prevent misuse of published data, they can take steps to reduce instances of it. Transit agency interviewees noted the importance of checking data for errors before it is published and of fully documenting data that is published online or provided to partners. In terms of route, schedule, and vehicle arrival data shared with customers through third-party transit apps, several transit agencies are taking steps to actively manage what information is shared (see Section 4.3).

Strategic Risks

Strategic risks consist of any consequences of data sharing that impact the transit agency’s ability to serve its function. For example, if data sharing can impact the way the transit agency is perceived by its customers or its ability to provide good service to its customers, there is a strategic risk.

Several transit agency interviewees described concerns about public perception. Particularly when asked about the possibility of selling data, they noted that this could cause their agencies to lose their customers’ trust. There are also varying perspectives on the strategic risks of open data. One interviewee commented that some transit agencies are concerned with releasing data that shows things like poor on-time performance or overcrowding on their transit system. In contrast, the interviewee believed that releasing data promotes transparency and provides their customers with the best information available to navigate the transit system. In short, there may be strategic risks associated with releasing and not releasing data.

The UITP identifies a different set of strategic risks in their guidance document on data sharing. They discuss that there may be a strategic risk of sharing data of high commercial value for free (International Association of Public Transport 2018). Their report hypothesizes that, when certain data sets are shared, it may actually cause power to shift away from the transit agency. As an example, when transit agencies share GTFS and GTFS-Realtime (GTFS-RT) data, and these are used by third-party apps, the third-party apps collect information on customers that the transit agencies may not have access to. This information asymmetry may disadvantage transit agencies and hamper their ability to best serve their customers. This particular instance of information asymmetry was mentioned in several of the transit agency interviews and is discussed in more detail in Section 4.3.

3.4 Rules and Legal Issues

Laws around data privacy and data management can guide transit agencies in their data sharing practices.
However, as Hemerly (2013) cautioned, technology has been developing more quickly than legislation can keep up with it, which can lead to concerns and conflicts over what data is public and what data is private.

Legal Protection of Data Privacy

At the federal level, there is no general constitutional right to privacy of one’s personal data. It is guaranteed only in two cases: “(1) where the release of personal information could lead to bodily harm . . ., and (2) where the information released was of a sexual, personal, and humiliating nature. . . .”¹ However, the Federal Trade Commission (FTC) has broad authority to protect consumers from unfair or deceptive practices that put consumers’ personal data at unreasonable risk. For example, the FTC has pursued enforcement actions against companies for “failure to maintain reasonable and appropriate data security for consumers’ sensitive personal information.”² There is precedent, however, that transit agencies, as agents of the state, may be immune from FTC jurisdiction in many contexts.³ Similarly, a transit agency may be held liable under state tort law for mishandling a customer’s personal data; however, some transit agencies may have sovereign immunity from these suits (Thomas 2017).

¹ Lambert v. Hartman, 517 F.3d 433
² FTC v. Wyndham Worldwide Corp., 10 F. Supp. 3d 602, 607 (D.N.J. 2014), aff’d, 799 F.3d 236 (3d Cir. 2015)
³ N.C. State Bd. of Dental Examiners v. FTC, 135 S. Ct. 1101 (2015)

For some sectors, federal privacy laws regulate the collection and dissemination of certain types of information. For example, the Health Insurance Portability and Accountability Act (HIPAA) applies to sensitive health information. There are no transportation sector-specific federal laws that govern data sharing by transit agencies; however, FTA has published Open Data Policy Guidelines, which discuss best practices for public sharing of data by transit agencies (Catalá 2016). These guidelines state that FTA should encourage transit agencies to embrace open data practices for data that does not contain private or personal information and that would not create security or safety concerns. The guidelines do not address sharing this more sensitive data with specific parties.

State-Level Legislation

Most privacy law that applies to transit agencies in the United States is at the state level. Certain state constitutions protect an individual’s right to privacy. In addition, some state courts have held that an individual’s right to privacy must be balanced against a compelling state interest in disclosure (Thomas 2017).

At least 28 states have enacted data security laws that apply to state, and sometimes local, government entities (National Conference of State Legislatures 2019). Although state laws vary widely in their scope and requirements, they generally require the development of guidelines and standards for data collection and retention. In addition, some of these laws require agencies to take specific measures to protect sensitive information from unauthorized access, destruction, use, modification, or disclosure. Some state laws also require public agencies to develop an information security plan based on standards and guidelines developed by the state’s chief information security office.

State data security laws also vary in terms of which agencies are subject to their requirements. Certain state laws only apply to state-level agencies, not to local or other government entities [e.g., Cal. Civ. Code § 1798.14 (“Each agency shall maintain in its records only personal information which is relevant and necessary to accomplish a purpose of the agency required or authorized by the California Constitution or statute or mandated by the federal government.”); Fla. Stat. § 282.318 (State’s Information Technology Security Act applies to each “state agency.”)]. Other state data security laws apply more broadly to other types of government entities [e.g., Ala. Code 1975 §§ 8-38-1 to 8-38-12 (This Alabama law, which requires covered entities to “implement and maintain reasonable security measures to protect sensitive personally identifying information against a breach of security,” applies to “government entities,” defined as “the state, a county, or a municipality or any instrumentality of the state, a county, or a municipality.”)]. The applicability of these state data security laws also depends on how the transit agency is formed, that is, whether the transit agency is a department within a city or state government, an independent authority, or a private operator that is publicly funded and overseen by a state entity.

State laws generally distinguish between publicly available information and personal information. Personal information is typically defined as a name or some sort of unique biometric or genetic print in combination with a Social Security number, driver’s license number, or other identification number. Personal information is the subject of states’ most stringent regulations. Aggregated, anonymous/de-identified, or publicly available data is often exempt from regulation (e.g., Md. State Govt. Code §§ 10-1301 to 10-1302).

One example of a state data security law is the Minnesota Government Data Practices Act, Minn. Stat. § 13. It regulates how government data is collected, created, maintained, used, and disseminated and applies to “political subdivisions,” including transit agencies that were formed pursuant to a local ordinance (this includes, for example, Metro Transit in Minneapolis–St. Paul).
Under the Minnesota act, “Private or confidential data on an individual” may not be disseminated by a government entity for any purposes other than those stated in a warning provided to an individual at the time that individual is asked to supply private or confidential data (known as a “Tennessen warning”), unless the government entity receives the individual’s informed consent or the dissemination has been authorized by statute. The warning must inform the individual asked to supply private or confidential data of the following:
• Purpose and intended use of the requested data within the collecting government entity;
• Whether the individual may refuse or is legally required to supply the requested data;
• Any known consequence arising from supplying or refusing to supply private or confidential data; and
• Identity of other persons or entities authorized by state or federal law to receive the data.

State data breach notification laws may also apply to transit agencies (e.g., the Alabama Data Breach Notification Act of 2018). A closer examination of these laws is required to determine whether any of them would impose a notification requirement on the transit agency if an entity with which it shared data experienced a breach.

Public Data Disclosure Laws

The federal Freedom of Information Act (FOIA) provides persons with the right to request access to federal agency records or information. Federal agencies are required to disclose the requested records unless an exemption applies. The federal FOIA does not apply to state or local government agencies, including most transit agencies. Instead, transit agencies are subject to state-level disclosure laws.

States vary in what types of records and which agencies are subject to their disclosure laws. In some states, disclosure laws do not apply at the local government or political subdivision level, meaning that they may not apply to a transit agency (depending on how that transit agency is structured). Some state disclosure laws include exemptions for disclosure of personal information. For example, Illinois exempts private information from disclosure requirements (unless otherwise required by statute to be disclosed). Private information is defined as “unique identifiers, including a person’s Social Security number, driver’s license number, employee identification number, biometric identifiers, personal financial information, passwords or other access codes, medical records, home or personal telephone numbers, [] personal email addresses[,] . . . home address[,] and personal license plates.”

In addition, in some states any information retained by an agency is subject to disclosure, even if the records originated outside the government. For example, New York’s Freedom of Information Law required disclosure of insurance company meeting minutes that were voluntarily and confidentially given to the New York Insurance Department.⁴

⁴ Washington Post v. Insurance Dep’t, 463 N.E.2d 604, 607 (Ct. App. N.Y. 1984)

In some states there are special laws to govern data that private entities share with transit agencies. At least one state has issued regulations governing the sharing of PII by transportation network companies (TNCs). Colorado Rules 6723(l) and 6710(e) require that these companies obtain consent from customers before disclosing any PII to a third party and that they keep records of any disclosure. However, the Colorado Public Utilities Commission has the authority to require these companies to disclose PII in specific situations without obtaining customers’ consent. A few states have laws specifically about sharing video data that require documentation of the “custodians” with whom data is shared (Thomas 2018).

Transit agencies are beginning to harness the value of external data, but challenges remain.

The TRB Transit Cooperative Research Program's TCRP Research Report 213: Data Sharing Guidance for Public Transit Agencies – Now and in the Future is designed to help agencies make decisions about sharing their data, including how to evaluate benefits, costs, and risks.

Many transit agencies have realized benefits from sharing their internal data sets, ranging from improved customer information, to innovative research findings that help the transit agency improve performance.
