National Academies Press: OpenBook
Suggested Citation:"Chapter 6 - Major Challenges." National Academies of Sciences, Engineering, and Medicine. 2020. Data Sharing Guidance for Public Transit Agencies—Now and in the Future. Washington, DC: The National Academies Press. doi: 10.17226/25696.


CHAPTER 6 Major Challenges

In the rapidly changing data management and sharing environment, the transit agency interviewees identified a variety of challenges, which are also reflected in the literature and shared across other sectors. Some challenges and needs are internal: protocols, organizational structures, and other changes that are required within the transit agency. At the same time, transit agencies are looking for external guidance and even regulation to govern these internal changes. Transit agency interviewees expressed frustration at the challenge of working out every detail of a data sharing agreement internally. They recognize the potential efficiency that could be gained through standardized protocols and policies. They also see regional or federal policies as a potential mechanism to encourage more cooperation from vendors and private mobility and data providers.

6.1  Internal Data Management Structure and Protocols

The majority of the transit agency interviewees identified a lack of coherent organizational structure for managing data internally as well as for data sharing. Interviewees noted that data was collected and stored across a variety of divisions or groups within the transit agency, and responsibilities for data sharing were therefore also spread across staff in different parts of the organization. For example, maintenance staff collect and manage maintenance data, operational staff have operations data, other data is housed by the transit agency’s IT department, and planners access yet another set of ridership and route and schedule data. These data silos present challenges for internal use of data as well as for external sharing of data.

Sidebar: Data Silo Problem
Transit agencies discussed challenges with data being stored and managed in silos across the United States. This problem occurs in organizations across sectors. Electric utility sector interviewees echoed the sentiment. Although data repositories and dashboards can help organizations make timely decisions, the dashboards are only as good as the data available to them. Without a complete data set, insights may go undiscovered.

Many transit agencies reported that data requests may be sent to a variety of divisions within the agency, where they are handled by different staff depending on the data type. Only one transit agency interviewee noted their agency had created an information management and governance group to handle all outside data requests. Due in part to these organizational challenges, transit agency interviewees noted that responding to individual data requests can be resource intensive.

Several transit agency interviewees described these challenges in terms of personnel and organizational needs (the need for a centralized data management staff person or group) and technical needs (the need for a centralized data repository and catalog). These needs are related in that a centralized data repository and catalog requires dedicated staff to develop and maintain. These data-focused staff could also take responsibility for other needs that were identified by the transit agency interviewees: the development of formal data sharing policies and protocols, including standard data licensing agreements and an established method for evaluating privacy risks.

As noted in the literature, developing these capabilities among transit agency staff likely requires staff training, particularly in small- and medium-sized agencies (Lawson 2016). As an additional challenge, staff turnover can make it difficult to ensure that progress in data management is sustainable. Establishing a staff member or team that is dedicated to data management is an important step in addressing these challenges.

Fulfilling these tasks can present technical challenges for some agencies that lack specialized resources (Brauneis and Goodman 2017; Lawson 2016). These challenges only increase with large-scale data that can require machine learning techniques for processing and scalable data storage and mining (Zaslavsky et al. 2013). In fact, this is often the reason that public agencies partner with private companies or universities, which can help complete some of these data processing tasks both for the transit agency’s internal use and for broader sharing. However, some argue that these partnerships take power away from the public agencies, particularly when external partners fail to transparently describe the methods they use to process data (Brauneis and Goodman 2017).

The transit agency interviewees spoke positively about the technical assistance their agencies receive through data sharing. In general, the transit agency interviewees were less concerned with having technical skills in house and more concerned with having the time to dedicate to data preparation tasks. Many expressed that the structure of their transit agency contributed to a lack of effort devoted to data management tasks. Most transit agency interviewees indicated their agencies do not have staff or divisions dedicated to data management, which means staff have other priorities.

Those transit agencies that most actively analyze data internally tend to be best equipped, both technically and organizationally, to prepare data for sharing. Because of their internal capabilities, these transit agencies may be least in need of external research and innovation to use their data. The needs of small agencies to develop data sharing infrastructure require special attention.

One particular internal challenge facing transit agencies is data collection. To maximize the value transit agencies can attain from sharing data, it is crucial that they collect valuable data.

Data Collection

Decisions about data collection determine the types of data and the data quality and coverage available to be shared. However, data collection is often more a byproduct of transit system design than a dedicated analysis effort. The data generated is often dependent on functional aspects of the transit system, such as operations and fare collection, rather than potential data analysis or the value that can be generated through sharing data. Some data collection processes predate modern conceptions of open data, and in some cases, considerable effort is required just to extract data from the transit system. Kitchin and Dodge (2014) note that automated data is generally collected as a result of an action, such as scanning a credit card or using a smartphone, in which providing data is not the primary purpose. One exception is survey data, which is typically collected expressly for the purpose of analysis and evaluation.

Data collection issues can impact the value of data for sharing. Several of the transit agency interviewees identified data collection and harvesting data from existing systems as major challenges. Their agencies were hesitant to share data with gaps and inaccuracies. Sometimes, data privacy concerns impact data collection processes, which can ultimately reduce the sharing value of data.
For example, in almost all cases, smart card systems track individual card IDs to ensure that passes and discounts are applied appropriately. However, not all transit agencies store this information for analytical use. In some transit agencies, the data stored for internal use by the transit agency has a new ID for each trip or for each day, preventing the tracking of smart cards across trips (in the former case) or across multiple days (in the latter). Although these measures protect individual privacy, they limit the potential for analysis. In many transit agencies, persistent encrypted IDs are stored for internal use, with precautions taken to preserve privacy when the data is shared externally. This shifts the decision about privacy protection to the data censoring phase.

Sidebar: Data Biases
The value of data can be limited by biases in terms of which data exists and which data is missing. In the mobility data field, this issue is discussed when app or GPS data from smartphones is used. This data excludes information on people who do not own or use smartphones, which may disproportionately include specific population groups such as low-income and older people (Windmiller et al. 2014). Similar data biases can occur in transit agency smart card data. In most systems, not all passengers use smart cards, and it is important to identify which passenger demographics are more likely to use smart cards as well as the trip types more likely to be paid for with smart cards (Erhardt 2016).

Even when good data is collected, a final hurdle for transit agencies can be data ownership. This issue arises when transit agencies partner with private companies to provide services, in which case the private partner may not be required to turn over data to the transit agency. This is described in Chapter 5 on accessing external data. In other cases, vendors that install and maintain systems, such as automated fare collection (AFC), automatic passenger counters (APC), or automated vehicle location (AVL), may retain ownership of the data generated. Transit agencies must be careful to consider the potential value of data sources and ensure that they have ownership of valuable data. Although there has been a shift in ownership of AVL and AFC systems to transit agencies, this issue may still persist for other data types. For example, if maintenance is outsourced, some maintenance data may be owned by the maintenance company rather than the transit agency.

The issue of data ownership is further complicated in the context of data on individuals. In the European Union, GDPR rules guarantee individuals’ ownership of their own data (see Section 2.6). Although these rules do not apply in the United States, transit agencies should consider the possibility that laws around individual data will change, and specifically consider mechanisms by which individuals can give the transit agency permission to use their data as part of the data collection process.

6.2  External Data Policies and Standards

Transit agency interviewees noted that their internal development process would benefit from external guidance and policies. Although all transit agency interviewees identified internal improvements needed for data sharing, they also expressed interest in more external support. Transit agencies recognize that many data sharing challenges are shared across public agencies. Rather than devoting resources to solve these challenges individually, they seek external guidance around topics including the following:

• Handling sensitive or private data, including when small values need to be suppressed, and what precautions need to be taken to avoid re-identification risks
• Writing or selecting data licenses
• Documenting data

Transit agencies are also looking to external organizations for the development of data standards. This may require a regulatory push to encourage the adoption of new data standards, particularly to require private vendors to comply (Lawson 2016).
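
The following is a minimal Python sketch, not a description of any agency's actual practice, of two privacy measures mentioned in this chapter: card identifiers that change each day (Section 6.1) and the suppression of small values before release. The secret key, the 16-character truncation, the sample taps, the function names, and the threshold of 3 are all illustrative assumptions.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret held only by the agency (assumption for illustration).
SECRET_KEY = b"example-secret-not-for-production"


def daily_pseudonym(card_id: str, service_date: str, key: bytes = SECRET_KEY) -> str:
    """Return an ID that is stable within one service day but differs across days."""
    # Mixing the date into the keyed hash means the same card gets a new,
    # unlinkable pseudonym on each day.
    message = f"{service_date}|{card_id}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:16]


def suppress_small_counts(counts, threshold=5):
    """Replace any count below the threshold with None (suppressed)."""
    return {k: (v if v >= threshold else None) for k, v in counts.items()}


# Made-up fare taps: (card ID, service date, boarding stop).
taps = [
    ("card-001", "2020-03-02", "Stop A"),
    ("card-001", "2020-03-02", "Stop B"),
    ("card-001", "2020-03-03", "Stop A"),
    ("card-002", "2020-03-02", "Stop A"),
]

# Store pseudonyms instead of raw card IDs.
records = [(daily_pseudonym(card, day), day, stop) for card, day, stop in taps]

# Aggregate boardings per stop, then suppress small cells before sharing.
boardings = Counter(stop for _, _, stop in records)
published = suppress_small_counts(boardings, threshold=3)
print(published)
```

In this sketch the first two taps share a pseudonym because they fall on the same day, while the third tap by the same card does not, which is the trade-off the text describes: linking trips across days is no longer possible. A keyed hash is used here rather than a plain hash so that someone who knows a card number cannot recompute its pseudonym without the key. An appropriate suppression threshold would depend on the agency's own privacy policy.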
Developing Data Standards

Data standardization across transit agencies can enable external partners to repeat analyses for multiple transit agencies with limited additional effort. This can encourage private companies and researchers to develop standard tools that can benefit transit agencies. Standardization is highlighted as a salient need in the transit data industry (Sánchez-Martínez and Munizaga 2016), but it is also a major challenge, with data formats varying significantly across organizations.

In public transit data, GTFS is the noted outlier, a standard format for route and schedule information that is widely used across transit agencies. The development of GTFS was initially pioneered by Google to integrate transit information into the Google Maps platform. Over time, it has become widely used outside of Google as well, particularly in apps that provide transit information to customers (Schweiger 2015). A newer standard, GTFS-RT, attempts to do the same for real-time vehicle location data. The Vermont Agency of Transportation and Trillium Solutions developed the GTFS-flex specifications to support flexible demand-responsive transportation services, which the original GTFS, limited to fixed-route public transportation, does not model. GTFS-flex helps transit users get information about nonfixed-route transit services, which are common in less dense environments. The NTD is another example of transit data that is both standardized and consolidated.

To date, smart card data has not been standardized in the same way. Two organizations, the Integrated Transport Smartcard Association and the Secure Technology Alliance (formerly the Smart Card Alliance) in the United States, have developed standards for interoperability of smart cards, but these standards are focused on secure fare collection, not on data generation and formatting. Transit Intelligent Transportation Systems (ITS) Data Exchange Specification (TIDES) and GTFS-ride are two projects developing standards for transit ridership data from passenger counters and fare collection. Still in their early stages, these standards look to support tools and applications for transit analysis.

The Los Angeles DOT has developed an emerging data sharing standard called the Mobility Data Specification (MDS), which serves as a model for data sharing policy between cities and the private sector. This data is ideally shared through an API, which has the advantage of allowing cities to see a dynamic, continuous picture of fleet usage and placement. In addition, the standard can make data analysis more efficient.

A major challenge with data standards is adoption. Transit agencies and the vendors they employ need to cooperate, and this requires effort from one or both parties to convert existing systems to meet new standards. Generally, standards are adopted either when there is a clear benefit (the proverbial carrot), or when their adoption is mandated (the stick). As an example, transit agencies quickly adopted GTFS because it allowed their information to be displayed in apps their customers were using. In contrast, many transit agencies submit standardized data to the NTD because they are required to do so if they receive funding through §5307 or §5311 formula grants [Title 49 United States Code (USC) §5335(a)].

Sidebar: How Do Data Standards Get Adopted?
1. Good standards require champions and resources to support a standards-making activity, including respected experts.
2. Well-designed standards have few optional fields and can evolve over time.
3. For adoption, there needs to be either a clear benefit or a mandate.

A standard needs champions as well as resources. Resources are key to supporting a standards-making activity. For traction, the activity should include experts who are respected within the industry. Well-developed standards minimize the number of optional fields, which limit the usefulness of the standards. The standards-making activity should include testing, and a certification system may need to be developed to evaluate compliance with the standard. Good standards can evolve over time. As an example, GTFS has limited ability to describe fares but has the potential to be extended to handle more complex fare policies (Wang 2014).
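
As a rough illustration of why a shared format lets an external partner repeat an analysis across agencies, the sketch below counts scheduled trips per route from two GTFS files, routes.txt and trips.txt. The column names follow the published GTFS specification; the sample rows and the function name trips_per_route are made up for illustration, and a real feed is a zip archive with additional files and fields.

```python
import csv
import io
from collections import Counter

# Made-up sample content for two GTFS files (illustrative only).
ROUTES_TXT = """route_id,route_short_name,route_long_name,route_type
10,10,Main Street,3
20,20,Airport Express,3
"""

TRIPS_TXT = """route_id,service_id,trip_id
10,WEEKDAY,t1
10,WEEKDAY,t2
20,WEEKDAY,t3
"""


def trips_per_route(routes_txt: str, trips_txt: str) -> dict:
    """Count scheduled trips for each route, keyed by the route's long name."""
    # Map each route_id to a readable name from routes.txt.
    routes = {
        row["route_id"]: row["route_long_name"]
        for row in csv.DictReader(io.StringIO(routes_txt))
    }
    # Count how many rows in trips.txt reference each route_id.
    counts = Counter(
        row["route_id"] for row in csv.DictReader(io.StringIO(trips_txt))
    )
    return {routes[route_id]: n for route_id, n in counts.items()}


summary = trips_per_route(ROUTES_TXT, TRIPS_TXT)
print(summary)
```

Because the file and column names are the same in every conforming feed, the same function could in principle be pointed at another agency's routes.txt and trips.txt without modification, which is the kind of reuse the text attributes to standardization.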

Sidebar: Data Standards in the Energy Sector
The electric utility sector is currently undergoing standards development. The Department of Energy (DOE) published high-level guidance about privacy (SEE Action 2012) and data interoperability (ICF 2016). The DOE guidance points to federal initiatives that are relevant across industries: Fair Information Practice Principles (FIPPs), the Consumer Privacy Bill of Rights, FTC Codes of Conduct, non-binding industry standards, and emerging “privacy seal” initiatives. This top-down guidance notes that it is up to state regulators, utilities, and third-party service providers to define standards and implement them (U.S. Department of Energy 2015). Time will tell whether this patchwork approach is successful. Some state regulators are moving quickly, while others are not. The DOE’s report summarizes feedback from regulators and utilities about the challenges of developing interoperability: regulators lack the technological expertise and time to learn about interoperability needs; regulators lack access to industry publications and working group findings; and some utilities prefer proprietary systems rather than standardized services.

In summary, the process of data standardization depends not just on an individual transit agency’s technical and organizational ability to apply standards, but also on a strong coalition that has built effective, flexible, and respected standards, and on motivational carrots or sticks to promote the standard’s adoption. The majority of transit agency interviewees recognized the need for more data standards but felt that external organizations or regulators would be required to implement them.

Public Records Requests and Access to Data

Private sector interviewees cited protecting user privacy as their most common concern about providing data to transit agencies. They are also concerned that, under state public records laws, the shared data from these private companies’ users could fall into the public domain, violating their customers’ privacy. For this reason, private companies often share aggregated data or provide access to an analytical platform rather than providing data directly. As described in Section 5.2, private companies also may share data with a third party rather than with a transit agency directly.

One transit agency interviewee pointed out that laws can appear arbitrary or out of date. A law in their state exempts smart card data from FOIA requests on the basis that it contains individual records. However, data from smartphone apps, which the transit agency is planning to collect, will not be protected from release under the same law. As described in Section 5.3, transit agencies are beginning to take an active role in shaping legislation.


Transit agencies are beginning to harness the value of external data, but challenges remain.

The TRB Transit Cooperative Research Program's TCRP Research Report 213: Data Sharing Guidance for Public Transit Agencies – Now and in the Future is designed to help agencies make decisions about sharing their data, including how to evaluate benefits, costs, and risks.

Many transit agencies have realized benefits from sharing their internal data sets, ranging from improved customer information to innovative research findings that help the transit agency improve performance.
