Effective transit planning, operations, maintenance, and analysis involve coordination among a wide array of internal and external partners to optimize system performance. As Peter Drucker has said, "can't improve what you don't measure" (Drucker 2006). Data are central to improving and optimizing service and have become a major asset and investment for the data-driven organization. Transit agencies now manage more data because of new technologies, and they harness a new ecosystem of tools to analyze ever-larger data sets. Yet data processing, exploration, performance metric development, and communication of decisions are often scattered across many departments. Additionally, existing policies and processes often lack cohesion within an agency, and agencies operate with limited and shrinking resources, a changing technological landscape, and shifting roles and expectations. To manage data assets, transit agencies need to become more data driven by
• Responding with the ability to transform data into information and decisions;
• Supporting interoperability to share information;
• Becoming effective data custodians to manage data discovery and access; and
• Becoming a trusted source to manage data quality and privacy.
To that end, key data are increasingly being planned and managed to support the enterprise rather than only a single project, as has been typical in many organizations. Data governance, therefore, is becoming increasingly important for organizations and for overall systems of organizations that work together. Many organizations that adopt data governance practices recognize the need to undergo a cultural transformation, changing the ways individuals and systems handle and process data. This study focuses on understanding the people, processes, and tools agencies use to adopt a data-driven culture.
Specifically, the objective of this synthesis on the state of the practice on the Transit Analyst Toolbox was to describe how transit agencies manage, store, analyze, and, most importantly, govern their transit service data. The research consisted of a literature review, an industry survey, and five case examples that cover data management, data governance, and the use of service data in open source software planning tools.
The Transit Analyst Toolbox synthesis focuses on data governance activities adopted by transit agencies to support "analysis and approaches for reporting, communicating, and examining transit data." The toolbox consists of resources, curation, and management practices to address
• Transit Service Data and Performance Measures
• Transit Data Management: Data Collection and Management Tools
• Transit Data Governance
SUMMARY
The Transit Analyst Toolbox: Analysis and Approaches for Reporting, Communicating, and Examining Transit Data
Transit Service Data and Performance Measures
Transit service data are derived from several dependent systems. In many cases, data collected from one system feed other systems (see Table 1 in Chapter 2), and the quality of the data collected feeds back into service planning. With more and better data collection tools available, the major challenges are now driven by service data management. These challenges include
Volume: Resources and tools are needed to manage large data sets. As data collection tools gather larger volumes of data and agencies acquire big data sources from third parties, managing and curating the data have become major challenges.
Quality: Data quality will remain one of the major challenges for transit agencies. Quality checking, such as detecting missing time ranges or inconsistencies, requires procedures and rules to identify the issues. A major source of challenges is vendor systems: several survey respondents indicated that errors persistently recur in vendor-operated systems.
Integration: Integration is a means of associating multiple data sets for some logical metric. Agencies differentiate quality from integration methods in that quality is a "cleaning" process, whereas integration requires a logical consistency rule, such as two data sets using the same identification method or timing sensor, so that the data sets can be joined. Integration errors may be mitigated if consistent quality control rules are applied to all data prior to storage and integration. As noted by Utah Transit Authority (UTA), applying an enterprise data dictionary, system of record (SOR), or single version of truth will promote consistency among all data acquired from different sources.
Access: Open data standards such as General Transit Feed Specification (GTFS) have been a boon for sharing key transit service data. Additional specifications, such as GTFS Rides and GTFS Pathways, are in development or are not yet widely used. One of the major reasons for the limited use is the lack of tools to collect and quality check the data sets. Unlike GTFS, for which a plethora of tools exist to generate, collect, transform, and check data sets, these new specifications do not have easy-to-use tools to collect and quality check the data.
Transit Data Management
Industry data management tools are being adopted by large transit agencies to manage and analyze their service data. Many large agencies and some medium-size agencies are applying data warehouse, cloud storage, and quality checking tools to their data sets. Yet few best practice guides and transit data processes are available to identify the critical challenges practitioners face in managing their data. Many agencies rely on vendor products to clean and curate their data. A single report on transit best practices for data management of marketing data (Strathman et al. 2008) identified five critical elements to manage transit Intelligent Transportation Systems (ITS) data (an equivalent of service data). They are
• An information system for archiving ITS data;
• Enterprise-level applications, the most important of which in the context of this guidebook is geographic information system (GIS);
• Processes for screening and validating ITS data to ensure integrity;
• Software tools that support reporting and analysis; and
• Human resources with the skills to maintain the infrastructure and, through analysis of ITS data, inform strategic decisions in marketing and other agency functions.
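The screening and validation element in the list above, together with the quality and integration challenges described earlier, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not a method drawn from the synthesis or from any agency: the record layout, the stop identifiers, the 30-minute gap threshold, and the function names are all hypothetical.

```python
from datetime import datetime, timedelta


def find_time_gaps(timestamps, max_gap):
    """Quality rule: return (previous, next) pairs of consecutive
    timestamps that are more than max_gap apart (a missing time range)."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


def unmatched_stop_ids(records, known_stop_ids):
    """Integration rule: return the sorted stop IDs used in records that
    are absent from the reference stop list, so a join would drop them."""
    return sorted({r["stop_id"] for r in records} - set(known_stop_ids))


# Hypothetical passenger-count records, for illustration only.
records = [
    {"stop_id": "1001", "time": datetime(2021, 3, 1, 8, 0), "boardings": 4},
    {"stop_id": "1002", "time": datetime(2021, 3, 1, 8, 5), "boardings": 2},
    {"stop_id": "9999", "time": datetime(2021, 3, 1, 9, 30), "boardings": 1},
]
# Hypothetical system-of-record stop list.
known_stop_ids = ["1001", "1002", "1003"]

gaps = find_time_gaps([r["time"] for r in records], timedelta(minutes=30))
orphans = unmatched_stop_ids(records, known_stop_ids)

print("Missing time ranges:", gaps)
print("Stop IDs not in the system of record:", orphans)
```

Rules like these would typically run before data are stored or joined, in line with the point that consistent quality control rules applied prior to storage and integration may mitigate integration errors.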
The survey specifically asked about major challenges to service data collection, curation, and management. The common response can be characterized as ensuring quality data in a timely manner with sufficient resources (staff and skills) to accomplish the task. An issue that all three agencies still struggle with is managing and ensuring the quality of facility data, particularly stop-level data. With the number of organizational units and technology systems that create, manage, and use the data, there are still challenges with creating systems of record for the data. The common industry approach is the adoption of a data governance framework that builds enterprise approaches, controls, and rules for managing data.
Transit Data Governance
Data governance has not been widely adopted by the transit industry, in contrast to state and highway organizations, for which the National Association of State Chief Information Officers (NASCIO) and the United States Department of Transportation (U.S. DOT) Federal Highway Administration (FHWA) initiated programs for constituent organizations over a decade ago. A data governance framework addresses people (roles and responsibilities of organizational units and persons curating data, curation processes, and policies) to address data life-cycle issues. These data life-cycle issues include implementing a capability maturity model (CMM) measuring adoption and implementation of processes associated with data and describing the rules, tools, and technologies that ensure quality and integration. No comparable program has been developed for transit agencies, and few agencies responding to the survey had or were developing data governance frameworks. The case example on data governance included three agencies of varying sizes that are in various states of adopting data governance. The agencies that participated in the data governance case examples described common reasons motivating them to begin their governance journey.
In the three case examples on data governance, each agency initiated its data governance efforts for the same basic reason: to improve data quality and to ensure that data and the analysis results using the data were consistent. The three agencies only started their data governance in the last year, and they have not yet matured to a "steady state." The three agencies initiated their governance projects using different approaches. UTA used a bottom-up, noninvasive approach, while Alameda-Contra Costa Transit District (AC Transit) and King County Metro (KCM) initiated processes, while deploying a new enterprise warehouse, that flow down to the data sources that feed the system. Common actions were used by the three agencies to establish data governance, including
• Eliminating duplicate data sets prior to establishing data governance and starting with an enterprise data set.
• Storing their curated data in an enterprise database, in which the rawest data were accessible to everyone.
• Starting with a slice of their corporate data, with service data among the first slices.
• Taking an agile approach to implementing governance. Along with targeting a slice of data, each agency focused on an incremental approach to establishing formal data governance processes, meetings, and structures. All three efforts applied a "just enough governance" approach so that the organization was eased into changing behaviors.
• Creating a role that mediated between the technology and business groups by collecting needs and quality issues from the business and communicating the information to the technology groups.
• Describing and assigning roles and responsibilities to data stewards and data domain stewards (data subject matter experts/business analysts) early in the process.
• Integrating existing processes and people responsible for governance into the data governance framework. Except for UTA, this means that no new meetings were set up to govern data; existing operations meetings covered data issues when they arose.
• Educating data stewards (e.g., information technology or IT) on their role and the impacts of their actions on downstream users.
Executive leadership support for data governance has been cited by agencies and in the literature as a critical element to ensure accountability for governance compliance enterprise-wide.
Suggestions for Further Research
The synthesis concludes with a list of potential research topics. These include the following:
• Guidelines for Transit Agencies for Implementing Data Governance. A guideline that provides several different approaches for implementing data governance could promote best practices commensurate with the size and needs of each organization. The guidelines should address different organizational structures, such as an agency wherein the county leads data governance versus an independent structure.
• Synthesis on Automated Passenger Counters (APCs). A synthesis of the practice on new and emerging technologies for automated passenger counting, including technologies, performance quality, error sources, approach to sampling methods, validation, and secondary counting methods.
• Ridership Analysis Trends. Review trends associated with ridership data collection. The literature shows a migration from APCs to fare media. Has COVID-19 affected this evolution?
• Transit Stop Level Data Management Guidebook. A guidebook would describe the business, tools, quality checking, curation, and resources required to manage stop-level information. The guidebook would provide best practices from agencies that have successfully managed their data.
• Big Data and Transit Planning Opportunities. Respondents cited significant challenges managing their data, let alone big data and third-party sources, but there are data that agencies could collect to augment their planning and operational data from communications technologies already deployed on buses and subways. For example, similar to traffic operators who track Bluetooth or Wi-Fi signals to estimate traffic flow and congestion, transit agencies can use these transceivers to estimate congestion, track boardings and alightings, and understand flow patterns through stations.
• Machine Learning and Artificial Intelligence (ML/AI) for Transit Analytics Using Big Data. For transit agencies to become nimble, predicting behavior (operations, customers, preventive maintenance) would be of significant benefit. Prediction algorithms implementing ML/AI require large historical or streaming real-time data sets. A study and hands-on discussion guide on how to get started with several "typical" prediction scenarios would help transit agencies understand how to harness the technology. For example, the guide would include topics such as where to acquire the data, how to filter and clean the data, how to access the data, and what tools to use.
• Best Practices in Managing Data for Small and Rural Transit Agencies' Data-Driven Systems. For small and rural operators, there are templates and techniques that they can adopt, with which any computer-savvy staff member can effectively manage, curate, and apply data sources to support technology and data-driven connected technologies. This best practices guidebook would focus exclusively on smaller agencies, spreadsheet templates, and nonproprietary software tools that support their needs.