CHAPTER 5
Conclusions and Suggestions for Future Research
General Summary of Findings
The synthesis of the practice on The Transit Analyst Toolbox focuses on data governance activities adopted by transit agencies to support "analysis and approaches for reporting, communicating, and examining transit data." The toolbox consists of
• Transit Service Data and Performance Measures
• Transit Data Management: Data Collection and Management Tools
• Transit Data Governance
The findings related to each of these areas are summarized below.
Transit Service Data
Transit service data acquisition and collection are aided by technologies that generate, collect, and analyze raw data and that generate performance metrics. Even as technologies help agencies collect more data, and more accurate data, new challenges arise in managing larger volumes of data from diverse sources. The major challenges posed by service data today are the following:
Volume: Resources and tools are needed to manage large data sets. With data collection tools that collect larger volumes of data, and with big data sources from third parties, managing and curating the data is a major challenge today.
Quality: Data quality will continue to be one of the major challenges for transit agencies. Quality checks, such as detecting missing time ranges or inconsistencies, require procedures and rules to identify the issues. A major source of these challenges is vendor systems: several survey respondents indicated that errors persistently recur in vendor-operated systems.
Integration: Integration is a means of associating multiple data sets to support a logical metric. Agencies differentiate quality methods from integration methods in that quality is a "cleaning" process, whereas integration requires a logical consistency rule for joining data sets, such as both data sets using the same identification method or timing sensor. Integration errors may be mitigated if consistent quality control rules are applied to all data prior to storage and integration. As noted by UTA, applying an enterprise data dictionary, SOR, or single version of truth will promote consistency among all data acquired from different sources.
Access: Open data standards such as GTFS have been a boon for sharing key transit service data. Additional specifications, such as GTFS Rides and GTFS Pathways, are in development or are not yet widely used. One major reason is the lack of tools to collect and quality check these data sets: unlike GTFS, for which a plethora of tools exist to generate, collect, transform, and check data sets, the new specifications do not have easy-to-use tools for collecting and quality checking the data.
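The Quality and Integration challenges above both depend on explicit, repeatable rules. The following Python sketch is illustrative only and is not drawn from any surveyed agency's system. It assumes AVL pings for one vehicle are available as timestamps and that two data sets (hypothetically, an APC export and a GTFS stops file) identify stops by a shared ID. The hypothetical `find_time_gaps` flags missing time ranges longer than an assumed threshold, and the hypothetical `unmatched_keys` lists IDs that would silently drop out of a join between the two data sets.

```python
from datetime import datetime, timedelta


def find_time_gaps(timestamps, max_gap):
    """Quality rule: report (start, end) pairs where consecutive records are
    more than max_gap apart, i.e., candidate missing time ranges."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_gap
    ]


def unmatched_keys(left_ids, right_ids):
    """Integration rule: IDs present in only one data set cannot be joined and
    should be resolved before the data sets are integrated."""
    left, right = set(left_ids), set(right_ids)
    return {"only_left": sorted(left - right), "only_right": sorted(right - left)}


if __name__ == "__main__":
    # Hypothetical AVL pings for one vehicle; the 5-minute threshold is an assumption.
    pings = [
        datetime(2020, 1, 6, 8, 0),
        datetime(2020, 1, 6, 8, 1),
        datetime(2020, 1, 6, 8, 9),
        datetime(2020, 1, 6, 8, 10),
    ]
    print(find_time_gaps(pings, timedelta(minutes=5)))
    # Hypothetical stop IDs from an APC export and a GTFS stops file.
    print(unmatched_keys(["1001", "1002", "1003"], ["1001", "1003", "1004"]))
```

The 5-minute threshold is a placeholder; a realistic value would follow from the reporting interval of each vendor system being checked.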
The Transit Analyst Toolbox: Analysis and Approaches for Reporting, Communicating, and Examining Transit Data
Current initiatives, such as the international transit data conferences, the TCRP SG-18 project, and TIDES, are just beginning to address transit data quality and integration issues, though they generally do not address the critical management and governance issues that could help overcome some of the major challenges faced by transit agencies.
In summary, with increasing service data volume and variety, many transit agencies are turning to data management tools to support their integration and data archiving needs. These challenges may be mitigated by applying data management methods and practices, using tools adopted by other industries. However, industry best practices related to transit data management have not been promulgated or researched to any major extent.
Transit Data Management
Industry data management tools are being adopted by large transit agencies to manage and analyze their service data. Many large agencies and some medium-size agencies are applying data warehouse, cloud storage, and quality checking tools to their data sets. However, there are few guidebooks and best practice manuals to help transit agencies manage their service data and the resultant performance data. The critical elements described in a 2008 report on transit best practices by Strathman include the following components:
• An information system for archiving ITS data;
• Enterprise-level applications, the most important of which in the context of this guidebook is GIS;
• Processes for screening and validating ITS data to ensure integrity;
• Software tools that support reporting and analysis; and
• Human resources with the skills to maintain the infrastructure and, through analysis of ITS data, inform strategic decisions in marketing and other agency functions.
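Strathman's third element, processes for screening and validating ITS data, can be made concrete with a trip-level APC rule. The sketch below is a hypothetical example, not a method described in the 2008 report: it assumes stop-by-stop boarding and alighting counts for one trip (the `StopCount` layout is an assumption) and an illustrative 10 percent balance tolerance, and it flags trips whose totals do not balance or where more passengers alight than were on board.

```python
from dataclasses import dataclass


@dataclass
class StopCount:
    """APC counts recorded at one stop on one trip (hypothetical layout)."""
    stop_id: str
    ons: int
    offs: int


def screen_trip(counts, tolerance=0.10):
    """Return reasons the trip fails screening; an empty list means it passes.

    tolerance is an assumed maximum relative difference between total
    boardings and total alightings for the trip.
    """
    problems = []
    total_on = sum(c.ons for c in counts)
    total_off = sum(c.offs for c in counts)
    larger = max(total_on, total_off)
    if larger > 0 and abs(total_on - total_off) / larger > tolerance:
        problems.append(f"on/off imbalance: {total_on} on vs {total_off} off")

    load = 0  # passengers on board when the vehicle arrives at each stop
    for c in counts:
        if c.offs > load:
            problems.append(f"more alightings than onboard load at stop {c.stop_id}")
            break
        load += c.ons - c.offs
    return problems


if __name__ == "__main__":
    good_trip = [StopCount("A", 5, 0), StopCount("B", 3, 2), StopCount("C", 0, 6)]
    bad_trip = [StopCount("A", 2, 0), StopCount("B", 0, 5)]
    print(screen_trip(good_trip))
    print(screen_trip(bad_trip))
```

In this sketch flagged trips are only reported; whether they are corrected, excluded, or factored is a policy choice left to the agency.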
As reported in the survey, the major service data challenges faced by transit agencies include ensuring quality data in a timely manner with sufficient resources (staff and skills) to accomplish the task. As highlighted by the Data Management case examples, which represent three organizations of varying modes, sizes, and geographic areas, the type of data management system an agency uses depends on the resources available to it. Metro Transit and AC Transit rely on their IT departments to operate and manage their systems; the IT departments handle the day-to-day operations, backups, and programming activities. One characteristic common to all three examples is the critical contribution of the business units (operations and planning) in communicating data needs, characteristics, uses, and quality. An issue that all three agencies still struggle with is managing and ensuring the quality of facility data, particularly stop-level data. Because many organizational units and technology systems create, manage, and use these data, creating SORs for them remains a challenge.
Many organizations facing challenges similar to those of transit (e.g., FHWA, DAMA, and NASCIO) advocate addressing 10 data management areas:
• Data Governance: Planning, supervision, and control over data management and use.
• Data Architecture Management: An integral part of the enterprise architecture.
• Data Development: The data-focused activities within the SDLC, including data modeling and data requirements analysis, design, implementation, and maintenance of databases and data-related solution components.
• Database Operations Management: Planning, control, and support for structured data assets across the data lifecycle, from creation and acquisition through archival and purge.
• Data Security Management: Ensuring privacy, confidentiality, and appropriate access.
• Reference and Master Data Management: Planning, implementation, and control activities to ensure consistency of contextual data values with a "golden version" of these data values.
• Data Warehousing and Business Intelligence Management: Enabling access to decision support data for reporting and analysis.
• Document and Content Management: Storing, protecting, indexing, and enabling access to data found in unstructured sources (electronic files and physical records).
• Metadata Management: Integrating, controlling, and delivering metadata.
• Data Quality Management: Defining, monitoring, and improving data quality.
One of the major recommendations of these groups is the adoption of a data governance framework that establishes enterprise approaches, controls, and rules for managing data.
Transit Data Governance
Data governance has not been widely adopted by the transit industry, in contrast to state and highway organizations, where NASCIO and FHWA initiated programs for their constituent organizations over a decade ago. As described by organizations such as FHWA and NASCIO, a data governance framework addresses people, processes, and rules. The people aspect includes roles and responsibilities, including the persons responsible for curating data. The processes aspect includes the curation processes and policies that address data life-cycle issues, including implementing a capability maturity model (CMM) to measure the adoption and implementation of processes associated with data. Finally, the rules aspect includes the tools and technologies that ensure the quality and integration of the data.
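The "golden version" idea behind Reference and Master Data Management, and the stop-level SOR challenge the case example agencies describe, can be illustrated with a simple survivorship rule. In this hypothetical Python sketch, three source systems (the names `scheduling`, `asset_inventory`, and `customer_info` are assumptions) each hold their own version of a stop, and an assumed per-attribute precedence list decides which system is trusted for each field. Recording which source supplied each attribute gives data stewards the lineage needed to trace a quality issue back to the owning system. Precedence-based survivorship is one common master data technique; the report does not attribute it to any agency.

```python
# Hypothetical per-attribute precedence: the first system listed is trusted first.
PRECEDENCE = {
    "name": ["customer_info", "scheduling", "asset_inventory"],
    "lat": ["asset_inventory", "scheduling", "customer_info"],
    "lon": ["asset_inventory", "scheduling", "customer_info"],
    "shelter": ["asset_inventory"],
}


def golden_record(stop_id, source_records, precedence=PRECEDENCE):
    """Build one consolidated stop record from several systems' versions.

    source_records maps a source-system name to a dict of attributes; a system
    may be missing entirely. For each attribute, the first source in precedence
    order with a non-empty value wins, and the winning source is kept as lineage.
    """
    record = {"stop_id": stop_id}
    lineage = {}
    for attr, sources in precedence.items():
        for source in sources:
            value = source_records.get(source, {}).get(attr)
            if value not in (None, ""):
                record[attr] = value
                lineage[attr] = source
                break
    return record, lineage


if __name__ == "__main__":
    sources = {
        "scheduling": {"name": "Main St & 1st Ave", "lat": 40.1, "lon": -111.9},
        "asset_inventory": {"lat": 40.1002, "lon": -111.9001, "shelter": True},
        "customer_info": {"name": "Main Street / 1st Avenue"},
    }
    record, lineage = golden_record("1001", sources)
    print(record)
    print(lineage)
```

Attributes that no source supplies are simply left out of the record, which a data steward could treat as a gap to resolve.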
Responses to the survey show that many transit agencies implement elements of a data governance framework for slices of the data sets they manage, although even organizations that deploy robust data management systems continue to have integration challenges. The agencies that participated in the data governance case examples described common reasons for beginning their governance journey: each of the three agencies initiated its data governance efforts to improve data quality and to ensure that the data, and analysis results derived from the data, were consistent. All three agencies started their data governance efforts only in the last year and have not yet matured to a "steady state."
The three agencies initiated their governance projects using different approaches. UTA used a bottom-up, noninvasive approach, while AC Transit and KCM initiated their processes while deploying a new enterprise data warehouse, working down from the warehouse to the data sources that feed it. The three agencies took several common actions to establish data governance, including
• Eliminating duplicate data sets prior to establishing data governance and starting with an enterprise data set.
• Storing their curated data in an enterprise database, in which the rawest data were accessible to everyone.
• Starting with a slice of their corporate data, with service data among the first slices.
• Taking an agile approach to implementing governance. Along with targeting a slice of data, each agency took an incremental approach to establishing formal data governance processes, meetings, and structures. All three efforts applied a "just enough governance" approach so that the organization was eased into changing behaviors.
• Creating a role that mediated between the technology and business groups, collecting needs and quality issues from the business and communicating the information to the technology groups.
• Describing and assigning roles and responsibilities to data stewards and data domain stewards (data subject matter experts/business analysts) early in the process.
• Integrating existing processes and people responsible for governance into the data governance framework. Except at UTA, this meant that no new meetings were set up to govern data; existing operations meetings covered data issues as they arose.
• Educating data stewards (e.g., IT) on their role and the impacts of their actions on downstream users.
Executive leadership support for data governance has been cited by the agencies and in the literature as a critical element in ensuring accountability for governance compliance enterprise-wide.
Suggestions for Further Study
The synthesis identified several areas for further study. Further study could address gaps in the literature related to data collection technology methods and performance, trends in data collection, best practices for managing certain types of data, and lessons learned in applying data governance rules. Specifically, suggestions include the following topics:
1. Transit agencies have been slow to adopt data governance frameworks, as seen in the survey and case examples. FHWA guides and outreach efforts, including peer reviews, workshops, and tools that support implementation, have contributed to state DOTs and regional transportation organizations adopting data governance practices.
- Guidelines for Transit Agencies for Implementing Data Governance. A guideline that provides several different approaches for implementing data governance could promote best practices commensurate with the size and needs of each organization. The guidelines should address different organizational structures, such as an agency in which the county leads data governance versus an independent agency.
2. Passenger counting technologies have evolved over the 12 years since a synthesis of the practice on passenger counting systems (Boyle 2008) was developed.
There are several topics related to collecting ridership information that would be of interest:
- Synthesis on APC. A synthesis of the practice on new and emerging technologies for automated passenger counting, including technologies, performance quality, error sources, approaches to sampling, validation, and secondary counting methods.
- Ridership Analysis Trends. Review trends associated with ridership data collection. The literature shows a migration from APCs to fare media. Has COVID-19 affected this evolution?
3. A significant number of survey respondents stated that managing their stop-level information was challenging and that there were limited tools to manage the data through planning, physical implementation, inventory, and customer information.
- Transit Stop Level Data Management Guidebook. A guidebook would describe the business processes, tools, quality checking, curation, and resources required to manage stop-level information. The guidebook would provide best practices from agencies that have successfully managed their stop data.
4. Promote research in big data and predictive analytics for transit.
- Big Data and Transit Planning Opportunities. Respondents cited significant challenges in managing their own data, let alone big data and third-party sources, yet agencies could collect additional data from communications technologies already deployed on buses and subways to augment their planning and operational data. For example, just as traffic operators track Bluetooth or Wi-Fi signals to estimate traffic flow and congestion, transit agencies could use these transceivers to estimate congestion, track boardings and alightings, and understand flow patterns through stations.
- ML/AI for Transit Analytics Using Big Data. For transit agencies to become nimble, predicting behavior (operations, customers, preventive maintenance) would be of significant benefit. Prediction algorithms implementing ML/AI require large historical or streaming real-time data sets. A study and hands-on discussion guide on how to get started applying several "typical" prediction scenarios would help transit agencies understand how they can harness the technology. For example, the guide would include topics such as where to acquire the data, how to filter and clean the data, how to access the data, and what tools to use.
5. Most of the suggested research topics require skilled data management staff or specialized data applications. However, as seen in the survey results, smaller agencies tend to procure turnkey data management tools because of limited resources, funding, or staff. These limitations tend to reduce their ability to innovate using data-driven technologies. There are methods and technologies that agencies can request during procurements to open their systems, leverage tools, or collaborate with other organizations to innovate.
- Best Practices in Managing Data for Small and Rural Transit Agencies Data-Driven Systems. For small and rural operators, there are templates and techniques that any computer-savvy staff member can use to effectively manage, curate, and apply data sources in support of data-driven and connected technologies. This best practices guidebook would focus exclusively on smaller agencies, spreadsheet templates, and nonproprietary software tools that support their needs.
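The Bluetooth/Wi-Fi suggestion under item 4 rests on a matching idea borrowed from traffic operations: the same device detected by two readers yields one observed trip between them. The Python sketch below is a hypothetical illustration, not an agency implementation. It assumes two readers (for example, at a station entrance and on a platform) that log `(device, timestamp)` pairs, a salted SHA-256 hash in place of raw MAC addresses, and an arbitrary maximum travel window; the function names are illustrative. Matched devices represent only riders carrying discoverable devices, and MAC address randomization on many phones lowers match rates, so counts would need expansion against another source, such as APC or fare data, before they could describe total flow.

```python
import hashlib
from datetime import datetime, timedelta
from statistics import median


def anonymize(mac, salt):
    """Replace a raw MAC address with a salted SHA-256 digest before storage."""
    return hashlib.sha256((salt + mac).encode("utf-8")).hexdigest()


def match_detections(origin, destination, max_travel):
    """Pair each device's first origin sighting with its first later destination sighting.

    origin and destination are lists of (device_hash, timestamp). Returns the
    travel times (timedelta) of devices seen at both readers within max_travel.
    """
    first_seen = {}
    for device, ts in sorted(origin, key=lambda d: d[1]):
        first_seen.setdefault(device, ts)

    travel_times, matched = [], set()
    for device, ts in sorted(destination, key=lambda d: d[1]):
        start = first_seen.get(device)
        if start is None or device in matched:
            continue
        elapsed = ts - start
        if timedelta(0) < elapsed <= max_travel:
            travel_times.append(elapsed)
            matched.add(device)
    return travel_times


def summarize(travel_times):
    """Matched-device count and median travel time between the two readers."""
    return {
        "matched_devices": len(travel_times),
        "median_travel": median(travel_times) if travel_times else None,
    }


if __name__ == "__main__":
    salt = "rotate-daily"  # hypothetical salt; rotating it limits long-term tracking
    t0 = datetime(2020, 3, 2, 8, 0)
    entrance = [
        (anonymize("AA:BB:CC:00:00:01", salt), t0),
        (anonymize("AA:BB:CC:00:00:02", salt), t0 + timedelta(minutes=1)),
        (anonymize("AA:BB:CC:00:00:03", salt), t0),
    ]
    platform = [
        (anonymize("AA:BB:CC:00:00:01", salt), t0 + timedelta(minutes=4)),
        (anonymize("AA:BB:CC:00:00:02", salt), t0 + timedelta(minutes=7)),
    ]
    print(summarize(match_detections(entrance, platform, timedelta(minutes=30))))
```

The median is used rather than the mean so that a few devices lingering near a reader do not dominate the travel-time estimate.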