A Methodology for Performance Measurement and Peer Comparison in the Public Transportation Industry (2010)

CHAPTER 2

Performance Measurement, Peer Comparison, and Benchmarking

Performance measurement is a valuable management tool that most organizations use to one degree or another. Transit agencies measure performance for a variety of reasons, including:

• To meet regulatory and reporting requirements, such as annual reporting to the FTA's National Transit Database (NTD), to a state Department of Transportation (DOT), and/or to another entity that provides funding support to the agency;
• To assess progress toward meeting internal and external goals, such as (a) measuring how well customers and potential customers perceive the quality of the service provided, or (b) demonstrating how the agency's service helps support regional mobility, environmental, energy, and other goals; and
• To support agency management and oversight bodies in making decisions about where, when, and how service should be provided, and in documenting the impacts of past actions taken to improve agency performance (1).

Taken by themselves, performance measures provide data, but little in the way of context. For performance measurement to provide real value, measures need to be compared to something else to establish context: "performance is good," "performance needs improvement," "performance is getting better," and so on. This context can be provided in a number of ways:

• By comparing performance against internal or external service standards or targets to determine whether minimum policy objectives or regulatory requirements are being met;
• By comparing current performance against the organization's past performance to determine whether performance is improving, staying the same, or getting worse, and to what degree; and
• By comparing the organization's performance against that of similar organizations to determine whether its performance is better than, about the same as, or worse than that of its peers.

All of these methods of providing context are valuable, and all can be integrated into an organization's day-to-day activities. This report, however, focuses on the second and third items, using trend analysis and peer comparisons as means to (a) evaluate a transit agency's performance, (b) identify areas of relative strength and weakness compared to its peers, and (c) identify high-performing peers that can be studied in more detail to identify and adopt practices that could improve the agency's own performance.

Benchmarking has been defined in various ways:

• "The continuous process of measuring products, services, and practices against the toughest competitors or those companies recognized as industry leaders" (David Kearns, chief executive officer, Xerox Corporation) (2).
• "The search for industry best practices that lead to superior performance" (Robert C. Camp) (2).
• "A process of comparing the performance and process characteristics between two or more organizations in order to learn how to improve" (Gregory Watson, former vice president of quality, Xerox Corporation) (3).
• "The process of identifying, sharing, and using knowledge and best practices" (American Productivity & Quality Center) (4).
• Informally, "the practice of being humble enough to admit that someone else is better at something and wise enough to try to learn how to match, and even surpass, them at it" (American Productivity & Quality Center) (4).

The common theme in all these definitions is that benchmarking is the process of systematically seeking out best practices to emulate. In this context, a performance report is not the desired end product; rather, performance measurement is a tool used to provide insights, raise questions, and identify other organizations that one may be able to learn from in order to improve.
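As a simple illustration of the three kinds of context described above, the sketch below takes one indicator (passenger trips per revenue hour) and compares it against an internal target, the agency's own recent trend, and a peer-group average. The agency names, target value, and figures are invented for illustration only.

```python
# Illustrative only: hypothetical agencies, target, and values.
from statistics import mean

trips_per_rev_hour = {          # our agency, by report year
    2006: 21.4, 2007: 22.1, 2008: 23.0,
}
peer_values_2008 = {            # same indicator for a hypothetical peer group
    "Peer A": 25.2, "Peer B": 19.8, "Peer C": 24.1, "Peer D": 22.5,
}
target = 24.0                   # assumed internal service standard

latest_year = max(trips_per_rev_hour)
latest = trips_per_rev_hour[latest_year]
prior = trips_per_rev_hour[latest_year - 1]
peer_mean = mean(peer_values_2008.values())

print(f"Target check: {latest:.1f} vs. target {target:.1f} "
      f"({'met' if latest >= target else 'not met'})")
print(f"Trend check:  {latest:.1f} vs. {prior:.1f} last year "
      f"({'improving' if latest > prior else 'not improving'})")
print(f"Peer check:   {latest:.1f} vs. peer mean {peer_mean:.1f} "
      f"({'above' if latest > peer_mean else 'below'} the peer average)")
```

Each comparison answers a different question; this report concentrates on the second (trend) and third (peer) forms of context.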

Benchmarking in the Private Sector

The process of private-sector benchmarking in the United States has matured to the point where the practice and benefits of benchmarking are well understood. Much of this progress can be attributed to the efforts of the Xerox Corporation, which, in 1979, decided to initiate competitive benchmarking in response to significant foreign competition. Since then, benchmarking has been embraced by business leaders and has become the basis for many of the Malcolm Baldrige National Quality Award's performance criteria.

The first documented use of private-sector benchmarking in the United States was performed by Xerox in 1979 (2, 5). Before this time, American businesses measured their performance against their own past performance, not against the performance of other companies. Between 1975 and 1979, Xerox's return on net assets dropped from 25% to under 5% due to its loss of patent protection and the subsequent inflow of foreign competition. Xerox was compelled to take action to stem this precipitous decline in market share and profitability. Xerox's CEO, David Kearns, decided to analyze the company's competition to determine why competitors were gaining market share. Xerox discovered that its development time for new products was twice as long as its competitors' and that its manufacturing cost was the same as the sales price of competing products.

Xerox did not simply compare itself to one other entity. By observing and incorporating successful practices used by other businesses in its weak areas, Xerox was able to achieve a turnaround. For instance, Xerox examined Sears' inventory management practices and L.L. Bean's warehouse operations, and carefully analyzed the development time and cost differences between itself and its competition (2).

In the early to mid-1980s, U.S. companies realized the many benefits of information sharing and started developing networks. Examples of these networks, which eventually became benchmarking networks, are provided below:

• The General Motors Cross-Industry Study of best practice in quality and reliability was a 1983 study of business leaders in different industries to define those quality management practices that led to improved business performance.
• The General Electric Best-Practice Network, a consortium of 16 companies, met regularly to discuss best practice in noncompetitive areas. These companies were selected so that none competed against any other participant, thus creating an open environment for sharing sensitive information about business practices.
• Hewlett-Packard (HP) had a wide variety of collaborative efforts with other businesses. For instance, HP helped Procter & Gamble (P&G) understand policy deployment; this collaboration included inviting two P&G executives to work inside HP for a 6-month period to experience how HP's planning process worked. Ford and HP also engaged in numerous business-practice-sharing activities (6).

The criteria for the Malcolm Baldrige National Quality Award (7) were developed in the 1980s by 120 corporate quality executives who aimed to agree on the basic parameters of best practice. The inclusion of benchmarking elements in many of the award's evaluation criteria, and Xerox's receipt of the award, helped benchmarking and its benefits gain prominence throughout the private sector.

In 1992, the International Benchmarking Clearinghouse (IBC) was established to create a common methodology and approach for benchmarking. The IBC:

• Created a benchmarking network among a broad spectrum of industries, supported by an information database and library;
• Conducted benchmarking consortium studies on topics of common interest to members;
• Standardized training materials around a simple benchmarking process and developed a business process taxonomy that enabled cross-company performance comparison; and
• Accelerated the diffusion of benchmarking as an accepted management practice through the propagation of the Benchmarking Code of Conduct (8), which governs how companies collaborate with each other during the course of a study.

In 1994, the Global Benchmarking Network (GBN) was established to bring together disparate benchmarking efforts in various nations, including the U.K. Benchmarking Centre, the Swedish Institute for Quality, the Informationszentrum Benchmarking in Germany, and the Benchmarking Club of Italy, along with U.S. benchmarking organizations.

Benchmarking in the Public Sector

A number of public agencies in the United States have implemented benchmarking programs. This section highlights a few of them.

New York City's COMPSTAT Program

New York City has employed the Comparative Statistics (COMPSTAT) program since the Giuliani mayoral era, and many believe that this system has helped the city reduce crime and make improvements in other areas as well.

The program became nationally recognized after its successful implementation by New York City in the mid-1990s. COMPSTAT actually originated within New York City Transit, whose police force began using comparative statistics for its law enforcement needs and saw dramatic declines in transit crime. The system expanded significantly after Mayor Rudolph Giuliani took office and decided to implement COMPSTAT on a citywide basis.

The internal benchmarking element within COMPSTAT is embodied in the unit-versus-unit comparison. Commanders can monitor their own performance, evaluate the effectiveness of their strategies, and compare their own success to that of others in meeting the established performance objectives. Timely precinct-level crime statistics, reported both internally to their peers and to the public, motivated commanders to improve their crime reduction and prevention strategies and come up with innovative ideas to fight crime.

COMPSTAT eventually became a best practice not only in law enforcement but also in municipal government. Elements of COMPSTAT have been implemented by cities across the United States to a greater or lesser extent. One example is Baltimore: city officials visited New York City, obtained information about COMPSTAT, and initiated CITISTAT, a similar performance evaluation system, described below, that facilitates continuous improvement (9).

Baltimore's CITISTAT

CITISTAT is used by the mayor of Baltimore as a management and accountability tool. The tenets of Baltimore's program are similar to those of New York City's:

• Accurate and timely intelligence,
• Effective tactics and strategies,
• Rapid deployment of resources, and
• Relentless follow-up and assessment.

Heads of agencies and bureaus attend a CITISTAT meeting every other week with the mayor, deputy mayors, and key cabinet members. Performance data are submitted to the CITISTAT team prior to each meeting and are geocoded for electronic mapping. As with New York City's program, the success of CITISTAT has attracted visitors from many government agencies across the United States and from abroad (10).

District of Columbia's CapStat

The District of Columbia also developed a performance-based accountability process. CapStat identifies opportunities to improve the performance and efficiency of DC's government and provide a higher quality of service to its residents. The mayor and city administrator hold regular meetings with all executives responsible for improving performance on a specific issue, examine and interpret performance data, and develop strategies to improve government services. The effectiveness of the strategies is continuously monitored, and depending on the results, the strategies are modified or continued. CapStat sessions take place at least weekly (11).

Philadelphia's SchoolStat Program

Philadelphia's SchoolStat program was modeled after COMPSTAT. Philadelphia began using the SchoolStat performance management system during the 2005-2006 school year. All 270 principals, the 12 regional superintendents, and the chief academic officer attend the monthly meetings at which performance results are evaluated and strategies to improve school instruction, attendance, and climate are assessed. One major benefit of the program is that information and ideas are disseminated vertically and horizontally across the school district. Many performance improvements were seen in the program's first year of operation (12).

Air Force

During the Persian Gulf War, the Air Force shipped spare parts between its facility in Ohio and the Persian Gulf. The success of its rapid and reliable parts-delivery system can be credited to the Air Force's benchmarking of Federal Express's shipping methods (13).

Benchmarking in the Public Transit Industry

International Efforts

Benchmarking Networks

Several benchmarking networks, voluntary associations of organizations that agree to share data and knowledge with each other, have been developed internationally. Four notable international public transit benchmarking networks were in operation in 2009. Three of these were facilitated by the Railway Technology Strategy Centre at Imperial College London and shared common processes, although the networks catered to differing modes and city sizes. The two rail networks also shared common performance indicators.

The first of the networks now facilitated by Imperial College London, CoMET (Community of Metros), was initiated in 1994 when Hong Kong's Mass Transit Railway Corporation (MTR) proposed to metros in London, Paris, New York, and Berlin that they form a benchmarking network to share information and work together to solve common problems.

Since that time, the group has expanded to include 13 metros and suburban railways in 12 of the world's largest cities: Beijing, Berlin, Hong Kong, London, Madrid, Mexico City, Moscow, New York, Paris (Metro and RER), Santiago, São Paulo, and Shanghai. All of the member properties have annual ridership of over 500 million (14).

The Nova group, which started in 1997, focuses on medium-sized metros and suburban railways with ridership of under 500 million. As of 2009, its membership consisted of Bangkok, Barcelona, Buenos Aires, Delhi, Glasgow, Lisbon, Milan, Montreal, Naples, Newcastle, Rio de Janeiro, Singapore, Sydney, Taipei, and Toronto (15).

Imperial College London also facilitates the International Bus Benchmarking Group, which started in 2004 and had 11 members as of 2009: Barcelona, Brussels, Dublin, Lisbon, London, Montreal, New York, Paris, Singapore, Sydney, and Vancouver. The bus group shares the same basic benchmarking process as its rail counterparts, but uses a different set of key performance indicators (16, 17).

The fourth international benchmarking network that was active in 2009 was Benchmarking in European Service of public Transport (BEST). The program was initiated by Stockholm's public transit system in 1999. Originally conceived as a challenge with the transit systems in three other Nordic capital cities (Copenhagen, Helsinki, and Oslo), it quickly evolved into a cooperative, non-competitive program with the goal of increasing public transport ridership. After a pilot program in 2000, BEST has reported results annually since 2001. In addition to the original four participants, Barcelona, Geneva, and Vienna have participated more or less continuously since 2001; Berlin and Prague have participated more recently; and London and Manchester also participated for a time. The program is targeted at regions with 1 to 3 million inhabitants that operate both bus and rail services, but does not strictly hold to those criteria. The network is facilitated by a Norway-based consultant (18).

Common features of these four benchmarking networks include:

• Voluntary participation by member properties and agreement on standardized performance measures and measure definitions;
• Facilitation of the network by an external organization (a university or a private consulting firm) that is responsible for compiling annual data and reports, performing case studies, and arranging annual meetings of participants;
• A set of annual case studies (generally 2 to 4 per year) on topics of interest to the participants;
• Confidentiality policies that allow the free flow of information within the network but enforce strict confidentiality outside the network, unless all participants agree to release particular information; and
• An attitude that performance indicators are tools for stimulating questions rather than the output of the benchmarking process; the indicators lead to more in-depth analyses that in turn identify processes that produce higher levels of performance.

The three Imperial College-facilitated networks use relatively traditional transit performance measures as their "key performance indicators." In contrast, BEST uses annual telephone surveys of citizens (riders and non-riders) in each of its participating regions to develop its performance indicators. According to BEST's project manager, the annual cost to each participating agency is in the range of 15,000 to 25,000 euros, depending on how many staff participate in the annual seminar and on the number of case studies ("Common Interest Groups") in which the agency participates. The cost also includes each agency's share of the telephone survey and the cost of compiling results. Annual costs for the other three networks were not available, but the CoMET project manager has stated that the "real, tangible benefits to the participants . . . have far outweighed the costs" (19, 20).

European Benchmarking Research

The European Commission has sponsored several studies relating to performance measurement and benchmarking.

Citizens' Network Benchmarking Initiative. The Citizens' Network Benchmarking Initiative began as a pilot project in 1998, with 15 cities and regions of varying sizes and characteristics participating. Participation was voluntary, with the cities supplying the data and providing staff time to participate in working groups. The European Commission funded a consultant to assemble the data and coordinate the working groups. The goal of the pilot project was to test the feasibility of comparing public transport performance across all modes from a citizen's point of view. During the pilot, 132 performance indicators were tested; these were refined to 38 indicators by the end of the process. The working groups addressed four topics; working group members for each topic made visits to the cities already achieving high performance in those areas, and short reports were produced for each topic area.

Following the pilot project, the program was expanded to 40 cities and regions. As before, agency participation was voluntary and the European Commission funded a consultant to assemble the data and coordinate the working groups. Because Europe has no equivalent to the National Transit Database, the program's "common indicators" performance measures were intended to rely on readily available data and not require aggregation into a more complex indicator.

In the full program, some of the pilot indicators were abandoned due to lack of data or inconsistency of definition, while some new indicators were added. The program ended in 2002 when funding for the consultant support ran out, although there appeared to be at least some interest among the participants in continuing the program (21).

Extending the Quality of Public Transport (EQUIP). A second initiative, EQUIP, occurred at roughly the same time as the Citizens' Network Benchmarking Initiative. EQUIP developed a Benchmarking Handbook (22) covering five modes: bus, trolleybus, tram/light rail, metro, and local heavy rail (i.e., commuter or suburban rail). The handbook consists of two volumes: (1) a methodology volume describing benchmarking in general and addressing sampling issues, and (2) an indicators volume containing 91 standardized indicators for measuring an agency's internal performance and service quality. Of these, 27 are considered "super-indicators" that provide an entry-level introduction to benchmarking. Ideally, each of these indicators would be collected for each of the five modes covered by the handbook.

EQUIP was tasked with developing methods that agencies could use for internal benchmarking, but the methodology lent itself to agencies submitting data to a centralized, potentially anonymous database that could be used for external comparisons, and then finally to direct interaction with other agencies. During development, the methodology was tested on a network of 45 agencies in nine countries; however, the network did not continue after the conclusion of the project (23).

One challenge faced by EQUIP was that the full EQUIP process required collecting data that European agencies either were not already collecting or were not collecting in a standardized way, due to the absence of mandatory performance reporting along the lines of the NTD in the United States. Therefore, agencies would have incurred additional costs to collect and analyze the data. In addition, most European service is contracted out, with multiple companies sometimes providing service in the same city, so there can be competitive reasons why a service provider may be reluctant to share data with others; the local transit authority also needs to compile data from multiple operators. However, as the majority of the EQUIP measures are ones that U.S. systems already routinely collect for NTD reporting purposes, the EQUIP process appears transferable to the United States.

Benchmarking European Sustainable Transport (BEST). The European Union (EU) BEST program (distinct from the Nordic BEST benchmarking network described above) focused on developing or improving benchmarking capabilities for all transport modes in Europe (e.g., air, freight rail, public transport, and bicycling) at scales ranging from international to local. The program sponsored six conferences between 2000 and 2003 that explored different aspects of benchmarking, and it also sponsored three pilot benchmarking projects in the areas of road safety, passenger rail transport, and airport accessibility (24).

Quality Approach in Tendering/Contracting Urban Public Transport Operations (QUATTRO). The EU's QUATTRO program (25) developed a standardized performance-measurement process that was subsequently adapted into the EN 13816 standard (26) on the definition, targeting, and measurement of service quality on public transport. The standard describes a process for measuring service quality, recommends areas to be measured, and provides some general standardized terms and definitions, but it does not provide specific targets for performance measures or specific numerical values as part of measure definitions (e.g., the number of minutes late that would be considered "punctual" or on-time).

[Figure 1. Quality Loop, QUATTRO and EN 13816. The loop links service quality expected, targeted, delivered, and perceived; measurement of performance and measurement of satisfaction connect the service provider view (operator, road authorities, police) with the customer view (customers and the community).]

Both QUATTRO and EN 13816 describe a quality loop, illustrated in Figure 1, with four main components that measure both the service provider and customer points of view:

• Service quality sought: The level of quality explicitly or implicitly required by customers. It can be measured as the sum of a number of weighted quality criteria; the relative weights can be determined through qualitative analysis.
• Service quality targeted: The level of quality that the service provider aims to provide for customers. It considers the service quality sought by customers as well as external and internal pressures, budgetary and technical constraints, and competitors' performance. The following factors need to be addressed when setting targets:
  – A brief statement of the service standard [e.g., "we intend our passengers to travel on trains which are on schedule (meaning a maximum delay of 3 minutes)"];
  – A targeted level of achievement (e.g., "98% of our passengers find that their trains are on schedule"); and
  – A threshold of unacceptable performance that, if crossed, should trigger immediate corrective action, such as (but not limited to) provision of alternative service or customer compensation.
• Service quality delivered: The level of quality achieved on a day-to-day basis, measured from the customer point of view. It can be measured using direct observation.
• Service quality perceived: The level of quality perceived by the customer, measured through customer satisfaction surveys. Customer perception depends on personal experience of the service or associated services, information received about the service from the provider or other sources, and the customer's personal environment. Perceived quality may bear little resemblance to delivered quality.
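The targeting concepts above (a stated standard, a targeted achievement level, and an unacceptable-performance threshold) can be made concrete with a small calculation. The sketch below is illustrative only: the delay data, the 3-minute standard, the 98% target, and the 90% action threshold are assumed values in the spirit of the EN 13816 example, not figures taken from the standard.

```python
# Illustrative sketch: evaluating delivered service quality against a target
# and an unacceptable-performance threshold (all values are assumptions).
observed_delays_min = [0, 1, 5, 2, 0, 8, 1, 3, 0, 2, 12, 1, 0, 2, 4]  # minutes late, one value per observed trip

ON_TIME_MAX_DELAY_MIN = 3     # service standard: "on schedule" means no more than 3 minutes late
TARGET_PCT = 98.0             # targeted level of achievement
ACTION_THRESHOLD_PCT = 90.0   # below this, trigger immediate corrective action

on_time = sum(1 for d in observed_delays_min if d <= ON_TIME_MAX_DELAY_MIN)
delivered_pct = 100.0 * on_time / len(observed_delays_min)

print(f"Service quality delivered: {delivered_pct:.1f}% of trips on schedule")
if delivered_pct < ACTION_THRESHOLD_PCT:
    print("Below the unacceptable-performance threshold: corrective action required")
elif delivered_pct < TARGET_PCT:
    print("Below target: investigate causes and adjust strategies")
else:
    print("Target met")
```

The delivered quality measured this way would then be compared against perceived quality from customer satisfaction surveys to close the loop.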

Transferring Knowledge between Industries

A presentation (27) at one of the Benchmarking European Sustainable Transport conferences focused on the process of looking outside one's own industry to gain new insights into one's business practices. As discussed later in this chapter, a common fear that arises when conducting benchmarking exercises, even among relatively close peers, is that some fundamental difference between peers (in a transit context, for example, relative agency or city size, operating environment, route network structure, or agency objectives) will drive any observed differences in performance between the peers. As a result, some argue, it is difficult for a benchmarking exercise to produce useful results. When looking outside one's own industry, differences between organizations are magnified, as are the fears of those being measured. At the same time, benchmarking only within one's own industry can lead to performance improvements, but only up to the industry's current level of best practice. Looking outside one's industry, on the other hand, allows new approaches to be considered and adopted, resulting in a greater improvement in performance than would have been possible otherwise. In addition, competition and data confidentiality issues lessen the further one goes from one's own industry.

The value from an out-of-industry benchmarking effort comes from digging deeply into the portions of the organizations that share common issues rather than from looking at high-level performance indicators. For example, the BEST presentation (27) looked at revenue and risk management best practices that could be transferred to the freight transportation industry from such industries as travel, hospitality, energy, and banking. In the area of risk management, common areas of risk include market risks due to changes in demand, changes in unit costs, over-capacity in the market, and insufficient capacity in the market. In the revenue-generation area, the freight transportation industry has adopted, among other practices, price forecasting, customer segmentation, and product differentiation from other industries.

International Databases

There is no international equivalent to the National Transit Database. The closest counterpart is the Canadian Transit Statistics database compiled annually by the Canadian Urban Transit Association (CUTA); the database is available only to CUTA members (28). The International Association of Public Transport (UITP) has produced a Mobility in Cities Database that provides 120 urban mobility indicators for 50 cities. The data cover the years 1995 and 2001. An interesting aspect of the database is that urban transport policies are also tracked, both policies that were enacted between 1990 and 2001 and those planned to be enacted between 2001 and 2010 (29).

U.S. Efforts

Transit Agencies

Past transit agency peer-comparison efforts uncovered in the literature review and the initial agency outreach effort rarely extended into the realm of true benchmarking (i.e., involving contact with other agencies to gain insights into the results of comparisons and to generate ideas for improvements). Commonly, agencies have conducted peer reviews as part of agency or regional planning efforts, although some reviews have also been generated as part of a management initiative to improve agency performance. In most cases, the peer-comparison efforts were one-time or infrequent events rather than part of an ongoing performance measurement and improvement process. Some examples of these efforts are described below.

The Ann Arbor Transportation Authority conducted a peer analysis at the end of 2006 that involved direct contact with 10 agencies (8 of which agreed to participate) to (a) provide more recent data than were available at the time through the NTD (year 2004 data) and (b) provide details about measures not available through the NTD (the presence of a downtown hub and any secondary hubs, and the presence of bike racks, a trip planning system, and a bus tracking system). One impetus for the review was the agency's 25% increase in ridership during the previous 2 years.

The Central Ohio Transit Authority (COTA) in Columbus regularly compares itself to Cleveland, Cincinnati, Dallas, Buffalo, and Austin using NTD data. Measures of particular interest include comparative cost information (often used in labor negotiations) and maintenance information. COTA also internally tracks customer-service-based measures such as the number of complaints per 100,000 passengers.

The Utah Transit Authority (UTA) commissioned a performance audit in 2005 (30). The audit was very detailed and included hundreds of performance measures, many of which went beyond the planning level and into the day-to-day level of operations. However, the audit also included a peer-comparison element that compared UTA's performance against peer agencies by mode (i.e., bus, light rail, and paratransit) in terms of boardings, revenue miles, operating costs, peak fleet requirements, and service area size and population. Peers were selected based on region (west of the Mississippi River), city size, and the existence of light rail systems built within the previous 30 years.

Some agencies use peer review panels or visits as tools to gain insights into transit performance and generate ideas for improvements. Peer review or "blue ribbon" panels tend to be more like an audit or management review, where peer representatives can be on-site at an agency for up to 3 or 4 days. Some form of peer-identification process is typically used to develop these panels. Those who have participated in such efforts have found it quite valuable to discuss details with their counterparts at other agencies. Visits to other agencies can also provide useful insights if the agencies have been selected on the basis of (a) having characteristics similar to the visiting agency's and (b) strong performance in an area of interest to the visitors. Visits made simply on the basis of reputation may be interesting to participants, but are not as likely to produce insights that can be adopted by the visiting agency.

APTA is developing a set of voluntary recommended practices and standards for use by its members. Some of these provide standard definitions for non-NTD performance measures. Other recommended practices address processes (for example, a customer comment process and comment-tracking database) that could lead to more widespread availability of non-NTD data, although the data would not necessarily be defined consistently between agencies. An example of a standard definition is APTA's Draft Standard for Comparison of Rail Transit Vehicle Reliability Using On-Time Performance (31), which defines two measures and an algorithm for determining the percentage of rail trips that are more than 5 minutes late as a result of a vehicle failure on a train.
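The draft standard's exact measure definitions are not reproduced here; the sketch below simply illustrates the general form of such a reliability calculation, using invented trip records and an assumed 5-minute lateness threshold attributed to vehicle failures.

```python
# Illustrative sketch (not the APTA standard's algorithm): share of rail trips
# arriving more than 5 minutes late because of a vehicle failure.
trips = [
    # (minutes_late, vehicle_failure_caused_delay) -- invented records
    (0, False), (7, True), (2, False), (12, True), (6, False),
    (0, False), (9, True), (1, False), (4, False), (15, True),
]

LATE_THRESHOLD_MIN = 5

late_due_to_failure = sum(
    1 for minutes_late, failure in trips
    if minutes_late > LATE_THRESHOLD_MIN and failure
)
rate = 100.0 * late_due_to_failure / len(trips)
print(f"{rate:.1f}% of trips were more than {LATE_THRESHOLD_MIN} minutes late "
      f"due to a vehicle failure")
```

A standardized definition along these lines is what makes cross-agency comparison of a non-NTD measure meaningful.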

Transit Finance Learning Exchange (TFLEx)

One U.S. benchmarking network currently in existence is the Transit Finance Learning Exchange, "a strategic alliance of transit agencies formed to leverage mutual strengths and continuously improve transit finance leadership, development, training practices and information sharing" (32). TFLEx was formed in 1999 and currently has thirteen members:

• Capital Metropolitan Transportation Authority (Austin),
• Central Puget Sound Regional Transit Authority (Sound Transit),
• Dallas Area Rapid Transit (DART),
• Hillsborough Area Regional Transit Authority (HART),
• Los Angeles County Metropolitan Transportation Authority (LACMTA),
• Massachusetts Bay Transportation Authority (MBTA),
• Orange County Transportation Authority (OCTA),
• Regional Transportation Authority of Northeastern Illinois (RTA),
• Regional Transportation Commission of Southern Nevada (RTC),
• Rochester-Genesee Regional Transportation Authority (RGRTA),
• San Joaquin Regional Transit District,
• Santa Monica Big Blue Bus, and
• Washington Metropolitan Area Transit Authority (WMATA).

As of 2009, annual membership fees were $5,000 for agencies with annual operating budgets larger than $50 million, and $2,500 otherwise. Membership dues have been reduced in recent years, in response to the economic downturn, to avoid losing members.

TFLEx's original goal was to develop a standardized database of transit performance data to overcome the challenges, particularly related to consistency, of relying on NTD data. In most cases, the performance measures collected by TFLEx have been variants of data already available through the NTD rather than entirely new performance measures; the benefit is in the greater consistency of the TFLEx reporting procedures.

TFLEx has not entirely succeeded in meeting its goal of producing a standardized and regularly updated database of transit performance data for benchmarking purposes, for two primary reasons:

• First, collecting the data requires significant time and/or money commitments. It has been difficult in many cases for member agencies to dedicate resources to providing data for TFLEx. One successful strategy used in the past was to fly a TFLEx representative directly to member agencies to collect data. This approach worked well, but the costs of the data collection effort were higher than can currently be supported.

• Second, developing standard definitions of core performance measures that everyone can agree to for reporting purposes is difficult. For instance, TFLEx spent years trying to develop a standard definition of "farebox recovery" to ensure consistent reporting of this measure in TFLEx data.

In general, TFLEx's experience has been that it is relatively easy to get data once, but updating the data on a regular basis is very difficult. Because of the financial difficulties faced by many transit agencies at the time of writing, funding and resources for TFLEx data collection efforts have diminished considerably. Data for member agencies are still reported in a confidential section of the TFLEx website, but the data typically come directly from NTD reporting at present rather than representing the results of a parallel data collection effort.

While the database function of TFLEx has subsided for the time being, the organization provides several other benefits for members. TFLEx agencies meet for semi-annual workshops where best practices in transit financial management are shared. In addition, the TFLEx website provides a forum to ask specific questions and receive feedback from other member agencies. TFLEx leadership reported that questions to these forums typically elicit valuable responses. Although the current TFLEx programs are not data-driven, the members still consider TFLEx to be a valuable benchmarking tool simply through the ability to quickly share information with other member agencies. The semi-annual workshops are also seen as valuable ways to develop professional relationships to which one can turn for answers to many transit performance-related questions.

Despite the challenges the group has faced in developing a reliable source of benchmarking data, TFLEx leadership still feel that the need for these data within a transit agency is strong. As a result, they expect TFLEx to undertake a renewed effort to collect benchmarking data in the next several years. Over the long term, the TFLEx interviewees felt strongly that the NTD reporting procedures should be significantly revised to create a more standardized dataset. They felt there is a movement within the transit industry toward greater transparency that will allow for more standardized reporting in the future. One interviewee cited the success of public schools in developing benchmarks (e.g., test scores) that allow schools to be directly compared to one another as a model that the transit industry could hope to follow.

States

State DOTs are often interested in transit performance, as they are typically responsible for distributing federal grant funding to rural, small urban, and medium urban systems. Funding formulas often include basic elements of performance (e.g., ridership, revenue miles operated), while some also include cost-related factors. For example, the Indiana DOT incorporates a 3-year trend of passenger trips per operating expense, vehicle miles per operating expense, and locally derived income per operating expense into its formula. The Texas DOT uses revenue miles per operating expense, riders per revenue mile, local investment per operating expense, and (for urban systems only) riders per capita in its formula (33). The North Carolina DOT (NCDOT) is considering developing minimum service standards for transit agencies in its state (34), but has not yet done so.
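As a hypothetical illustration of how performance ratios like these can feed a formula-based allocation, the sketch below distributes a funding pool in proportion to a weighted combination of two ratios. The agencies, weights, and dollar figures are invented; they do not represent the Indiana or Texas formulas.

```python
# Hypothetical performance-based allocation (not an actual state DOT formula).
agencies = {
    # agency: (passenger_trips, revenue_miles, operating_expense)
    "System A": (1_200_000, 800_000, 6_000_000),
    "System B": (450_000, 500_000, 2_500_000),
    "System C": (900_000, 650_000, 4_000_000),
}
weights = {"trips_per_dollar": 0.6, "miles_per_dollar": 0.4}  # assumed weights
funding_pool = 10_000_000

scores = {}
for name, (trips, miles, expense) in agencies.items():
    scores[name] = (weights["trips_per_dollar"] * trips / expense
                    + weights["miles_per_dollar"] * miles / expense)

total_score = sum(scores.values())
for name, score in scores.items():
    share = funding_pool * score / total_score
    print(f"{name}: score {score:.3f}, allocation ${share:,.0f}")
```

Actual state formulas differ in the ratios used, the weights applied, and whether multi-year trends rather than single-year values are considered.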

The Washington State DOT (WSDOT) has a strong focus on performance measurement, both internal and external. WSDOT's Public Transit Division produces an annual summary on the state of public transportation in the state (35), which includes a statewide summary and sections for each of the state's 28 local-government public transportation systems. The local-agency sections include a comparison of 10 key performance indicators to the statewide average for the group (e.g., urban fixed-route service), as well as 3-year trends and 5-year forecasts for each agency for a larger set of performance measures, broken out by mode. All of the reported measures are NTD measures. However, the summary also provides useful information about individual systems that goes beyond NTD measures, including information on:

• Local tax rates and the status of local efforts to increase local public transportation taxes and/or expand transit districts;
• Type of governance (e.g., Public Transportation Benefit Area, City, County);
• Description of the makeup of the governing body (e.g., two county commissioners, two representatives from cities with populations greater than 30,000, etc.);
• Days and hours of service;
• Number and types of routes (e.g., local routes, commuter routes) that are operated;
• Base fare (adult and senior);
• Descriptions of transfer, maintenance, and other facilities; and
• A summary of the agency's achievements in the previous year, objectives for the upcoming year, and objectives for the next 5 years.

The NCDOT commissioned a Benchmarking Guidebook (34), which was published in 2006. The guidebook's purpose is to provide public transportation managers in North Carolina with step-by-step guidance for conducting benchmarking processes within their organizations. The state's underlying goal is to help ensure that transit systems throughout the state serve their riders efficiently and effectively, and use the state's public funding as productively as possible.

The guidebook proposes a three-part benchmarking process:

1. Trend analysis, to be conducted at least annually by each transit system.
2. Peer group analysis, to be conducted at least annually by each transit system (comparing itself to national peers) and by the NCDOT (comparing performance among groups of North Carolina peers).
3. Statewide minimum standards, consisting of 10 measures that would be evaluated annually by NCDOT, with poorly performing transit systems given help to improve their performance and superior performance being recognized.

States frequently tabulate data that agencies within the state submit to the NTD, allowing access to the data up to a year earlier than waiting for the NTD. States have also frequently tabulated a limited set of data for smaller systems that, prior to 2008, were not required to report to the NTD. However, due to the minimal staffing at smaller systems, data may not be reported consistently or at all. One state, for example, reported that its collection of rural and small-city cost data was "substantially unreliable" due to missing data and values it knew for certain were incorrect (36). Some state DOTs, such as Florida and Texas, have had universities audit NTD data submitted by transit agencies to ensure consistency with performance measure definitions and have had the universities conduct agency training on measures that were particularly troublesome.

Regions

Peer comparisons are also performed at the regional level. For example, by state law, the Metropolitan Council in Minneapolis must perform a transit system performance audit every 4 years. The audit encompasses the 24 entities that provide service within the region. The 2003 audit included a peer comparison of the region as a whole to 11 peer regions (selected on the basis of area size and composition of transit services) and a comparison of bus and paratransit services provided by Metro Transit in Minneapolis/St. Paul to six peer cities. A trend analysis was performed for six key measures for both Metro Transit and the peer group average (37).

The Atlanta Regional Commission conducted a Regional Transit Institutional Analysis (38) to examine how the region should best plan, fund, build, and operate public transit. A group of peer regions was constructed to assist in the analysis. Factors used to select peers consisted of urban area size (within 2 million of the Atlanta region's population), urban area growth rate, population density, annual regional transit trips, percent drive-alone trips, annual delay per traveler, and cost-of-living index. Boston, Portland, Los Angeles, and New York were also included at the request of the commission's board, as those areas are frequently cited as examples. The comparisons focused on non-NTD factors, such as the budget approval process, fare policy, responsibility for capital construction, funding allocation, bonding authority, and recent initiatives.

The Illinois Office of the Auditor General periodically conducts performance audits of the transit funding and operating agencies in the Chicago region. The most recent audit was conducted in 2007 (39). A portion of the audit developed groups of five peers each for the following Chicago-area transit services: Chicago Transit Authority (CTA) heavy rail, CTA bus, Metra commuter rail, Pace bus, Pace demand response, and Pace vanpool.
CTA and Metra peers were selected by identifying agencies operating in major cities with rapid rail service, while Pace's bus peers were selected by identifying agencies that operate in suburban portions of major cities and that operate from multiple garages. (Pace management noted that three of the five peers provide service within the major city, unlike Pace, and that Pace had the lowest service area population density of the group.) Pace operates the second-largest vanpool service in the country, so the other four largest vanpool operations were selected as the peers for that comparison. The peer comparisons looked at four major categories of NTD measures (service efficiency, service effectiveness, cost effectiveness, and passenger revenue effectiveness), plus a comparison of top-operator wage rates using data from other sources. Between 5 and 19 measures were compared among the peer groups, depending on the service being analyzed. For those areas where a service's performance was below that of its peers, the audit developed recommendations for improvements.

Research

The University of North Carolina at Charlotte produced annual rankings of transit system performance (40), derived from 12 ratios of NTD measures reflecting the resources available to operate service, the amount of service provided, and the resulting utilization of the service. Agencies were assigned to one of six groups based on mode operated (bus-only or multimodal) and population served. Each agency's value for a ratio was compared to the group mean for the ratio, resulting in a performance ratio. The 12 performance ratios for an agency were then averaged, and this overall performance ratio was used to rank systems within their own groups, as well as across all systems.

Criticisms of this effort included that there was no industry consensus or agreement on the measures that were used, that inconsistencies exist in the data, and that systems have different goals, regional demographics, and regional economies. It was also pointed out that poor performance in a single category could overshadow good results in several other categories: for example, MTA-New York City Transit ranked in the top three agencies overall for 6 of the 12 measures one year, yet ended up ranked 124th out of 137 agencies overall due to being in the bottom four agencies for 3 of the 12 measures. The performance ratio also had problems with autocorrelation among the component measures (1). The authors of the study did not believe that geographic or size differences affected their results, but did acknowledge that their findings did not shed light on the reasons why apparently similar systems differed so much in performance.
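The mechanics of the ranking approach described above can be sketched briefly: each indicator is divided by its peer-group mean, the resulting ratios are averaged, and agencies are ranked on that average. The sketch below uses three invented indicators and three invented agencies rather than the study's actual 12 ratios and NTD data, and assumes for simplicity that higher values are better for every indicator.

```python
# Illustrative sketch of a mean-normalized ranking (invented data, not the
# actual UNC Charlotte measure set; assumes higher is better for all measures).
from statistics import mean

indicators = {
    "Agency 1": {"trips_per_rev_mile": 2.8, "rev_miles_per_op_dollar": 0.12, "trips_per_capita": 18.0},
    "Agency 2": {"trips_per_rev_mile": 2.1, "rev_miles_per_op_dollar": 0.15, "trips_per_capita": 25.0},
    "Agency 3": {"trips_per_rev_mile": 3.4, "rev_miles_per_op_dollar": 0.10, "trips_per_capita": 15.0},
}

measure_names = list(next(iter(indicators.values())))
group_means = {m: mean(vals[m] for vals in indicators.values()) for m in measure_names}

overall = {}
for agency, vals in indicators.items():
    # performance ratio for each measure = agency value / group mean
    ratios = [vals[m] / group_means[m] for m in measure_names]
    overall[agency] = mean(ratios)  # overall performance ratio

ranked = sorted(overall.items(), key=lambda kv: kv[1], reverse=True)
for rank, (agency, score) in enumerate(ranked, start=1):
    print(f"{rank}. {agency}: overall performance ratio {score:.2f}")
```

The criticisms noted above (one weak category dominating the average, and autocorrelation among the component measures) follow directly from this simple averaging step.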

The National Center for Transit Research (NCTR) conducted a study on benchmarking (41) in 2004 for the Florida DOT. The study's objective was to develop a method of measuring commonly maintained performance statistics in a manner that would be broadly acceptable to the transit industry and thereby provide useful information that could help agencies improve their performance over time. The project focused on the fixed-route motorbus mode and was limited to NTD variables, with the intent of expanding the range of modes and variables in the future if the initial project proved successful. Peer groups were initially formed based on geographic region and then subdivided on the basis of service area population, service area population density, total operating expense, vehicles operated in maximum service, and annual total vehicle miles. An additional group of the 20 largest transit systems from around the country was also formed, as the largest systems often did not have comparable peers within their region.

The NCTR study compared 22 performance measures in six performance categories: service supply/availability, service consumption, quality of service, cost efficiency, operating ratio, and vehicle utilization. For each measure, an agency was assigned points depending on where its performance value stood in relation to the group mean, with a value more than 2 standard deviations below the mean earning no points, a value more than 2 standard deviations above the mean earning 2 points, and between 0.5 and 1.5 points for values falling within ranges located between those two extremes. By adding up the point values for each measure, a total score can be developed for each agency, and the agencies can be ranked within their group based on their respective scores. In addition, composite scores and rankings can be developed for each performance category. According to the study's authors, the results of the process are not intended to indicate that one particular transit agency is "better" than another, but rather to serve as a tool that allows transit agencies to see where they fit into a group of relatively similar agencies.
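A minimal sketch of this kind of standard-deviation-based scoring follows. The cut points at -2, -1, +1, and +2 standard deviations and the intermediate point values of 0.5, 1.0, and 1.5 are an assumed reading of the banding described above, not the study's published table, and the peer-group data are invented.

```python
# Illustrative z-score banding (assumed band boundaries; invented data).
from statistics import mean, stdev

def score(value, group_mean, group_sd):
    """Assign 0-2 points based on how far a value sits from the group mean."""
    z = (value - group_mean) / group_sd
    if z <= -2.0:
        return 0.0
    if z <= -1.0:
        return 0.5
    if z <= 1.0:
        return 1.0
    if z <= 2.0:
        return 1.5
    return 2.0

# one measure (e.g., passenger trips per revenue hour) across a peer group
group_values = {"Agency 1": 24.0, "Agency 2": 19.5, "Agency 3": 28.0,
                "Agency 4": 22.0, "Agency 5": 16.0}
m, sd = mean(group_values.values()), stdev(group_values.values())

for agency, value in group_values.items():
    print(f"{agency}: value {value:.1f}, points {score(value, m, sd):.1f}")
```

Summing these points across all 22 measures would then give each agency a total score and a rank within its peer group.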

NCHRP Report 569: Comparative Review and Analysis of State Transit Funding Programs (42) provides information to help states conduct peer analyses and other comparative assessments of their transit funding programs, using data from the Bureau of Transportation Statistics' Survey of State Funding for Public Transportation and other data sources. The report presents the following framework for using the survey's data to construct peer groups and conduct peer analyses:

1. Determine the purpose of the analysis, or the types of measures to be compared (a common objective).
2. Determine the metrics for formulating peer groups (which similarities should be shared among the peers).
3. Develop the peer groups based on the metrics selected and their relative importance (i.e., determine weights).

The report provides examples of how the framework could be applied. In one sample analysis, the assumed objective was to compare state transit funding between "transit-dependent" and "non-transit-dependent" states. In a second example, peer groups were formed for the purpose of comparing state transit funding programs. These examples include suggestions for peer-grouping measures and suggestions for performance measures relevant to each example's objective that could be used for drawing comparisons.

TCRP Report 88: A Guidebook for Developing a Transit Performance-Measurement System (1) describes a process for transit agencies to follow to set up an internal performance-measurement system, a necessary first step for any benchmarking effort. The report describes more than 400 performance measures used in the transit industry. Each measure is assessed on the basis of its performance category (availability, service delivery, community impact, travel time, safety and security, maintenance and construction, and economic/financial), its data collection needs, and its potential strengths and weaknesses for particular applications. A series of question-based menus guides readers from a particular agency objective to one or two relevant measures for that objective, considering the agency's size and data-collection capabilities. A recommended core set of measures for different agency sizes is also presented for agencies that want to start with a basic performance-measurement program prior to fine-tuning it to reflect specific agency objectives. True benchmarking, involving contact with other agencies, is not covered in the report (only trend analyses and peer comparisons are described); however, the report can serve as a valuable resource for a benchmarking effort by providing a source of appropriate measures that can be applied to a particular benchmarking application.

Maintaining a customer focus is an important aspect of a successful benchmarking effort. Transit agencies often use customer satisfaction surveys to gauge how well customers perceive the quality of service being provided. TCRP Report 47: A Handbook for Measuring Customer Satisfaction and Service Quality (43) provides a recommended set of standardized questions that transit agencies could incorporate into their customer surveying activities. If more agencies adopted a standard core set of questions, customer satisfaction survey results could be added to the mix of potential comparisons in a benchmarking exercise.

Levels of Benchmarking

Benchmarking can be performed at different levels of complexity that result in different levels of depth of understanding and direction for improvement. The European EQUIP project (22, 23), described previously, defined three levels of benchmarking complexity, which form a useful foundation for the discussions in this section. This report splits EQUIP's Level 3 (direct contact with other agencies) into two levels, one involving one-time or irregular contact with other agencies (this report's Level 3), and the other involving participation in a benchmarking network with a more-or-less fixed set of partner agencies (this report's Level 4).

Level 1: Trend Analysis

Are we performing better than last week/month/quarter/year?

The first level of evaluation is to track performance on a periodic basis, often year-to-year, but potentially also week-to-week, month-to-month, or quarter-to-quarter, using the same indicators in a consistent way. Trend analysis forms the core of an effective performance-measurement program and is essential for good management and stewardship of funds. A program can be tailored to measure an agency's success in meeting its goals and objectives, and each agency has the flexibility to choose exactly what to measure and how to measure it. A trend analysis can show whether a transit agency is improving in areas of interest over time, such as carrying more rides, collecting more fare revenue, or decreasing complaints from the public.

However, a trend analysis does not gauge how well an agency is performing relative to its potential. An agency could have increased its ridership substantially, but still be providing relatively few rides for the size of market it serves and the level of service being provided. To move to the next level of performance evaluation, a peer comparison should be conducted (Level 2).
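As a simple illustration of Level 1, the following Python sketch computes period-over-period percentage change for two internally tracked indicators. The indicator names and values are hypothetical and are not drawn from the report; the essential point is only that the same indicators are measured the same way in every period.

```python
def trend(series):
    """Return period-over-period percent change for a list of observations.

    Measuring the same indicator, the same way, each period is the key
    requirement of a Level 1 trend analysis.
    """
    return [
        100.0 * (curr - prev) / prev
        for prev, curr in zip(series, series[1:])
    ]

# Hypothetical year-by-year values for two indicators.
unlinked_trips = [4_210_000, 4_350_000, 4_480_000, 4_400_000]
complaints_per_100k_trips = [12.5, 11.8, 11.9, 10.7]

print([round(x, 1) for x in trend(unlinked_trips)])             # e.g., [3.3, 3.0, -1.8]
print([round(x, 1) for x in trend(complaints_per_100k_trips)])  # e.g., [-5.6, 0.8, -10.1]
```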
Level 2: Peer Comparison

How are we performing in relation to comparable agencies?

There are a number of reasons why a transit agency might want to perform a peer comparison: for example, to support an agency's commitment to continual improvement, to validate the outcome of a past agency initiative, to help support the case for additional funding, to prioritize activities or actions as part of a strategic or short-range planning process, or to respond to external questions about the agency's operation. In a peer comparison, an agency compares its performance against other similar agencies that have contributed similarly collected data to a centralized database, which may or may not be anonymous. No direct contact or sharing of knowledge occurs between agencies, other than knowledge that can be obtained passively (e.g., from documents or data obtained through an Internet search).

The set of performance measures that can be used in a peer comparison is much more limited than in a trend analysis, as the data for each measure must be available for all of the peer agencies involved in the comparison, and each transit agency must use the same definition for any given measure. As a result, most peer comparisons in the United States have relied on the NTD, as it is readily available and uses standardized definitions. As discussed later in this chapter, the NTD does not provide measures for all performance topics of potential interest to a transit agency, nor do all reporting agencies consistently follow the FTA's performance measure definitions. Nevertheless, despite these handicaps, the industry consensus [as determined from this project's outreach efforts (44)] is that the NTD is the best source of U.S. transit data available and that the FTA is continually working to improve NTD data quality.

A critical element of a peer comparison is the selection of a credible peer group. If the peer group's characteristics are not sufficiently similar to those of the transit agency performing the comparison, any conclusions drawn from the comparison will be suspect, no matter how good the quality of the performance measure data used in the comparison. At the same time, it is unrealistic to expect that the members of a peer group will be exactly like the target agency. Data from standardized data sources can be used to form peer groups of comparable agencies [the peer-grouping methodology presented in Chapter 3 follows this approach, using the NTD, Census Bureau, the 2007 Urban Mobility Report (45), and data developed by this project as the sources]. The transit agency's performance can then be compared to its peers in areas of interest, using standardized performance measures to identify areas where the agency performs as well as or better than the others and areas where it lags behind.

It is unlikely that an agency will excel among its peers in all areas; therefore, the peer comparison process can help guide an agency in targeting its resources toward areas that show strong potential for improvement. A transit agency may discover, for example, that it is providing a comparable level of service but carrying fewer passengers than its peers. This knowledge can be used by itself to indicate that effectiveness may need to be improved, but it becomes more powerful when combined with more detailed data obtained directly from the peer agencies (Level 3).
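The following sketch illustrates, in very simplified form, how standardized data could be used to screen candidate peers: each candidate is scored by a weighted average of its percentage differences from the target agency on a few descriptive characteristics, and the closest matches are kept. The variables, weights, values, and similarity formula are assumptions made for illustration only; this is not the Chapter 3 peer-grouping methodology, which should be consulted for the actual approach.

```python
# A toy peer screen: rank candidate agencies by how closely a few
# descriptive characteristics match the target agency.  All names,
# values, weights, and the formula are illustrative assumptions.

TARGET = {"vehicles_operated_max_service": 120, "service_area_population": 450_000,
          "population_density": 2_800, "annual_vehicle_miles": 5_600_000}

CANDIDATES = {
    "Agency A": {"vehicles_operated_max_service": 135, "service_area_population": 520_000,
                 "population_density": 2_600, "annual_vehicle_miles": 6_100_000},
    "Agency B": {"vehicles_operated_max_service": 60,  "service_area_population": 200_000,
                 "population_density": 1_500, "annual_vehicle_miles": 2_300_000},
    "Agency C": {"vehicles_operated_max_service": 110, "service_area_population": 430_000,
                 "population_density": 3_100, "annual_vehicle_miles": 5_100_000},
}

WEIGHTS = {"vehicles_operated_max_service": 2.0, "service_area_population": 1.0,
           "population_density": 1.0, "annual_vehicle_miles": 1.0}

def dissimilarity(target, candidate, weights):
    """Weighted mean absolute percentage difference from the target (lower = more alike)."""
    total, weight_sum = 0.0, 0.0
    for var, w in weights.items():
        total += w * abs(candidate[var] - target[var]) / target[var]
        weight_sum += w
    return total / weight_sum

ranked = sorted(CANDIDATES, key=lambda name: dissimilarity(TARGET, CANDIDATES[name], WEIGHTS))
peer_group = ranked[:20]   # cap the group size; see Optimal Peer Group Size later in this chapter
print(peer_group)
```

In practice, the shortlisted candidates would still be reviewed qualitatively (operating environment, governance, recent major events) before being accepted as peers, and the group would typically be kept within the size range discussed later in this chapter under Optimal Peer Group Size.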

Level 3: Direct Agency Contact

What can we learn from our peers that will help us improve our performance?

Level 3 represents the start of true benchmarking. At this level, the transit agency performing the comparison makes direct contact with one or more of its peers. More-detailed information and insights can be gained through this process than from a simple reliance on a database.

One reason for directly contacting other peers is that the measures required to answer a performance question of interest are simply not available from national databases. A variety of data that are not reported to the NTD (for example, customer satisfaction data) are often collected by peer agencies but are not necessarily summarized in standard reports. In other cases, performance measures may be reported to the NTD, but not at the desired level of detail—for example, an agency that is interested in comparing the cost-effectiveness of commuter bus routes will only find system-level data in the NTD, which aggregates all of a particular transit agency's bus services.

Another reason for directly contacting a peer is to gain insights into what the agency's top-performing peers are doing to achieve their superior performance in a particular area. These insights may lead to ideas on how these peer agencies' practices may be transferable to the transit agency performing the comparison, leading eventually to the agency being able to improve its performance in an area of relative weakness.

A third reason for contacting peers is to obtain background information about a particular transit agency (e.g., agency policies or board composition) and to verify or ask questions about unusually high or low results. These types of contacts help verify that the peer agency really is similar to the agency performing the comparison and that the performance results are reliable.

At Level 3, contact with other transit agencies occurs on a one-time or irregular basis, guided by specific agency needs, such as the need to update a transit development plan. Although benchmarking is occurring, a consistently applied and scheduled agency benchmarking program and an agency culture supporting continuous improvement may not yet exist. Because peer agencies are unlikely to change much over the short term (barring a major event such as a natural disaster or the opening of a new mode), the same set of peers can often be used over a period of years, resulting in regular contacts with peers. At some point, transit agencies may decide it would be valuable to institute a more formal information-sharing arrangement (Level 4).

Level 4: Benchmarking Networks

What knowledge can we share with each other in different areas that will help all of us improve our performance?

At the highest level of benchmarking, an agency implements a formal benchmarking program and establishes (or is taking steps to establish) an agency culture encouraging continuous improvement. The agency identifies similar, like-minded agencies that have agreed to work together to regularly share data and experiences with each other for the benefit of all participants. The participants in the benchmarking network agree upon a set of data definitions and measures to be shared among the group, have a process set up that allows staff from different agencies to share their experiences with others, and may pool resources to fund investigations into performance topics of interest to the group.
Much of the data-related portion of the process is similar to Level 3, but after the initial start-up, requires less effort to manage, as the peer group members have already been identified and a common set of measures and definitions has already been agreed upon.

Benchmarking Success Factors

The following is a summary of the key factors for successful peer comparison and full-fledged benchmarking programs that were identified from the project's literature review and agency outreach effort:

• The peer grouping process is perhaps the most important step in the benchmarking process. Inappropriate peers may lead to incorrect conclusions or stakeholder refusal to accept a study's results. For high-level performance comparisons, peers should have similar sizes, characteristics, and operating conditions. One should expect peers to be similar, but not identical. Different peer groups may be needed for different types of comparisons (2, 41, 44).
• Design the benchmarking study and identify the study's objectives before starting to collect data. Performance should be measured relative to the agency's goals and objectives. Common definitions of performance measures are essential (1–3, 44).
• A management champion is needed first to support the initial performance-measurement and benchmarking effort, and then later to implement any changes that result from the process. Without such support, time and resources will be wasted as no changes will occur (1, 3, 6, 44).
• Comparing trends, both internally and against peers, helps identify whether particularly high or low performance was sustainable or a one-time event, which leads to better interpretation of the results of a benchmarking effort (3, 44).
• Organizations should focus less on rankings in benchmarking exercises and more on using the information to stimulate questions and to identify ways they can adapt the best practices of others to their own activities. A "we can learn from anyone" attitude is helpful. Don't expect to be the best in every area (6, 20, 44).
• Consider the customer in any benchmarking exercise. Public transit is a customer-service business, and transit benchmarking should seek to identify ways to improve transit performance and thereby improve ridership (1, 3, 6, 18, 22, 26, 44).

• A long-term approach to performance measurement and benchmarking is more likely to be successful than a series of independent studies performed at irregular intervals. Even established benchmarking programs should be monitored and reviewed over time to make sure they stay current with an organization's objectives and current conditions (1, 2, 20).

Confidentiality

The U.S. and European transit benchmarking networks (14–16, 18, 31), as well as the Benchmarking Code of Conduct (8), emphasize the importance of confidentiality, particularly in regard to information about business practices and the results of benchmarking comparisons. The networks also extend confidentiality, to one degree or another, to the inputs into the benchmarking process. All of these sources agree that information can be released if all affected parties agree to do so.

In an American transit context, confidentiality of inputs is not attainable in many circumstances because the NTD is available to all. However, certain types of NTD data (e.g., safety and security data) are not released to the public at present, while non-NTD data that may assist a benchmarking process, such as customer satisfaction survey results, are only available through the cooperation of other transit agencies, who may not wish the information to be broadly disseminated. On the other hand, as public entities, many U.S. transit agencies are subject to state "sunshine laws" that may require the release of information if requested (e.g., by a member of the public or by the media). The public nature and standardization of the NTD (and, for Canadian agencies, the availability of standardized Canadian data) make it easier for U.S. and Canadian transit agencies to perform peer comparisons than their counterparts in other parts of the world. At the same time, the public availability of the NTD makes it possible for others to compare transit performance in ways that transit agencies may not necessarily agree with.

Optimal Peer Group Size

A success factor that is rarely explicitly stated—North Carolina's Benchmarking Guidebook (34) being an exception—but that is generally implied through the way that peer groups are developed is that there are upper and lower limits to how many peers should be included in a peer group. Too many peers result in a heavy data collection burden and the possibility that peers are too dissimilar to draw meaningful conclusions. Too few peers make it difficult to credibly judge how well a transit agency is performing, and in a worst case could lead to accusations that the peers were hand-picked to make an agency look good. In general, anything below 4 peers is considered to be too few, while somewhere in the range of 10 to 20 peers is considered to be too many, depending on the application.

Benchmarking Networks

Transit benchmarking networks have had the greatest success, both in terms of longevity and documented results. Such networks also exist in the private sector. The advantages of benchmarking networks include:

• Participants agree upon common measures and data definitions—this provides standardization, focuses data collection on areas of interest to the group, and gives participants more confidence in the quality of the data and the results.
• Participants have already agreed that they share a sufficient number of characteristics in common—this helps reduce, if not eliminate, questions afterwards about how comparable a particular peer is.
• Cost-sharing is possible, allowing participants to get better-quality information at a lower cost than if they were to conduct a benchmarking exercise on their own.
• Networks facilitate the development and comparison of long-term performance trends.
• Agency staff grow professionally through exposure to and discussions with colleagues in similar positions at other participating agencies.
• Confidentiality, if desired.

Two key success factors for transit benchmarking networks in Europe have been the use of an external facilitator (e.g., a university or a private consultant) and ongoing financial support. The facilitator performs functions that individual transit agency staff may not have time or experience for, including compiling and analyzing data, producing reports, and organizing meetings (e.g., information-sharing working groups on a specific topic or an annual meeting of the network participants) (18, 20). The cost of the facilitator is shared among the participants. At least two European pilot benchmarking networks (13, 23) dissolved after EU funding for the research project (and the facilitator) ended.

Benchmarking networks are not easy to maintain: they require long-term commitments by the participating agencies to contribute resources to the effort, to the benefit of all. At the same time, both private- and public-sector experiences indicate that the knowledge transfer benefits and the staff networking opportunities provided by a benchmarking network provide a valuable return on the agency's investment.

Benefits of and Challenges with Transit Peer Comparisons

Benefits of Transit Peer Comparisons

Most of the participants in this project's outreach effort (44) agreed that peer comparisons should be used as one tool in a set of various management tools for measuring performance. From a manager's perspective, it is always valuable to have a sense of where performance lies relative to other similar agencies. Useful information can be revealed even if a given methodology might have some flaws and not be "perfect" or "ideal." In addition, even if not necessarily used by outside agencies to determine funding levels or otherwise measure accountability, peer comparisons can be used as a way to foster competition and motivate transit agencies to improve their performance. When used internally, such comparisons can provide insight into areas where an agency is performing relatively well among its peers or where some improvements might be needed. However, nearly all those contacted stated that peer comparisons should not be used as the only benchmark for a transit agency's performance. The general consensus is that they are very good diagnostic tools but are typically not complex enough (by nature) to facilitate a complete understanding of performance.

Most transit agencies use the NTD for peer comparisons, and most expressed a general satisfaction with being able to use the data relatively easily to facilitate comparisons. While there are certainly limitations to the NTD (see the next section), it was noted that the NTD has less ambiguity than other data sources due to its somewhat standard definitions and reporting requirements. Comments such as "it's what we've got" and "it's the best of the worst" were heard in the discussions. More than one individual stated that the NTD is "better" and more reliable than the comparable data used on the highway side by the Federal Highway Administration (thus making the point that all large federal databases have their own sets of problems and issues).

Also, peer comparisons can be used to support requests for more resources for a transit agency. This might be an easier task when an agency's performance is considered better than that of its peers. However, with the proper presentation, an agency's relatively poorer performance might also be used to show a need for more resources (e.g., when an agency's local funding levels are shown to be much lower than its peers').

Overall, the outreach participants have learned a great deal from their experiences with peer comparisons, and they consider them valuable tools regardless of the outcome. When a transit agency compares favorably to its peers, the comparison can provide a sense of validation for current efforts. When an agency compares less favorably, lessons can be learned about what areas need more attention.

Challenges with Transit Peer Comparisons

While most outreach participants agreed that transit peer comparisons are useful tools, many challenges to the process were acknowledged. Outreach participants noted that making true "apples to apples" comparisons is difficult and that, in designing a methodology, it is hard to "be all things to all people."

At least one participant believes that all statistical comparisons among transit systems are "fatally flawed" due to the basic settings of the various systems or their board policies, which result in substantial differences that make clear comparisons nearly impossible.
Alternatively, as one participant stated, "No one said this has to be easy." There will always be arguments that "we're so unique, we'll never be like so-and-so," or "it's so different here," yet most agree that such complaints should not thwart the careful development and use of such comparisons.

One major issue that can cause problems in transit comparisons is the peer selection process itself. Who is selecting the peers? When a transit agency self-selects, there can be a bias against including agencies that might be performing better. Several participants noted that managers might ignore, manipulate, or otherwise skew information that does not make the system look good. It can be relatively easy to present data in a way that an agency wants it presented. In addition, several of those with direct experience developing peer groups for analysis indicated that they were often told to include certain agencies in the analysis that were clearly not appropriate peers. Often the motivation for adding such agencies to the analysis included a sense of competition with the other community or a desire to be more like another community (perhaps in other ways besides just transit; i.e., "We are always compared to such-and-such community in other ways, so we should add them to the peer group"). Including communities that are not necessarily appropriate peers might be helpful if the purpose of the exercise is to see what the differences are; however, it will not be as instructive if the purpose is to benchmark existing performance.

Because much of the information used in typical transit peer comparisons is statistical in nature, a lack of the appropriate technical knowledge among those either conducting the analysis or interpreting it can cause problems. As one participant noted, "Do you have 'numbers' people on staff?" Without a thorough understanding of how the numbers are derived and what they mean, and without being able to properly convey that information to those who will interpret and use the results, "weaknesses can be magnified" and the overall usefulness of the process is reduced.

While the NTD, as a relatively standardized database, is the source of most information used in typical transit comparisons, the data have their limitations. The following are some of the issues that outreach participants see with the use of the NTD as related to transit peer comparisons:

• Despite standardized definitions, some transit agencies still report some items differently and/or not very well, particularly in the service area and maintenance categories (and other factors not necessarily in the NTD, such as on-time performance, can also be measured quite differently among agencies);
• The NTD provides only a "once a year" (or year-end) picture of performance;
• Data lags (e.g., data for report year 2006 become nationally available at the beginning of calendar year 2008);
• Only one service area is reported for an agency, but service area can vary greatly by mode (especially with demand-response service), thus leading to issues with any per-capita measures;
• Missing or otherwise incomplete data, particularly for smaller agencies; and
• Limited information for contracted services, although such services sometimes represent a significant portion of the service operated.

In addition, many participants noted that other relevant factors that should be included in any comparison are not found in the NTD. To paraphrase one participant, it should be remembered that the NTD is just a "compromise," weighed against the reporting burden placed on the agencies.

Another negative, according to a few participants, is that typical transit comparisons focus too much on the information that is most easily measured or focus on too few measures. While some might argue that such a focus is appropriate, especially if a method is expected to be widely applied, others believe it will not result in the best and most meaningful comparisons, thus reducing the effort to simply a "paper exercise." Some believe in the "less is more" approach, while others believe that "more is more."

Media Issues

For many in the project's outreach effort, media reactions to transit peer comparisons have not been very controversial. Unless there is some major negative issue, the media will often ignore the information or simply report anecdotal information. In some areas where peer comparisons are very favorable, the agencies often promote the information to the media as a way to gain additional support for transit services in the community.

Alternatively, dealing with the media can sometimes be a challenge for some transit agencies. There might be questions about the peer selection process and why some agencies were included in (or excluded from) the analysis. As one participant stated, the media will always "slant to the negative," and so whoever might be presenting the information must really understand the methods and numbers and be able to convey them appropriately to the audience. The agency representatives should be comfortable enough with the data and results and be ready and able to explain the meaning and relevance of the information. In addition, if something looks "different," it is important to remember that "different" does not necessarily mean "bad." Some participants added that having a set methodology to follow (determined external to the agency) can be a way to show that an objective process was used.

Lessons Learned

After 30 years, benchmarking is well established in the private sector, and its benefits are widely recognized. Public sector adoption of benchmarking is more recent (generally since the mid-1990s), but many examples of successful benchmarking programs can already be found in the United States.
There has been significant interest in Europe in public transit benchmarking, particularly since the late 1990s, and there are currently four well-established international benchmarking networks catering to different modes and city sizes. However, although a few in the U.S. public transit industry have recognized the benefits of benchmarking and have taken steps toward incorporating it into agency activities, it is not yet a widespread practice.

U.S. and Canadian transit agencies wishing to conduct peer comparisons or full-scale benchmarking efforts have a significant advantage not available to their counterparts in the rest of the world, namely the existence of standardized databases (the NTD and Canadian Transit Statistics, respectively) that provide access to a wide array of consistently defined variables that have been reported by a large number of agencies over a considerable period of time. Although NTD data are still perceived by many in U.S. transit agencies as being unreliable—and certainly there is still room for improvement—the testing conducted by this project found that the NTD is usable for a wide variety of benchmarking applications.

For a number of years, the Florida DOT has sponsored the Florida Transit Information System (FTIS) software, a freely available, powerful tool for accessing and analyzing data from the complete NTD. The peer-grouping methodology described in this report has now been added to FTIS, making peer comparisons quicker to perform than ever and allowing for a greater depth of analysis.

Peer comparison is best applied as a diagnostic tool that helps agency management identify areas for improvement, particularly when one takes the approach from the start that one always has room for improvement. The results of peer comparisons should be used to stimulate questions about the reasons behind the performance results, which in turn can lead to ideas that can result in real performance improvements.

Many international transit agencies have found that the contacts they make with peer agencies as a result of a benchmarking process provide the greatest benefit, rather than any set of numerical results. However, the numerical analysis remains an important intermediate step that allows one to identify best-practice peers.

Management support is vital for performance measurement in general, but particularly so for a benchmarking process. Resources need to be allocated to make the peer agency contacts, and both resources and a management champion are needed to support any initiatives identified through the benchmarking process designed to improve agency performance. During times of economic hardship, management support is particularly vital to keeping an established program running; however, it is also exactly at these times that benchmarking can be a particularly valuable tool for identifying potential improvements and efficiencies that can help a transit agency continue to perform its mission using fewer overall resources.

Benchmarking networks represent the highest level of benchmarking and are particularly useful for (a) compiling standardized databases of measures not available elsewhere and (b) coordinating contacts between organizations on topics of mutual concern. Networks can also help spread the cost of data analysis and collection over a group of agencies, thus reducing the costs for all participants compared to each participant performing its own separate analysis. The use of an external facilitator has been a common success factor for transit benchmarking networks. However, joining a network is not a requirement to successfully perform benchmarking—agencies can still learn a lot from conducting their own individual efforts.

Finally, while it is desirable to have peer transit agencies share as many characteristics as possible with the agency performing the comparison, it should also be kept in mind that all transit agencies are different in some respect and that one will never find exact matches to one's own agency. The need for similarity is more important when higher-level performance measures that can be influenced by a number of factors are being compared (e.g., agency-wide cost per boarding) than when lower-level measures are being compared (e.g., miles between vehicle failures). Keep in mind that a number of successful benchmarking efforts have occurred across industries by focusing comparisons only on the areas or functions that the organizations have in common.

In summary, performance measurement, peer comparison, and benchmarking are tools that a transit agency can apply and benefit from right now. Potential applications are described in Chapter 3. Some of the potential issues identified earlier in this section are addressed by this project's methodology, while others simply require awareness of the potential presence of the issue and tools for dealing with the issue (Chapter 4). There is also room for improvements to the process in the future; Appendix C provides recommendations on this subject.
