Suggested Citation: "Chapter 4 - Benchmarking Methodology." National Academies of Sciences, Engineering, and Medicine. 2010. A Methodology for Performance Measurement and Peer Comparison in the Public Transportation Industry. Washington, DC: The National Academies Press. doi: 10.17226/14402.


Introduction

This chapter describes an eight-step process for conducting a trend analysis, peer comparison, or full-fledged benchmarking effort. Not all of the steps described below will be needed for every effort.

The first step of the process is to understand the context of the benchmarking exercise. What is the source of the issue being investigated? What is the timeline for providing an answer? Will this be a one-time exercise or a permanent process? The answers to these questions will determine the level of effort for the entire process and will also guide the selection of performance measures and screening criteria in subsequent steps.

In Step 2, performance measures are developed that relate to the performance question being asked. A benchmarking program that is being set up as a regular (e.g., annual) effort will use the same set of measures year after year, while one-time efforts to address specific performance questions or issues will use unique groups of measures each time. This report's peer-grouping methodology screens potential peers based on a number of common factors that influence performance results between otherwise similar agencies; however, additional screening factors may be needed to ensure that the final peer group is appropriate for the performance question being asked. These secondary screening factors are also identified at this stage.

In all applications except a simple trend analysis of the target agency's own performance, a peer group is established in Step 3. This report's basic peer-grouping methodology has been implemented in the freely available Web-based FTIS software. Instructions for using FTIS are provided in Appendix A. FTIS identifies a set of potential peers most like the agency performing the peer review (the "target agency"), based on a variety of criteria. The software provides the results of the screening and the calculations used in the process so that users can inspect the results for reasonableness. Once the initial peer group has been established, the secondary screening factors can be applied to reduce the initial peer group to a final set of peers.

After a peer group is identified, Step 4 compares the performance of the target agency to its peers. A mix of analysis techniques is appropriate: not just looking at a snapshot of the agencies' performance for the most recent year, but also looking at trends in the data. This effort identifies both the areas where the transit agency is doing well relative to its peers (but might be able to do better) and areas where the agency's performance lags behind its peers. Ideally, the process does not focus on producing a "report card" of performance (although one can be useful for supporting the need for performance improvements), but instead is used to raise questions about potential reasons behind the performance and to identify peer group members that the target agency can learn from.

In a true benchmarking application, the process moves on to Step 5, where the target agency contacts its best-practices peers. The intent of these contacts is to (a) verify that there are no external factors unaccounted for that explain the difference in performance and (b) identify practices that could be adopted to improve one's own performance. A transit agency can skip this step, but it loses the value of learning what its peers have tried previously and thus risks spending resources unnecessarily in re-inventing the wheel.

If a transit agency seeks to improve performance in a given area, it moves on to Step 6, developing strategies for improving performance, and Step 7, implementing the strategies. The specifics of these steps depend on the particular performance-improvement need and the agency's resources and operating environment.
Once strategies for improving performance have been implemented, Step 8 monitors results on a regular basis (monthly, quarterly, or annually, depending on the issue) to determine whether the strategies are having a positive effect on performance. As the agency's peers may also be taking steps to improve performance, the transit agency should periodically return to Step 4 to compare its performance against its peers. In this way, a cycle of continuous performance improvement can be created.

Figure 2 summarizes the steps involved in the benchmarking methodology. Places in the methodology where a step can be skipped or the process can end (depending on the application) are shown with dotted connectors.

Step 1: Understand the Context of the Benchmarking Exercise

The first step of the process is to clearly identify the purpose of the benchmarking effort since this determines the available timeframe for the effort, the amount and kind of data that can and should be collected, and the expected final outcomes. Examples of the kinds of benchmarking efforts that could be conducted, in order of increasing effort, are:

• Immediate one-time request, such as a news media inquiry following a proposed increase in fares.
• Short-range one-time request, such as a management focus to increase the fuel efficiency of the fleet in response to rising fuel costs.
• Long-range one-time request, such as a regional planning process that is relating levels of service provision to population and employment density, or a state effort to develop a process to incorporate performance into a formula-based distribution of grant funding.
• Permanent internal benchmarking process, where agency performance will be evaluated broadly on a regular (e.g., annual) basis.
• Establishment of a benchmarking network, where peer agencies will be sought out to form a permanent group to share information and knowledge to help the group improve its collective performance.

The level of the benchmarking exercise should also be determined at this stage since it determines which of the remaining steps in the methodology will need to be applied:

• Level 1 (trend analysis): Steps 1, 2, and 4, and possibly Steps 6–8 depending on the question to be answered.
• Level 2 (peer comparison): Steps 1–4, and possibly Steps 6–8 depending on the question to be answered.
• Level 3 (direct agency contact): Steps 1–5, and frequently Steps 6–8.
• Level 4 (benchmarking networks): Steps 1–3 once, Steps 4 and 5 annually, Step 6 through participation in working groups, and Steps 7 and 8 at the discretion of the agency.

Figure 2. Benchmarking steps: (1) understand context; (2) develop performance measures; (3) establish a peer group; (4) compare performance; (5) contact best-practices peers; (6) develop implementation strategies; (7) implement the strategy; (8) monitor results.

Step 2: Develop Performance Measures

Step 2a: Performance Measure Selection

The performance measures used in a peer comparison are, for the most part, dependent on the performance question being asked. For example, a question about the cost-effectiveness of an agency's operations would focus on financial outcome measures, while a question about the effectiveness of an agency's maintenance department could use measures related to maintenance activities (e.g., maintenance expenses), agency investments (e.g., average fleet age), and maintenance outcomes (e.g., revenue miles between failures). Additional descriptive measures that provide context about peer agencies are also valuable to incorporate into a review.

Because each performance question is unique, it is not possible to provide a standard set of measures to use. Instead, use Chapter 3 of this report to identify 6 to 10 outcome measures that are the most applicable to the performance question, plus additional descriptive measures as desired. In addition, Chapter 5 provides case-study applications of the methodology that include examples of performance measures used for each application.

Performance measures not directly available or derivable from the NTD (or from the other standardized data included with FTIS) will require contacting the other transit agencies in the peer group. If peer agency data are needed, be sure to budget plenty of time into the process to contact the peers, to obtain the desired information from them, and to compile the information. Examples of common situations where outside agency data might be required are:

• Performance questions involving specific service types (e.g., commuter bus routes);
• Performance questions involving customer satisfaction;
• Performance questions involving quality-of-service factors such as reliability or crowding; and
• Performance questions requiring detailed maintenance data.

Significant challenges exist whenever non-standardized data are needed to answer a performance question. Agencies may not collect the desired information at all or may define desired measures differently. If data are available, they may not be compiled in the desired format (e.g., route-specific results are provided, but the requesting agency desires service-specific results). Therefore, the target agency should plan on performing additional analysis to convert the data it receives into a usable form. It is often possible to obtain non-standard data from other agencies, but it does take more time and effort.

Benchmarking networks are a good way for a group of transit agencies to first agree on common definitions for non-NTD measures of interest and then to set up a regular data-collection and reporting process that all can benefit from.

Step 2b: Identify Secondary Screening Measures

This report's recommended peer-grouping methodology incorporates a number of factors that can influence one transit agency's performance relative to another. However, it does not account for all potential factors. Depending on the performance question being asked, a secondary screening might need to be performed on the initial peer group produced by the methodology. These measures should be selected prior to forming peer groups to avoid any perception later on that the peer group was hand-picked to produce a desired result. Examples of factors that might be considered as part of a secondary screening process include:

• Institutional structure (e.g., appointed board vs. a directly elected board): Available from NTD form B-10. (All NTD forms with publicly released data are viewable through the FTIS software.)
• Service operator (e.g., directly operated vs. purchased service): Although this factor is included in the peer-grouping methodology, it is not a pass/fail factor. Some performance questions, however, may require a peer group of agencies that purchase or do not purchase service. In other situations, the presence, lack, or mix of contracted service could help explain performance results, and therefore, this factor would not be desirable for secondary screening.
• Service philosophy [e.g., providing service to as many residents and worksites as possible (coverage) vs. concentrating service where it generates the most ridership (efficiency)]: Determined from an Internet inspection of agency goals and/or route networks.
• Service area type (e.g., being the only operator in a region): This report's peer-grouping methodology considers eight different service area types in forming peer groups, but allows peers to have somewhat dissimilar service areas. Some performance questions, however, may require exact matches. Service area information is available through FTIS; the Internet can also be used to compare agencies' system maps.
• Funding sources: Available from NTD form F-10.
• Vehicles operated in maximum service: Available from NTD form B-10.
• Peak-to-base ratio: Derivable for larger agencies (at least 150 vehicles in maximum service, excluding vanpool and demand response) from NTD form S-10.
• FTA population categories for grant funding: An agency may wish to compare itself only to other agencies within its FTA funding category (e.g., <50,000 population, 50,000–200,000 population, 200,000 to 1 million population, >1 million population), or a funding category it expects to move into in the future. Service area populations are available on NTD form B-10, while urban area populations are available through FTIS.
• Capital facilities (e.g., number of maintenance facilities): Available from NTD form A-10.
• Right-of-way types: Available from NTD form A-20.
• Service days and span: Available from NTD form S-10.

Some of the case studies given in Chapter 5 provide examples of secondary screening.

Step 2c: Identify Thresholds

The peer-grouping methodology seeks to identify peer transit agencies that are similar to the target agency. It should not be expected that potential peers will be identical to the target agency, and the methodology allows potential peers to be different from the target agency in some respects. However, if a potential peer is substantially different in one respect from the target agency, it needs to be quite similar in several other respects for the methodology to identify it as a potential peer.

The methodology testing determined that not all transit agencies were comfortable with having no thresholds on any given peer-grouping factor. Some thought suggested peers were too big or too small in comparison to their agency, for example, despite considerable similarity elsewhere. This report discourages setting thresholds for peer-grouping factors (e.g., the size that constitutes "too big") when not needed to address a particular performance question. However, it is also recognized that the credibility and eventual success of a benchmarking exercise depends in great measure on how its stakeholders (e.g., staff, decision-makers, board, or the public) perceive the peers used in the exercise. If the peers are not perceived to be credible, the results of the exercise will be questioned. Users of the methodology at the local level are in the best position to gauge the factors that might make peers not appear credible to their stakeholders.

If thresholds are to be used, users should review the methodology's peer-grouping factors to determine (a) whether a threshold is needed and (b) what it should be. As with screening measures, it is important to do this work in advance in order to avoid perceptions later on that the peer group was hand-picked.

Step 3: Establish a Peer Group

Overview

The selection of a peer group is a vital part of the benchmarking process. Done well, the selection of an appropriate, credible peer group can provide solid guidance to the agency, point decision-makers towards appropriate directions, and help the agency implement realistic activities to improve its performance. On the other hand, selecting an inappropriate peer group at the start of the process can produce results that are not relevant to the agency's situation, or can produce targets or expectations that are not realistic for the agency's operating conditions. As discussed above, the credibility of the peer group is also important to stakeholders in the benchmarking process: if the peer group appears to be hand-picked to make the agency look good, any recommendations for action (or lack of action) that result from the process will be questioned.

Ideally, between eight and ten transit agencies will ultimately make up the peer group.
This number provides enough breadth to make meaningful comparisons without creating a burdensome data-collection or reporting effort. Some agencies have more unique characteristics than others, and it may not always be possible to come up with a credible group of eight peers. However, the peer group should include at least four other agencies to have sufficient breadth. Examples of situations where the ideal number of peers may not be achievable include:

• Larger transit agencies generally, as there is a smaller pool of similar peers to work with;
• Largest-in-class transit agencies (e.g., largest bus-only operators), as nearly all potential peers will be smaller or will operate modes that the target agency does not operate;
• Transit agencies operating relatively uncommon modes (e.g., commuter rail), as there is a smaller pool of potential peers to work with; and
• Transit agencies with uncommon service types (e.g., bus operators that serve multiple urban areas), as again there is a small pool of potential peers.

The peer-grouping methodology can be applied to a transit agency as a whole (considering all modes operated by that agency), or to any of the specific modes operated by an agency. Larger multi-modal agencies that have difficulty finding a sufficient number of peers using the agency-wide peer-grouping option may consider forming mode-specific peer groups and comparing individual mode performance. Mode-specific groups are also the best choice for mode-specific evaluations, such as an evaluation of bus maintenance performance.

Larger transit agencies that have difficulty finding peers may also consider looking internationally for peers, particularly to Canada. Statistics Canada provides data for most of the peer-grouping methodology's demographic screening factors, including population, population density, low-income population, and 5-year population growth for census metropolitan areas. Many Canadian transit agency websites provide basic budget and service data that can be integrated into the peer-grouping process, and Canadian Urban Transit Association (CUTA) members have access to CUTA's full Canadian Transit Statistics database (28).¹

For ease of use, this report's basic peer-grouping methodology has been implemented in the Web-based FTIS software, which provides a free, user-friendly interface to the full NTD. However, the methodology can also be implemented in a spreadsheet, and was used that way during the initial testing of the methodology.

Detailed instructions on using FTIS to perform an initial peer grouping are provided in Appendix A, and full details of the calculation process used by the peer-grouping methodology are provided in Appendix B. The following subsections summarize the material in these appendices.

Step 3a: Register for FTIS

The NTD component of FTIS is accessed at http://www.ftis.org/INTDAS/NTDLogin.aspx. The site is password protected, but a free password can be requested from this page. Users typically receive a password within one business day.

¹ An important difference that impacts performance ratios derived from CUTA ridership data is that U.S. ridership data are based on vehicle boardings (i.e., unlinked trips), while CUTA ridership data are based on total trips regardless of number of vehicles used (i.e., linked trips). Thus, a transit trip that includes a transfer counts as two rides in U.S. data, but only one ride in CUTA data. Unlinked trips are the sum of linked trips and the number of transfers. Some larger Canadian agencies also report unlinked trip data to APTA.
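The linked/unlinked relationship in the footnote matters whenever a ridership-based ratio mixes CUTA and NTD figures. The following is a minimal sketch of the adjustment, not a procedure given in the report: the function names and sample numbers are hypothetical, and it assumes the number of transfers is known for the Canadian agency.

```python
def unlinked_from_linked(linked_trips: float, transfers: float) -> float:
    """Convert linked trips to unlinked trips (boardings).

    Uses the footnote's identity: unlinked trips = linked trips + transfers.
    """
    if linked_trips < 0 or transfers < 0:
        raise ValueError("trip and transfer counts must be non-negative")
    return linked_trips + transfers


def cost_per_trip(operating_cost: float, trips: float) -> float:
    """Cost-effectiveness ratio: operating cost divided by trips."""
    if trips <= 0:
        raise ValueError("trips must be positive")
    return operating_cost / trips


if __name__ == "__main__":
    # Hypothetical figures for a Canadian agency reporting linked trips.
    linked = 10_000_000
    transfers = 2_500_000
    cost = 50_000_000

    unlinked = unlinked_from_linked(linked, transfers)
    print(cost_per_trip(cost, linked))    # 5.0 per linked trip
    print(cost_per_trip(cost, unlinked))  # 4.0 per unlinked trip
```

With these made-up numbers, the same agency looks 25% more expensive per trip on a linked basis than on an unlinked basis, which is why the two ridership definitions should not be compared directly.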

Step 3b: Form an Initial Peer Group

The initial peer-grouping portion of the methodology identifies transit agencies that are similar to the target agency in a number of characteristics that can influence performance results between otherwise similar agencies. "Likeness scores" are used to determine the level of similarity between a potential peer agency and the target agency both with respect to individual factors (e.g., urban area population, modes operated, and service areas) and for the agencies overall. Appendix A provides detailed instructions on using FTIS to form an initial peer group.

Transit agencies should not expect that their peers will be exactly like themselves. The methodology allows peers to differ substantially in one or more respects, but this must be compensated by a high degree of similarity in a number of other respects. (Agencies not comfortable with having a high degree of dissimilarity in a given factor can develop and apply screening thresholds, as described in Step 2c.) The goal is to identify a set of peers that are similar enough to the target agency that credible and useful insights can be drawn from the performance comparison to be conducted in Step 4.

The methodology uses the following three screening factors to help ensure that potential peers operate a similar mix of modes as the target agency:

• Rail operator (yes/no). A rail operator is defined here as one that operates 150,000 or more rail vehicle miles annually. (This threshold is used to distinguish transit agencies that operate small vintage trolley or downtown streetcar circulators from large-scale rail operators.) This factor helps screen out rail-operating agencies as potential peers for bus-only operators.
• Rail-only operator (yes/no). A rail-only operator operates rail and has no bus service. This factor is used to screen out multi-modal operators as peers for rail-only operators.
• Heavy-rail operator (yes/no). A heavy-rail operator operates the heavy rail (i.e., subway or rapid transit) mode. This factor helps identify other heavy-rail operators as peers for transit agencies that operate this mode.

As discussed in more detail in Appendix A, bus-only operators that wish to consider rail operators as potential peers can export a spreadsheet containing the peer-grouping results and then manually recalculate the likeness scores, excluding these three screening factors.

Depending on the type of analysis (rail-specific vs. bus-specific or agency-wide) and the target agency's urban area size, up to 14 peer-grouping factors are used to identify transit agencies similar to the target agency. All of these peer-grouping factors are based on nationally available, consistently defined and reported measures. The factors are:

• Urban area population. Service area population would theoretically be a preferable variable to use, but it is not yet reported in a consistent way to the NTD. Instead, the methodology uses a combination of urban area population and service area type (discussed below) as a proxy for the number of people served.
• Total annual vehicle miles operated. This is a measure of the amount of service provided, which reflects service frequencies, service spans, and service types operated.
• Annual operating budget. Operating budget is a measure of the scale of a transit agency's operations; agencies with similar budgets may face similar challenges.
• Population density. Denser communities can be served more efficiently by transit.
• Service area type. Agencies have been assigned one of eight service types, depending on the characteristics of their service (e.g., entire urban area, central city only, commuter service into a central city).
• State capital (yes/no). State capitals tend to have a higher concentration of office employment than other similarly sized cities.
• Percent college students. Universities provide a focal point for service and often directly or indirectly subsidize students' transit usage, thus resulting in a higher level of ridership than in other similarly sized communities.
• Population growth rate. Agencies serving rapidly growing communities face different challenges than either agencies serving communities with moderate growth rates or agencies serving communities that are shrinking in size.
• Percent low-income population. The amount of low-income population is a factor that has been correlated with ridership levels. Low-income statistics reflect both household size and configuration in determining poverty status and are therefore a more robust measure than either household income or automobile ownership.
• Annual roadway delay (hours) per traveler. Transit may be a more attractive option for commuters in cities where the roadway network is more congested. This factor is only used for target agencies in urban areas with populations of 1 million or more.
• Freeway lane miles (thousands) per capita. Transit may be more competitive with the automobile from a travel-time perspective in cities with relatively few freeway lane-miles per capita. This factor is only used for target agencies in urban areas with populations of 1 million or more.
• Percent service demand-responsive. This factor helps describe the scale of an agency's investment in demand-response service (including ADA complementary paratransit service) as compared with fixed-route service. This factor is only used for agency-wide and bus-mode comparisons.
• Percent service purchased. Agencies that purchase their service will typically have different organization and cost structures than those that directly operate service.
• Distance. This factor serves multiple functions. First, it serves as a proxy for other factors, such as climate, that are more difficult to quantify but tend to become more different the farther apart two agencies are. Second, agencies located within the same state are more likely to operate under similar legislative requirements and have similar funding options available to them. Finally, for benchmarking purposes, closer agencies are easier to visit and stakeholders in the process are more likely to be familiar with nearby agencies and regions. This factor is not used for rail-mode-specific peer grouping due to the relatively small number of rail-operating agencies.

Likeness scores for most of these factors are determined from the percentage difference between a potential peer's value for the factor and the target agency's value. A score of 0 indicates that the peer and target agency values are exactly alike, while a score of 1 indicates that one agency's value is twice the amount of the other. For example, if the target agency was in a region with an urbanized area population of 100,000 while the population of a potential peer agency's region was 150,000, the likeness score would be 0.5, as one population is 50% higher than the other. For the factors that cannot be compared by percentage difference (e.g., state capital or distance), the factor likeness scores are based on formulas that are designed to produce similar types of results: a score of 0 indicates identical characteristics, a score of 1 indicates a difference, and a score of 2 or more indicates a substantial difference.
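The percentage-difference scoring just described can be sketched in a few lines. This is an illustration inferred from the two statements above (a value twice the other scores 1; 150,000 versus 100,000 scores 0.5), not the exact Appendix A calculation. The function name is hypothetical, and dividing by the smaller of the two values is an assumption chosen because it reproduces both statements.

```python
def percent_difference_likeness(target_value: float, peer_value: float) -> float:
    """Factor likeness score based on percentage difference.

    Assumption: the difference is expressed relative to the smaller value,
    so the score is symmetric and equals 1.0 when one value is twice the other.
    """
    if target_value <= 0 or peer_value <= 0:
        raise ValueError("factor values must be positive")
    smaller = min(target_value, peer_value)
    larger = max(target_value, peer_value)
    return (larger - smaller) / smaller


if __name__ == "__main__":
    # The report's example: urbanized area populations of 100,000 and 150,000.
    print(percent_difference_likeness(100_000, 150_000))  # 0.5
    # One agency's value is twice the other's.
    print(percent_difference_likeness(100_000, 200_000))  # 1.0
    # Identical values.
    print(percent_difference_likeness(100_000, 100_000))  # 0.0
```

Factors such as state capital or distance use different formulas and are not covered by this sketch.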
Appendix A provides the likeness score calculation details for all of the peer-grouping factors.

The total likeness score is calculated from the individual screening and peer-grouping factor likeness scores as follows:

\[
\text{Total likeness score} = \frac{\operatorname{Sum}(\text{screening factor scores}) + \operatorname{Sum}(\text{peer-grouping factor scores})}{\operatorname{Count}(\text{peer-grouping factors})}
\]

A total likeness score of 0 indicates a perfect match between two agencies (and is unlikely to ever occur). Higher scores indicate greater levels of dissimilarity between two agencies. In general, a total likeness score under 0.50 indicates a good match, a score between 0.50 and 0.74 represents a satisfactory match, and a score between 0.75 and 0.99 represents potential peers that may be usable, but care should be taken to investigate potential differences that may make them unsuitable. Peers with scores greater than or equal to 1.00 are undesirable due to a large number of differences with the target agency, but may occasionally be the only candidates available to fill out a peer group.

A total likeness score of 70 or higher may indicate that a potential peer had missing data for one of the screening factors. (A factor likeness score of 1,000 is assigned for missing data; dividing 1,000 by the number of screening factors results in scores of 70 and higher.) In some cases, suitable peers may be found in this group by manually re-calculating the total likeness score in a spreadsheet and removing the missing factor from consideration, if the user determines that the factor is not essential for the performance question being asked. Missing congestion-related factors, for example, might be more easily ignored than a missing total operating budget.

Step 3c: Performing Secondary Screening

Some performance questions may require looking at a narrower set of potential peers than found in the initial peer group.
For example, one case study described in Chapter 5 involves an agency that did not have a dedicated local funding source and was interested in comparing itself to peers that did have one. Another case study involves an agency in a region that was about to reach 200,000 population (thus moving into a different funding category) and wanted to compare itself to peers that were already at 200,000 population or more. Some agencies may simply want to make sure that no peer agency is "too different" to be a potential peer for a particular application.

Data contained in FTIS can often be used to perform these kinds of screenings. Some other kinds of screening, for example based on agency policy or types of routes operated (e.g., commuter bus or BRT), will require Internet searches or agency contacts to obtain the information.

The general process to follow is to first identify how many peers would ideally end up in the peer group. For the sake of this example, this number will be eight. Starting with the highest-ranked potential peer (i.e., the one with the lowest total likeness score), check whether the agency meets the secondary screening criteria. If the agency does not meet the criteria, replace it with the next available agency in the list that meets the screening criteria. For example, if the #1-ranked potential peer does not meet the criteria, check the #9-ranked agency next, then #10, and so forth, until an agency is found that meets the criteria. Repeat the process with the #2-ranked potential peer. Continue until a group of eight peers that meets the secondary screening criteria is formed, or until a potential peer's total likeness score becomes too high (e.g., is 1.00 or higher).

Table 15 shows an example of the screening process for Knoxville Area Transit, using "existence of a dedicated local funding source" as a criterion. The top 20 "most similar" agencies to Knoxville are shown in the table in order of their total likeness score.
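The replacement procedure described above is equivalent to walking down the ranked list and keeping the first agencies that pass the screen, stopping when the group is full or the total likeness score reaches the cutoff. The sketch below illustrates this using the ranks, likeness scores, and funding flags from Table 15; the `Candidate` type and `select_peers` function are hypothetical names introduced here and are not part of FTIS.

```python
from typing import Callable, List, NamedTuple


class Candidate(NamedTuple):
    rank: int
    agency: str
    likeness: float
    dedicated_local_funding: bool


def select_peers(
    ranked: List[Candidate],
    meets_criteria: Callable[[Candidate], bool],
    group_size: int = 8,
    max_likeness: float = 1.00,
) -> List[Candidate]:
    """Apply a secondary screen to candidates sorted by ascending likeness."""
    peers: List[Candidate] = []
    for candidate in ranked:
        # Stop when the group is full or remaining candidates are too dissimilar.
        if len(peers) == group_size or candidate.likeness >= max_likeness:
            break
        if meets_criteria(candidate):
            peers.append(candidate)
    return peers


# Rows taken from Table 15 (potential peers for Knoxville Area Transit).
TABLE_15 = [
    Candidate(1, "Winston-Salem Transit Authority", 0.25, True),
    Candidate(2, "South Bend Public Transportation Corporation", 0.36, True),
    Candidate(3, "Birmingham-Jefferson County Transit Authority", 0.36, True),
    Candidate(4, "Connecticut Transit - New Haven Division", 0.39, False),
    Candidate(5, "Fort Wayne Public Transportation Corporation", 0.41, True),
    Candidate(6, "Transit Authority of Omaha", 0.41, True),
    Candidate(7, "Chatham Area Transit Authority", 0.42, True),
    Candidate(8, "Stark Area Regional Transit Authority", 0.44, True),
    Candidate(9, "The Wave Transit System", 0.46, False),
    Candidate(10, "Capital Area Transit (Raleigh)", 0.48, False),
    Candidate(11, "Capital Area Transit (Harrisburg)", 0.48, False),
    Candidate(12, "Shreveport Area Transit System", 0.49, False),
    Candidate(13, "Rockford Mass Transit District", 0.50, False),
    Candidate(14, "Erie Metropolitan Transit Authority", 0.52, False),
    Candidate(15, "Capital Area Transit System", 0.52, False),
    Candidate(16, "Western Reserve Transit Authority", 0.53, True),
    Candidate(17, "Central Oklahoma Transportation & Parking Auth.", 0.53, False),
    Candidate(18, "Des Moines Metropolitan Transit Authority", 0.55, False),
    Candidate(19, "Mass Transportation Authority", 0.56, True),
    Candidate(20, "Escambia County Area Transit", 0.57, False),
]

peers = select_peers(TABLE_15, lambda c: c.dedicated_local_funding)
print([c.rank for c in peers])  # [1, 2, 3, 5, 6, 7, 8, 16]
```

Passing the screen as a function keeps the sketch reusable for other criteria, such as a population threshold, without changing the selection loop.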
The table also shows whether or not each agency has a dedicated local funding source. In this case, 35

seven of Knoxvilleâs top eight peers have a dedicated local funding source. Connecticut TransitâNew Haven Division does not, so it would be replaced by the next-highest peer in the list that doesâin this case, Western Reserve Transit Authority. Although it is the 16th-most-similar agency in the list, it still has a good total likeness score of 0.53. Although not needed in this example, some user judgment might be needed about the extent of dedicated local funding that would qualify. Some local funding sources might only provide 1% or less of an agencyâs total operating revenue, for example. Step 4: Compare Performance The performance measures to be used in the benchmarking effort were speciï¬ed during Step 2a. Now that a ï¬nal peer group has been identiï¬ed, Step 4 focuses on gathering the data associ- ated with those performance measures and analyzing the data. Step 4a: Gather Performance Data NTD Data Performance measures that are directly collected by the NTD or can be derived from NTD measures can be obtained through FTIS. The process for doing so is described in detail in Appendix A. NTD measures provide both descriptive infor- mation such as operating costs and revenue hours and out- come measures such as ridership. Many useful performance measures, however, are ratios of two other measures. For example, cost per trip is a measure of cost-effectiveness, cost per revenue hour is a measure of cost-efficiency, and trips per revenue hour is a measure of productivity. None of these ratios is directly reported by the NTD, but all can be derived from other NTD measures. FTIS provides many common performance ratios, and any ratio derivable from NTD data can be calculated by exporting it from FTIS to a spreadsheet. One potential concern that users may have with NTD data is the time lag between when data are submitted and when data are officially released, which can be up to 2 years. 
Rapidly changing external conditions (for example, fuel price increases or a downturn in the economy) may mean that the most recent data available through the NTD do not reflect current conditions. There are several ways that these data lag issues can be addressed if they are felt to be a concern:

1. Request NTD viewer passwords directly from the peer agencies. These passwords allow users to view, but not alter, data fields in the various NTD forms. As long as agencies are willing to share their viewer passwords, the agency performing the peer comparison has access to the most up-to-date information available.
2. Request data from state DOTs. Many states require their transit agencies to report NTD data to them at the same time they report it to the FTA.
3. Review trends in NTD monthly data. The following variables are available on a monthly basis, with only an approximate 6-month time lag: unlinked passenger trips, revenue miles, revenue hours, vehicles operated in maximum service, and number of typical days operated in a month.
4. Review trends in one's own data. Are unusual differences between current data and the most-recent NTD data due to external, national factors that would tend to affect all peers (in which case conclusions about the target agency's performance relative to its peers should still be valid), or are they due to agency- or region-specific changes?

With either of the first two options, it should be kept in mind that data obtained prior to their official release from the NTD may not yet have gone through a full quality-control check. Therefore, performing checks on the data as described in Step 4b (e.g., checking for consistent trends) is particularly recommended in those cases.

Table 15. Example secondary screening process for Knoxville Area Transit.

| Rank | Agency | City | State | Likeness Score | Dedicated Local Funding? | Use as Peer? |
|---|---|---|---|---|---|---|
| | Knoxville Area Transit | Knoxville | TN | 0.00 | | |
| 1 | Winston-Salem Transit Authority | Winston-Salem | NC | 0.25 | Yes | Yes |
| 2 | South Bend Public Transportation Corporation | South Bend | IN | 0.36 | Yes | Yes |
| 3 | Birmingham-Jefferson County Transit Authority | Birmingham | AL | 0.36 | Yes | Yes |
| 4 | Connecticut Transit - New Haven Division | New Haven | CT | 0.39 | No | |
| 5 | Fort Wayne Public Transportation Corporation | Fort Wayne | IN | 0.41 | Yes | Yes |
| 6 | Transit Authority of Omaha | Omaha | NE | 0.41 | Yes | Yes |
| 7 | Chatham Area Transit Authority | Savannah | GA | 0.42 | Yes | Yes |
| 8 | Stark Area Regional Transit Authority | Canton | OH | 0.44 | Yes | Yes |
| 9 | The Wave Transit System | Mobile | AL | 0.46 | No | |
| 10 | Capital Area Transit | Raleigh | NC | 0.48 | No | |
| 11 | Capital Area Transit | Harrisburg | PA | 0.48 | No | |
| 12 | Shreveport Area Transit System | Shreveport | LA | 0.49 | No | |
| 13 | Rockford Mass Transit District | Rockford | IL | 0.50 | No | |
| 14 | Erie Metropolitan Transit Authority | Erie | PA | 0.52 | No | |
| 15 | Capital Area Transit System | Baton Rouge | LA | 0.52 | No | |
| 16 | Western Reserve Transit Authority | Youngstown | OH | 0.53 | Yes | Yes |
| 17 | Central Oklahoma Transportation & Parking Auth. | Oklahoma City | OK | 0.53 | No | |
| 18 | Des Moines Metropolitan Transit Authority | Des Moines | IA | 0.55 | No | |
| 19 | Mass Transportation Authority | Flint | MI | 0.56 | Yes | |
| 20 | Escambia County Area Transit | Pensacola | FL | 0.57 | No | |

Peer Agency Data

Transit agencies requesting data for a peer analysis from other agencies should accompany their data request with the following: (a) an explanation of how they plan to use the data and whether the peer agency's data and results can or will be kept confidential, and (b) a request for documentation of how the measures are defined and, if appropriate, how the data for the measures are collected.

Transit agencies may be more willing to share data if they can be assured that the results will be kept confidential.
This avoids potential embarrassment to the peer agency if it turns out to be one of the worst-in-group peers in one or more areas, and also saves it the potential trouble of having to explain differences in results to its stakeholders if it does not agree with the study's methodology or result interpretations. In one of the case studies conducted for this project, for example, one agency was not interested in sharing customer-satisfaction data because it disagreed with the way the target agency calculated and used a publicly reported customer-satisfaction index. The potential peer did not want to be publicly compared to the target agency using the target agency's methodology.

Confidentiality can be addressed in a peer-grouping study by identifying which transit agencies were selected as peers but not publicly identifying the specific agency associated with a specific data point in graphs and reports. This information would, of course, be available internally to the target agency (to help it identify best-in-group peers), but conclusions about where the target agency stands relative to its peers can still be made and supported when the peer agency results are shown anonymously. The graphs that accompany the examples of data-quality checks in Step 4b give examples of how information can be presented informatively yet confidentially.

It is important to understand how measures are defined and, in some cases, how the data were collected. For example, on-time performance is a commonly used reliability measure. However, there are wide variations in how transit agencies define "on-time" (e.g., 0 to 5 minutes late vs. 1 minute early to 2 minutes late) that influence the measure's value, since a more generous range of time that is considered "on-time" will result in a higher on-time performance value (1).
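The effect of the on-time window on the reported value can be illustrated with a small sketch. The schedule deviations below are hypothetical, and the function name and the convention that negative values mean "early" are assumptions made for illustration; only the two window definitions come from the example above.

```python
from typing import Iterable


def on_time_share(deviations_min: Iterable[float], earliest: float, latest: float) -> float:
    """Share of observations whose schedule deviation lies within [earliest, latest].

    Deviations are in minutes; negative means early, positive means late
    (an assumed convention for this sketch).
    """
    values = list(deviations_min)
    if not values:
        raise ValueError("at least one observation is required")
    on_time = sum(1 for d in values if earliest <= d <= latest)
    return on_time / len(values)


# Hypothetical schedule deviations for ten observed trips.
deviations = [-2.0, -0.5, 0.0, 1.0, 1.5, 3.0, 4.0, 4.5, 6.0, 8.0]

# Definition A: "0 to 5 minutes late".
share_a = on_time_share(deviations, earliest=0.0, latest=5.0)

# Definition B: "1 minute early to 2 minutes late".
share_b = on_time_share(deviations, earliest=-1.0, latest=2.0)

print(f"Definition A: {share_a:.0%} on time")
print(f"Definition B: {share_b:.0%} on time")
```

With these made-up observations, the same trips yield a higher on-time value under the wider window than under the narrower one, which is why peer values are only comparable when they share a definition.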
In addition, the location where on-time performance is measured (departure from the start of the route, a mid-route point, or arrival at the route's terminal) can influence the measure results.

For a peer agency's non-NTD data to be useful for a peer comparison, the measure values need to be defined similarly, or the measure values need to be re-calculated from raw data using a common definition. The likelihood of having similar definitions is highest when an industry standard or recommended practice exists for the measure. For example, at the time of writing, APTA was developing a draft standard on defining rail transit on-time performance (32), while TCRP Report 47 (43) provides recommendations on phrasing customer-satisfaction survey questions. The likelihood of being able to calculate measures from raw data is highest when the data are automatically recorded and stored (e.g., data from automatic passenger counter or automated vehicle location equipment) or when a measure is derived from other measures calculated in a standardized way.

Normalizing Cost Data

Transit agencies will often want to normalize cost data to (a) reflect the effects of inflation and (b) reflect differences in labor costs between regions. Adjusting for inflation allows a trend analysis to clearly show whether an agency's costs are changing at a rate faster or slower than inflation. Adjusting for labor cost differences makes it easier to conclude that differences in costs between agencies are due to internal agency efficiency differences rather than external cost differences. Some of the case studies in Chapter 5 provide examples of performing inflation and cost-of-living adjustments; the general process is described below.

The consumer price index (CPI) can be used to adjust costs for inflation.
CPIs for the country as a whole, regions of the country, and 26 metropolitan areas are available from the Bureau of Labor Statistics (BLS) website (http://www.bls.gov/cpi/data.htm). FTIS also provides the national CPI. To adjust costs for inflation, multiply the cost by (base year CPI)/(analysis year CPI), where, as the following example shows, the base year is the year whose price level the costs are being restated to and the analysis year is the year in which the cost was incurred. For example, the national CPI was 179.9 for 2002 and 201.6 for 2006. To adjust 2002 prices to 2006 levels for use in a trend analysis, 2002 costs would be multiplied by (201.6/179.9), or 1.121.

Average labor wage rates can be used to adjust costs for differences in labor costs between regions since labor costs are typically the largest component of operating costs. These data are available from the Bureau of Labor Statistics (http://www.bls.gov/oes/oes_dl.htm) for all metropolitan areas. The "all occupations" average hourly rate for a metropolitan area is recommended for this adjustment because the intent here is to adjust for the general labor environment in each region, over which an agency has no control, rather than for a transit agency's actual labor rates, over which an agency has some control.

Identifying differences in a transit agency's labor costs, after adjusting for regional variations, can be an important outcome of a peer-comparison evaluation. Although it is possible to drill down into the BLS wage database to get more-specific data (for example, average wages for "bus drivers, transit and intercity"), the ability to compare agency-controllable costs would be lost because the more-detailed category would be dominated by the transit agency's own workforce. The "all occupations" rates, on the other hand, allow an agency to (a) investigate whether it is spending more or less for its labor relative to its region's average wages, and (b) adjust its costs to reflect differences in a region's overall cost of living (which impacts overall average wages within the region).

To adjust peer agency costs for differences in labor costs, multiply the cost by (target agency metropolitan area labor cost)/(peer agency metropolitan area labor cost). For example, Denver's average hourly wage rate in 2008 was $22.67, while Portland's was $21.66.
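Both adjustments amount to multiplying a cost by a ratio of two index values. The following minimal sketch reproduces the two factors from the CPI and wage figures quoted above; the function names and the $1,000,000 cost are hypothetical and are used only to show how a factor is applied.

```python
def inflation_factor(cpi_cost_year: float, cpi_target_year: float) -> float:
    """Factor that restates a cost from its own year's prices to the target year's prices."""
    return cpi_target_year / cpi_cost_year


def labor_cost_factor(target_area_wage: float, peer_area_wage: float) -> float:
    """Factor that restates a peer's cost at the target agency's regional wage level."""
    return target_area_wage / peer_area_wage


# National CPI values quoted in the text: 179.9 (2002) and 201.6 (2006).
cpi_factor = inflation_factor(cpi_cost_year=179.9, cpi_target_year=201.6)

# "All occupations" average hourly wages quoted in the text for 2008:
# $22.67 (Denver, the target agency's region) and $21.66 (Portland, the peer's region).
wage_factor = labor_cost_factor(target_area_wage=22.67, peer_area_wage=21.66)

# Hypothetical cost, used only to show how a factor is applied.
cost_2002 = 1_000_000.00
cost_in_2006_dollars = cost_2002 * cpi_factor

print(f"Inflation factor, 2002 to 2006: {cpi_factor:.3f}")
print(f"Labor factor, Portland to Denver: {wage_factor:.3f}")
print(f"Hypothetical 2002 cost in 2006 dollars: ${cost_in_2006_dollars:,.0f}")
```

When both adjustments are needed, the two factors can simply be multiplied together, since each is a plain scaling of the cost.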
If Denver RTD is performing the analysis and wants to adjust TriMet costs to reflect the higher wages in the Denver region, it would multiply TriMet costs by (22.67/21.66), or 1.047.

Step 4b: Analyze Performance

Data Checking

Before diving into a full analysis of the data, it is useful to create graphs for each measure to check for potential data problems, such as unusually high or low values for a given agency's performance measure for a given year, and for values that bounce up and down with no apparent trend. The following figures give examples of these kinds of checks.

Figure 3 illustrates outlier data points. Peer 4 has an obvious outlier for the year 2003. As it is much higher than the agency's other values (including prior years, if one went back into the database) and is much higher than any other agency's values, that data point could be discarded. The rest of Peer 4's data show consistent trends; however, since this agency had an outlier and would be the best-in-group performer for this measure, it would be worth a phone call to the agency to confirm the validity of the other years' values. Peer 5 also has an outlier for the year 2004. The value is not out of line with other agencies' values, but is inconsistent with Peer 5's overall trend. In this case, a phone call would find out whether the agency tried (and then later abandoned) something in 2004 that would have improved performance, or whether the data point is simply incorrect.

Figure 3. Outlying data points example. (Demand response; farebox recovery (%), 2003–2007; Tampa and Peers 1–5.)

In Figure 4, Peer 2's values for the percent of breaks and allowances as part of total operating time are nearly zero and far below those of the other agencies in the peer group. It might be easy to conclude that this is an error, as vehicle operators must take breaks, but this would be incorrect in this case. According to the NTD data definitions, breaks that are taken as part of operator layovers are counted as platform time, whereas paid breaks and meal allowances are considered straight time and are accounted for differently. Therefore, Peer 2's values could actually be correct (and are, as confirmed by a phone call). Peer 7 is substantially higher than the others and may be treating all layover time as break time. The conclusion to be drawn from this data check is that the measure being used will not provide the desired information (a comparison of schedule efficiency). Direct agency contacts would need to be made instead.

Figure 4. Outlying peer agency example. (Agency-wide; breaks and allowances vs. total operating time (%), 2002–2006; UTA and Peers 1–7.)

Figure 5 shows a graph of spare ratio (the number of spare transit vehicles as a percentage of transit vehicles used in maximum service). As Figure 5(a) shows, spare ratio values can change significantly from one year to the next as new bus fleets are brought into service and old bus fleets are retired. It can be difficult to discern trends in the data. Figure 5(b) shows the same variable, but calculated as a three-year rolling average (i.e., year 2007 values represent an average of the actual 2005–2007 values). It is easier to discern from this version of the graph that Denver's average spare ratio (along with Peer 1, Peer 3, and Peer 4) has held relatively constant over the longer term, while Peer 2's average spare ratio has decreased over time and the other two peers' spare ratios have increased over time. In this case, there is no apparent problem with the data, but the data check has been used to investigate a potentially better way to analyze and present the data.

Figure 5. Data volatility example. (Motorbus; spare ratio (%), 2003–2007; Denver and Peers 1–7. Panels: (a) Spare Ratio Annual Values; (b) Spare Ratio as a Three-Year Rolling Average.)

Data Interpretation

For each measure selected for the evaluation, the target agency's performance is compared to the performance of the peer agencies. Ideally, this evaluation would look both at the target agency's current position relative to its peers (e.g., best-in-class, superior, average, inferior), and the agency's trend. Even if a transit agency's performance is better than most of its peers, a trend of declining performance might still be a cause for concern, particularly if the peer trend was one of improving performance. Trend analysis also helps identify whether particularly good (or bad) performance was sustained or was a one-time event, and can also be used for forecasting (e.g., agency performance is below the agency's target at present, but if current trends continue is forecast to reach the agency's target in 2 years).

Graphing the performance-measure values is a good first step in analyzing and interpreting the data. Any spreadsheet program can be used, and FTIS also provides basic graphing functions. It may be helpful to start by looking at patterns in the data. In Figure 6, for example, it can be seen that the general trend in the data for all peers, except Peer 7, has been an increase in operating costs per boarding over the 5-year period, with Peers 3–6 experiencing steady and significant increases each year. Denver's cost per boarding, in comparison, has consistently been the second-best in its peer group during this time, and Denver's cost per boarding has increased by about half as much as the top-performing peer. Most of Denver's peers also experienced a sharp increase in costs during at least one of the years included in the analysis, while Denver's year-to-year change has been relatively small and, therefore, more predictable.
This analysis would indicate that Denver has done a good job of controlling cost per boarding, relative to its peers.

Figure 6. Pattern investigation example. (Motorbus; cost per boarding ($), 2002–2006, with 2006 peer median; Denver and Peers 1–7.)

Sometimes a measure included in the analysis may turn out to be misleading. For example, farebox recovery (the portion of operating costs covered by fare revenue) is a commonly used performance measure in the transit industry and is readily available through FTIS. When this measure is applied to Knoxville, however, Knoxville's farebox recovery ratio is by far the lowest of its peers, as indicated in Figure 7(a). Given that Knoxville's performance is among the best in its peer group in a number of other measures, an analyst should ask why this result occurred. Clues to the answer can be obtained through a closer inspection of the NTD data.

NTD form F-10, available within FTIS, provides information about each agency's revenue, broken down by a number of sources. For 2007, this form shows that Knoxville earned nearly as much revenue from "other transportation revenue" as it did from bus fares. A visit to the agency's website, where budget information is available, confirms that the agency receives revenue from the University of Tennessee for operating free shuttle service to the campus and sports venues. Therefore, farebox recovery is not telling the entire story about how much of Knoxville's service is self-supporting.

As an alternative, all directly generated non-tax revenue used for operations can be compared to operating costs (a measure known as the operating ratio). This requires more work, as non-fare revenue should be allocated among the various modes operated (it is only reported on a system-wide basis), but all of the data required to make this allocation are available through FTIS, and the necessary calculations can be readily performed within a spreadsheet.

Figure 7(b) shows the results of these calculations, where it can be seen that Knoxville used to be at the top of its peer group in terms of operating ratio but is now in the middle of the group, as the university payments apparently dropped substantially in 2006. A comparison of the two graphs also shows that Knoxville is the only agency among its peers (all of whom have dedicated local funding sources) to get much directly generated revenue at present from anything except fares.

Figure 7. Data interpretation example #1. (Motorbus; 2003–2007, with 2007 peer median; Knoxville and Peers 1–8. Panels: (a) Farebox Recovery (%); (b) Directly Generated Funds Recovery (%).)

A final example of data interpretation is shown in Figure 8, comparing agencies' annual casualty and liability costs, normalized by annual vehicle miles operated. This graph tells several stories. First, it can be clearly seen that a single serious accident can have a significant impact on a transit agency's casualty and liability costs in a given year because many agencies are self-insured. Second, it shows how often the peer group experiences serious accidents. Third, it indicates trends in casualty and liability costs over the 5-year period. Eugene, Peer 3, and Peer 6 were the best performers in this group over the study period, while Peer 7's costs were consistently higher than the group as a whole.

Figure 8. Data interpretation example #2. (Agency-wide; casualty and liability cost per vehicle mile (cents), 2003–2007, with 2007 peer median; Eugene and Peers 1–8.)
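The difference between the two measures compared in Figure 7 can be sketched in a few lines. All dollar amounts below are hypothetical and are not Knoxville's actual figures; the function names are illustrative, and the sketch assumes that non-fare revenue has already been allocated to the mode being analyzed.

```python
def farebox_recovery(fare_revenue: float, operating_cost: float) -> float:
    """Share of operating cost covered by fare revenue alone."""
    return fare_revenue / operating_cost


def operating_ratio(fare_revenue: float,
                    other_directly_generated_revenue: float,
                    operating_cost: float) -> float:
    """Share of operating cost covered by all directly generated non-tax revenue."""
    return (fare_revenue + other_directly_generated_revenue) / operating_cost


# Hypothetical motorbus figures for an agency whose non-fare revenue
# (e.g., a contract payment for a free shuttle) is nearly as large as its fares.
fares = 1_500_000.0
other_revenue = 1_400_000.0   # assumed to be already allocated to this mode
operating_cost = 15_000_000.0

recovery = farebox_recovery(fares, operating_cost)
ratio = operating_ratio(fares, other_revenue, operating_cost)

print(f"Farebox recovery: {recovery:.1%}")
print(f"Operating ratio:  {ratio:.1%}")
```

An agency that looks weak on farebox recovery can look quite different on operating ratio when a large share of its service is paid for through contracts rather than fares, which is the pattern the Knoxville example describes.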

Results Presentation

The results of the data analysis will need to be documented for presentation to the stakeholders in the process. The exact form will depend on the audience, but can include any or all of the following:

• An executive summary highlighting the key findings,
• A summary table presenting a side-by-side comparison of the numeric results for all the measures for all the peers,
• Graphs, potentially including trend indicators (such as arrows) or lines indicating the group average,
• A combination of graph and table, with the table providing the numeric results to accompany the graph,
• A combination of graph and text, with the text interpreting the data shown in the graph,
• Multiple graphs, with one or more secondary graphs showing descriptive data that support the interpretation of the main graph, and
• Graphics that support the interpretation of text, tables, and/or graphs.

Peer group averages can be calculated as either means or medians. Means are more susceptible to being influenced by a transit agency with particularly good or particularly poor performance, while medians provide a good indicator of where the middle of the group lies.

The case studies in Chapter 5 and the material in Appendix C give a variety of examples of how performance information can be presented. TCRP Report 88 (1) contains a section providing guidance on presenting performance results, and publications available on the European benchmarking network websites (14–16, 47) can also be used as examples.

Step 5: Contact Best-Practices Peers

At this point in the process, a transit agency knows where its performance stands with respect to its peers, but not the reasons why. Contacting top-performing peers addresses the "why" aspect and can lead to identifying other transit agencies' practices that can be adopted to improve one's own performance.

In most cases a transit agency will find one or more areas where it is not the best performer among its peers.
An agency with superior performance relative to most of its peers, and possessing a culture of continuous improvement, would continue the process to identify what it can learn from its top-performing peers to improve its already good performance. When an agency identifies areas of weakness relative to its peers, it is recommended that it continue the benchmarking process to see what it can learn from its best-performing peers.

For Level 1 and 2 benchmarking efforts, it is possible to skip this step and proceed directly to Step 6, developing an implementation strategy. However, doing so carries a higher risk of failure since agencies may unwittingly choose a strategy already tried unsuccessfully elsewhere or may choose a strategy that results in a smaller performance improvement than might have been achieved with alternative strategies. Step 5 is the defining characteristic of a Level 3 benchmarking effort, while the working groups used as part of a Level 4 benchmarking effort would automatically build this step into the process. Step 5 would also normally be built into the process when a benchmarking effort is being conducted with an eye toward changing how the agency conducts business.

The kind of information that is desired at this step is beyond what can be found from databases and online sources. Instead, executive interviews are conducted to determine how the best-practices agencies have achieved their performance, to identify lessons learned and factors that could inhibit implementation or improvement, and to develop suggestions for the target agency. There are several formats for conducting these interviews, which can be tailored for the specific needs of the performance review.

• Blue ribbon panels of expert staff and/or top management from peer agencies are appropriate to bring in for one-time-only or limited-term reviews, such as a special management focus on security or a large capital project review.
• Site visits can be useful for hands-on understanding of how peer agencies operate. The staff involved could range from line staff to top management, depending on the specific issues being addressed.
• Working groups can be established for topic-specific discussions on performance, such as a working group on preventative maintenance practices. Line staff and mid-level management in the topic area would be most likely to be involved.

The private sector has also used staff exchanges as a way of obtaining a deeper understanding of another organization's business practices by having one or two select staff become immersed in the peer organization's activities for an extended period of time.

Involving staff from multiple levels and functions within the transit agency helps increase the chances of identifying good practices or ideas, helps increase the potential for staff buy-in into any recommendations for change that are made as a result of the contacts, helps percolate the concept of continuous improvement throughout the transit agency, and helps provide opportunities for staff leadership and professional growth.

Step 6: Develop an Implementation Strategy

In Step 6, the transit agency develops a strategy for making changes to the current agency environment, with the goal of improving its performance. Ideally, the strategy development process will be informed by a study of best practices, which would have been performed in Step 5. The strategy should include performance goals (i.e., quantify the desired outcome), provide a timeline for implementation, and identify any required funding. The strategy also needs to identify the internal (e.g., business practices or agency policies) or external (e.g., regional policies or new revenue sources) changes that would be needed to successfully implement the strategy.

Top-level management and transit agency board support is vital to getting the process underway. However, support for the strategy will need to be developed at all levels of the organization: lower-level managers and staff also need to buy into the need for change and understand the potential benefits of change. Therefore, the implementation strategy should also include details on how information will be disseminated to agency staff and external stakeholders and should include plans for developing internal and external stakeholder support for implementing the strategy.

Step 7: Implement the Strategy

TCRP Report 88 (1) found that once a performance evaluation is complete and a strategy is identified, the process can often halt due to lack of funding or stakeholder support. If actual changes designed to improve performance are not implemented at the end of the process, the peer review risks becoming a paper exercise, and the lack of action can reduce stakeholder confidence in the effectiveness of future performance evaluations. If problems arise during implementation, the agency should be prepared to address them quickly so that the strategy can stay on course.

Step 8: Monitor Performance

As noted in Step 6, the implementation strategy should include a timeline for results. A timeline for monitoring should also be established to make sure that progress is being made toward the established goals.
Depending on the goal and the overall strategy timeline, the reporting frequency could range from monthly to annually. If the monitoring effort indicates a lack of progress, the implementation strategy should be revisited and revised if necessary. Hopefully, however, the monitoring will show that performance is improving.

In the longer term, the transit agency should continue its peer-comparison efforts on a regular basis. The process should be simpler the second time around because many or all of the agency's peers will still be appropriate for a new effort, points of contact will have been established with the peers, and the agency's staff will now be familiar with the process and will have seen the improvements that resulted from the first effort. The agency's peers hopefully will also have been working to improve their own performance, so there may be something new to learn from them, either by investigating a new performance topic or by revisiting an old one after a few years. A successful initial peer-comparison effort may also serve as a catalyst for forming more-formal performance-comparison arrangements among transit agencies, perhaps leading to the development of a benchmarking network.
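A monitoring check of the kind described in Step 8 could be as simple as fitting a straight line to the values reported so far and projecting when the goal would be reached. The text does not prescribe a forecasting method, so the least-squares trend, the function names, and the figures below are all illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple


def linear_trend(periods: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line through the observations: returns (slope, intercept)."""
    n = len(periods)
    if n < 2 or n != len(values):
        raise ValueError("need at least two observations, with one value per period")
    mean_x = sum(periods) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in periods)
    if sxx == 0:
        raise ValueError("periods must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(periods, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_target(periods: Sequence[float], values: Sequence[float],
                         target: float) -> Optional[float]:
    """Periods after the latest observation until the fitted trend reaches the target.

    Assumes `periods` is in chronological order. Returns None when the trend
    is flat or is heading away from the target.
    """
    slope, intercept = linear_trend(periods, values)
    fitted_latest = intercept + slope * periods[-1]
    gap = target - fitted_latest
    if gap == 0:
        return 0.0
    if slope == 0 or (gap > 0) != (slope > 0):
        return None
    return gap / slope


# Hypothetical annual values of a productivity measure and a hypothetical goal.
years = [2005, 2006, 2007, 2008]
target_value = 25.0

years_needed = periods_until_target(years, [20.0, 21.0, 22.0, 23.0], target_value)
stalled = periods_until_target(years, [23.0, 22.0, 21.0, 20.0], target_value)

print("Improving trend, years until goal:", years_needed)
print("Declining trend, years until goal:", stalled)
```

A result of None would be a prompt to revisit the implementation strategy, while a projected date later than the strategy timeline would suggest that progress is too slow.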
