4 EVALUATION OF CURRENT PROGRAMS

Given the five criteria identified in the previous chapter as relevant to evaluating current programs, a method for actually performing the evaluation is needed. In this chapter the Analytic Hierarchy Process is presented as a suitable method for performing this evaluation. The process is applied to current programs, and the results are discussed.

ANALYTIC HIERARCHY PROCESS

The Analytic Hierarchy Process[91] (AHP) is a quantitative method for making complex decisions. The process relies on estimating the magnitude of difference between choices by making simple comparisons. Through the AHP these simple comparisons are used first to evaluate the relative weight of each criterion, and then to evaluate each of the alternatives according to the criteria. The result is a set of relative "weights" for the criteria and a quantitative score for each alternative that represents the preferences of the participants. The process works by defining a goal in terms of a hierarchy of criteria (and possibly sub-criteria), and then evaluating each of the alternatives within those criteria. This is shown in Figure 14.

Figure 14: Goal, Criteria, and Alternatives in AHP

In the first step, pairwise comparisons are made between the different criteria. For each pair of criteria, a comparison is made to determine which criterion is more important and by how much. From these comparisons the AHP identifies the relative importance of each criterion. Once the relative importance of the criteria is identified, the alternatives are evaluated. Within each criterion the various alternatives can be compared with one another in a similar manner to the first step. These comparisons are used to generate a score for each alternative within a specific criterion. After completing this
process for all the criteria, the total score for each alternative is calculated by weighting the score within each criterion by the relative importance of that criterion. This produces an overall evaluation of each alternative with respect to the goal.

The AHP is well suited to group decision making, where consensus must be reached among many group members. By structuring the decision in the form of a hierarchy and then focusing attention on individual components, AHP amplifies a group's decision-making capabilities. It does not require numeric guesses to quantify results; instead it accommodates subjective judgments by using a ratio scale[92]. Given the many different types of stakeholders interested in the carbon footprint of transportation, as well as the large number of programs to be evaluated, AHP is well suited to the problem.

APPLICATION OF AHP

In order to evaluate current tools for measuring the carbon footprint of transportation in the supply chain, a workshop featuring many different stakeholders was held at MIT on October 25th, 2012. The workshop featured 16 participants in the AHP exercise, drawn from a number of different industries. These included carriers (road, drayage, rail, ocean), shippers (high tech, retail, apparel, chemicals, beverages), 3PLs, and other stakeholders (government, NGOs, research, equipment manufacturers). All participants had some previous familiarity with carbon footprint tools for transportation, and ranged in experience from lead engineers to vice presidents.

Source: Reprinted from European Journal of Operational Research, Vol. 48 (1), Thomas Saaty, "How to make a decision: The analytic hierarchy process", Page 15, 1990, with permission from Elsevier.
Intensity of
importance    Definition                                Explanation
1             Equal importance                          Two elements contribute equally to the objective
3             Moderate importance of one over another   Experience and judgment moderately favor one element over another
5             Essential or strong importance            Experience and judgment strongly favor one element over another
7             Very strong importance                    One element is favored very strongly over another; its dominance is demonstrated in practice
9             Extreme importance                        The evidence favoring one element over another is of the highest possible order of affirmation
2, 4, 6, 8    Intermediate values between the two       When compromise is needed
              adjacent judgments

Table 3: The Fundamental Scale[92]

At the workshop the five criteria were presented to the participants and discussed in the context of current programs and views on transportation. After presentation of the criteria, the participants in the workshop provided their individual input on the relative importance of each of the criteria. This was done through a series of pairwise comparisons between the criteria. Each participant was asked to determine which of the two criteria was more important, and to judge the relative magnitude of that relationship based on the scale shown in Table 3. This was repeated for each of the 10 possible pairs of criteria.

After the responses were collected from the 16 participants, the results were averaged to produce a consensus judgment for the group as a whole. The results of this analysis are shown in Table 4. The criterion determined to be more important in each pair is shown in bold and underlined, and the relative intensity of its importance is shown in the intensity column.

Criteria A     Criteria B     Intensity
Breadth        Comparability  1.75
Breadth        Depth          1.55
Breadth        Precision      1.50
Breadth        Verifiability  1.11
Comparability  Depth          4.04
Comparability  Precision      3.40
Comparability  Verifiability  1.95
Depth          Precision      1.01
Depth          Verifiability  1.64
Precision      Verifiability  1.05

Table 4: Criteria Preference

The pairwise comparisons show a clear preference for comparability as a criterion, as it was judged more important than each of the other four criteria. It also recorded the strongest intensities of importance, being considered between moderately and strongly more important than depth and precision. Verifiability and breadth showed the next highest importance. Verifiability was rated as more important than each of the other criteria except comparability, though the intensity of that importance was not especially strong, with scores ranging from 1.05 to 1.95. Breadth was judged more important than depth and precision, but less important than verifiability and comparability. However, the average strength of preference for breadth was slightly higher than for verifiability.

A particularly useful aspect of AHP is the ability to turn the pairwise comparisons into a quantitative evaluation of importance. Applying the AHP process to the participants' ratings produced a relative weight for the importance of each criterion. These weights are shown in Figure 15.
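The weighting computation can be reproduced from the averaged intensities in Table 4. The sketch below is a reconstruction, not the study's actual calculation: the direction of the near-tie between depth and precision (1.01) is an assumption, and the priorities are approximated with the row geometric mean rather than Saaty's principal eigenvector, so they agree with the reported figures only to rounding.

```python
import math

criteria = ["Breadth", "Comparability", "Depth", "Precision", "Verifiability"]

# Pairwise matrix reconstructed from Table 4: A[i][j] > 1 means criterion i
# was judged more important than criterion j. The direction of the 1.01
# Depth-vs-Precision comparison is an assumption; it barely moves the result.
A = [
    [1.00,   1/1.75, 1.55,   1.50,  1/1.11],
    [1.75,   1.00,   4.04,   3.40,  1.95],
    [1/1.55, 1/4.04, 1.00,   1.01,  1/1.64],
    [1/1.50, 1/3.40, 1/1.01, 1.00,  1/1.05],
    [1.11,   1/1.95, 1.64,   1.05,  1.00],
]

def ahp_weights(matrix):
    """Approximate AHP priorities via the row geometric mean."""
    gm = [math.prod(row) ** (1 / len(row)) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]

def consistency_ratio(matrix, weights, random_index=1.12):
    """Saaty consistency ratio; random_index = 1.12 for a 5x5 matrix."""
    n = len(matrix)
    lam = sum(
        sum(matrix[i][j] * weights[j] for j in range(n)) / weights[i]
        for i in range(n)
    ) / n
    return (lam - n) / (n - 1) / random_index

w = ahp_weights(A)
for name, wi in zip(criteria, w):
    print(f"{name:14s} {wi:.0%}")
print(f"CR = {consistency_ratio(A, w):.4f}")
```

Run as written, this reproduces the Figure 15 weights after rounding (39% comparability, 19% breadth, 18% verifiability, 13% precision, 11% depth) and a consistency ratio of roughly 0.009, in line with the reported 0.00921; the small gap comes from the geometric-mean approximation.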
Figure 15: Relative Importance of Criteria

The quantitative results indicate the strong preference for comparability as the most important criterion, with a relative weighting of 39%. Of the remaining criteria, breadth and verifiability were judged next most important, with weightings of 19% and 18% respectively. The slightly higher weighting for breadth reflects its higher average intensity of preference over precision and depth, as well as the weaker preference for comparability over breadth than over verifiability. This explains why breadth is judged slightly more important overall than verifiability, even though verifiability was judged more important in their pairwise comparison. Precision and depth were judged least important, with relative weightings of 13% and 11%.

In addition to the relative weightings of the criteria, a measure of the inconsistency of the ratings was calculated. The average scores of the group produced an inconsistency rating of 0.00921, indicating a very consistent set of beliefs. In general applications of AHP an inconsistency ratio of less than 0.1 is considered consistent. With the relative weightings of the criteria determined, it is now possible to evaluate current programs by comparing their performance within each criterion.

EVALUATING ALTERNATIVES

There are two primary methods for evaluating the different alternatives within each criterion: relative measurement and absolute measurement[93]. Relative measurement works in a similar manner to the procedure for criteria weighting, with each alternative being pairwise compared with the others and assigned a relative intensity of preference under each criterion. The results of the pairwise comparisons are then used to generate scores for each alternative within that criterion.
In absolute measurement the alternatives are not compared with each other; instead, they are compared against a set of absolute standards that are established
for each criterion. The standards themselves are compared with each other under each criterion in order to develop the relative scores achieved by meeting each standard. This allows for the creation of standards that use concepts such as high, medium, and low, or A, B, C, D, and F letter grades.

For the evaluation of existing programs an absolute measurement approach was used. This approach has two primary advantages over relative measurement. First, it allows for the evaluation of a large number of alternatives. In a relative measurement scheme the number of comparisons required grows rapidly as alternatives are added. For the five criteria evaluated during the workshop each participant made a total of 10 comparisons. If five alternatives were compared using relative measurement, 10 comparisons would be needed for each of the five criteria, a total of 50 comparisons. The total number of comparisons increases quickly: 225 comparisons would be required for 10 alternatives, and 24,750 for 100 alternatives. Under absolute measurement, each alternative need only be compared to the standards for each criterion, requiring significantly fewer comparisons in total.

Second, relative measurements are sensitive to the addition of new alternatives, even if those alternatives are copies of existing alternatives. This can include rank reversal, where the addition of a new alternative may cause two existing alternatives to switch their order in the ranking. This phenomenon does not occur with absolute measurement, so adding new alternatives will not change the preference order of the previously existing alternatives.

In order to perform the absolute measurement, a series of standards was established to rank alternatives as achieving high, medium, or low performance in each criterion.
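The comparison counts cited above are simple combinatorics: relative measurement needs C(n, 2) pairwise comparisons per criterion, while absolute measurement needs roughly one rating against the standards per criterion. A quick sketch (the absolute-measurement count is a simplification of the rating step, but the right order of magnitude):

```python
from math import comb

N_CRITERIA = 5  # breadth, comparability, depth, precision, verifiability

def relative_comparisons(n_alternatives, n_criteria=N_CRITERIA):
    """Pairwise comparisons under relative measurement: C(n, 2) per criterion."""
    return comb(n_alternatives, 2) * n_criteria

def absolute_ratings(n_alternatives, n_criteria=N_CRITERIA):
    """Under absolute measurement each alternative is rated once per
    criterion against the fixed standards (a simplification)."""
    return n_alternatives * n_criteria

print(relative_comparisons(5))    # 50
print(relative_comparisons(10))   # 225
print(relative_comparisons(100))  # 24750
print(absolute_ratings(100))      # 500
```

The quadratic growth of the relative scheme versus the linear growth of the absolute scheme is what makes absolute measurement practical for the large number of programs evaluated here.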
The standards for high, medium, and low within each criterion were based on the review of the current programs and discussion during the workshop held at MIT. In addition, the relative importance of achieving each rank in each criterion was developed based on the guidelines given in Table 3. For each criterion, a score of low was given the baseline value of one, and the medium and high standards were weighted based on their relative preference to the low standard. For internal consistency, the relative preference of the high standard over the medium standard was taken to be the ratio of their weights relative to the low standard. For example, the preference for high over medium in the case of breadth is 1.14, reflecting the ratio of 8 to 7. The standards and relative weights used for each of the five criteria are shown in Table 5.
Criteria       Measure  Description                                   Weight
Breadth        High     Includes all modes plus logistics activities  8
               Medium   All four main modes (road/air/water/rail)     7
               Low      Single mode                                   1
Comparability  High     Standardized boundaries and output measures   8
               Medium   Single standardized data and methodology      5
               Low      Multiple methodology and data options         1
Depth          High     Full Life Cycle Assessment                    6
               Medium   Well-to-Wheel analysis                        5
               Low      Direct emissions only                         1
Precision      High     Shipment-level reporting                      7
               Medium   Carrier-level reporting                       5
               Low      National/industry average                     1
Verifiability  High     External audit/verification required          5
               Medium   Methodology and data are publicly available   2
               Low      No verification/non-standardized data         1

Table 5: Absolute Criteria Measures

The weights were determined based on discussion with participants of the October 25th workshop and the estimated value of meeting higher standards. Using different weights for scores of high, medium, and low in each criterion allows differences in the value of achieving higher scores across criteria to be captured in the final evaluation. An increase from low to medium in verifiability is only slightly preferred, as the benefits are judged to be of relatively small value. In contrast, an increase from low to medium in breadth is of strong importance due to the value of having all four modes considered in the tool.

Using the weights given in Table 5, the AHP methodology was used to develop a score, within each criterion, for achieving a given level of the standard. The scores were normalized by setting a score of 1.00 for achieving the high standard within each criterion. These scores are shown in Table 6, and reflect the values that will be used to evaluate existing programs.
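The normalization from the Table 5 weights to the Table 6 scores is mechanical: within each criterion, every level's weight is divided by the weight of the high standard. A quick sketch (rounding half-up, as the published table appears to do):

```python
from decimal import Decimal, ROUND_HALF_UP

# Standard weights from Table 5 (low is the baseline of 1 in each criterion).
table5_weights = {
    "Breadth":       {"High": 8, "Medium": 7, "Low": 1},
    "Comparability": {"High": 8, "Medium": 5, "Low": 1},
    "Depth":         {"High": 6, "Medium": 5, "Low": 1},
    "Precision":     {"High": 7, "Medium": 5, "Low": 1},
    "Verifiability": {"High": 5, "Medium": 2, "Low": 1},
}

def half_up(x):
    """Round half away from zero to two decimals, as conventional tables do."""
    return float(Decimal(str(x)).quantize(Decimal("0.01"),
                                          rounding=ROUND_HALF_UP))

# Normalize so the high standard scores 1.00 within every criterion.
table6_scores = {
    crit: {level: half_up(w / levels["High"]) for level, w in levels.items()}
    for crit, levels in table5_weights.items()
}

print(table6_scores["Breadth"])  # {'High': 1.0, 'Medium': 0.88, 'Low': 0.13}
```

Running this reproduces every entry of Table 6, for example 7/8 = 0.88 for medium breadth and 2/5 = 0.40 for medium verifiability.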
Criteria       Measure  Score
Breadth        High     1.00
               Medium   0.88
               Low      0.13
Comparability  High     1.00
               Medium   0.63
               Low      0.13
Depth          High     1.00
               Medium   0.83
               Low      0.17
Precision      High     1.00
               Medium   0.71
               Low      0.14
Verifiability  High     1.00
               Medium   0.40
               Low      0.20

Table 6: Scores of Criteria Measures

The relatively high importance attached to achieving a medium level of breadth reflects the need for a tool capable of handling each of the main transportation modes. The addition of other logistics activities increases the breadth to capture associated activities, but these are generally considered to have a minor impact on emissions when compared to the actual transportation. This explains the only slightly greater score for achieving a rating of high.

For comparability, the use of a standardized set of methods and data ensures that comparisons between different organizations are based on the same methods. This was judged to be strongly more important than a tool having multiple options. The addition of guidelines on setting standardized boundaries for what emissions should be included, as well as providing some measure of standardization in the output of a relative efficiency score, provides additional benefit.

The majority of emissions from most transportation fuels are produced during direct combustion, so even a low level of depth might capture most of the relevant emissions. When alternative fuels and electric vehicles are considered, however, a well-to-wheel (WTW) approach is better suited to capturing the relevant emissions. For this reason a score of medium for depth was judged to be strongly more important than a score of low. Adding further life cycle impacts such as infrastructure or vehicle production provides only marginal benefit, and thus a score of high was not judged significantly more important than a score of medium.

The importance of precision was based on discussion with participants during the workshop.
The participants expressed a preference for tools capable of differentiating between carriers, but felt that shipment-level reporting was not significantly more important. For this reason a score of medium, reflecting a carrier-specific level of precision, was judged to be moderately more
important than a tool that used average values, while shipment-level precision was judged only slightly more important than carrier-level precision.

Verifiability represents the most difficult criterion to judge. Most tools rely on the user to input accurate and true data, which can only be checked through some manner of external verification. Such verification is often costly and time consuming, but some programs, such as the CDP, the GHG Protocol, and carbon label standards, require this level of verifiability. This high level of verifiability was judged moderately more important than a low level, reflecting the difficulties that such verification presents. Verifiability may also be increased by transparency in methods and data sources, and this level of transparency is considered slightly more important than a low level of verifiability.

EVALUATING CURRENT PROGRAMS

With the relative weights of the criteria and the scoring within criteria set, the existing programs can now be evaluated. Each program is evaluated using the AHP method through a three-step process. First, the program is evaluated against the standards in Table 5 to determine its rating of high, medium, or low in each of the five criteria. Second, the relative weightings of the criteria shown in Figure 15 are multiplied by the scores associated with each rating shown in Table 6 to get the weighted score for each criterion. Third, the overall score is calculated by adding together the weighted scores for the five criteria. This process is shown in Figure 16.

Figure 16: Evaluating an Existing Program

All scores are based on a maximum of 1.0, with a theoretical tool achieving ratings of high in every category earning a perfect score. Similarities in the design of many tools allow them to be grouped into a limited number of "types" of tools. The range of scores, even within a given type, demonstrates how different approaches to the design of tools can produce different results depending on the implementation.
Similarly, tools that take different approaches may earn similar scores, as strengths in one area are balanced by weaknesses in another. After applying this methodology to current tools, four major types of tools can be identified.
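The three-step scoring reduces to a weighted sum. The sketch below applies the Figure 15 weights and Table 6 scores to a hypothetical rating profile; the profile is illustrative, not the chapter's assessment of any named program:

```python
# Criterion weights from Figure 15 and level scores from Table 6.
criterion_weights = {"Comparability": 0.39, "Breadth": 0.19,
                     "Verifiability": 0.18, "Precision": 0.13, "Depth": 0.11}

level_scores = {
    "Breadth":       {"high": 1.00, "medium": 0.88, "low": 0.13},
    "Comparability": {"high": 1.00, "medium": 0.63, "low": 0.13},
    "Depth":         {"high": 1.00, "medium": 0.83, "low": 0.17},
    "Precision":     {"high": 1.00, "medium": 0.71, "low": 0.14},
    "Verifiability": {"high": 1.00, "medium": 0.40, "low": 0.20},
}

def overall_score(ratings):
    """Steps 2 and 3: weight each criterion's score, then sum."""
    return sum(criterion_weights[c] * level_scores[c][level]
               for c, level in ratings.items())

# Hypothetical single-mode tool with standardized methods and carrier-level
# data (illustrative ratings only):
ratings = {"Breadth": "low", "Comparability": "high", "Depth": "low",
           "Precision": "medium", "Verifiability": "low"}
print(round(overall_score(ratings), 2))  # 0.56
```

This hypothetical profile lands at the bottom of the 0.56-0.60 range reported for single-mode, highly comparable tools, illustrating how strong comparability and precision can offset low breadth and depth.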
The first type of tool focuses on producing highly comparable results for a single mode, achieving scores in the range of 0.56-0.60. Examples of this type of tool include the EPA SmartWay program and the BSR CCWG. The consistent system boundary and methods required of participants in these programs, as well as the standardized scoring of carriers, produce results that are comparable across companies. This comparability is supported by the high level of precision allowed by the carrier-level data supplied by SmartWay and the carrier-route-level data produced by the BSR CCWG. These advantages are offset by the lack of breadth in programs tailored primarily to single modes (though SmartWay does provide scores for railways in addition to trucking).

The second type of tool offers consistent methodologies for all four primary modes, but lacks the ability to provide carrier-specific default values or a relative output value such as CO2 per tonne-mile or TEU-km. Tools of this type achieve scores of 0.52-0.59. The use of standardized emissions factors leads to higher verifiability, due to the transparency in their use. This comes at the cost of precision, since the results are not based on company- or shipment-specific data. EcoTransIT[94] and the NTM calculator[95] are examples of tools that use this approach.

The third type of tool provides methods for all modes, but offers a lower level of comparability. Tools of this type achieve scores in the range of 0.32-0.44. Examples of this type of tool include the IPCC Guidelines and the GHG Protocol. They provide methods for all the major modes, but only provide average emissions factors at a tank-to-wheel level of depth. The lack of consistent activity-data-based methods and emissions factors limits the ability of different organizations to produce consistent results with the tools.
The fourth type of tool is focused on a single mode, but lacks the balanced performance across criteria of higher-scoring tools. Tools of this type achieve scores in the range of 0.29-0.45. The EPA MOVES and GREET tools represent examples of this type. The EPA MOVES tool is capable of producing very detailed emissions calculations, but is focused only on road vehicles and TTW emissions. The large number of factors that can be considered in the model also makes the results less comparable across organizations, as different assumptions regarding inputs can lead to different results. The GREET model is also focused on road vehicles. It uses a WTW depth for a number of different fuel types, but makes use of average vehicle efficiency numbers that lack the precision of other approaches.

COMPARABILITY WITHIN AND BETWEEN TOOLS

The participants of the workshop expressed a desire for tools that provide a common boundary, allow for tracking at the carrier level, and produce results that can be used to benchmark across different firms. The widespread support of SmartWay and the CCWG by industry participants, as well as the high scores achieved under this evaluation, provide guidance for the direction of future tools. By incorporating these features, and with the participation of industry in their development, the EPA and BSR have produced some of the most successful tools to date.
However, the preference for comparability expressed in this evaluation was based on comparability within a tool; the focus was on how the results of a tool could be compared across different organizations or time periods. The focus was not on comparability between tools, that is, on how comparable the results from different tools are to one another. This matters given the high scores of tools focused on single modes, which create a need for multiple tools, each focused on a different industry.

The issues with comparing between tools can be illustrated by examining the methods used by two of the top-scoring tools: the BSR CCWG tool, focused on ocean carriers, and the EPA SmartWay tool, focused primarily on truck carriers. Both tools use a survey approach to assess the performance of individual carriers, but methodological differences between the tools create issues in directly comparing the results.

The EPA SmartWay tool asks carriers to provide information on total fuel consumption, total number of miles traveled, the number of revenue miles charged to the customer, and data regarding average payload. The carrier receives a score in terms of CO2 per mile and CO2 per ton-mile by taking the total CO2, calculated using the fuel data provided, and dividing it by the total number of revenue miles or the total ton-miles, the latter calculated by multiplying the revenue miles by the average payload. The SmartWay program divides carriers into five bins based on their scores, and the publicly reported score for each carrier is the midpoint value for all carriers in the bin. This score is made available to shippers, who can then use it to estimate their emissions from shipments hauled by each carrier. The shipper enters the total miles or ton-miles of shipments hauled by that carrier, and these are multiplied by the carrier's score to estimate total emissions.
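The carrier-side and shipper-side calculations just described can be sketched as follows. The function names and the diesel emission factor (roughly 10.2 kg CO2 per gallon) are assumptions for illustration; the actual SmartWay survey fields, factors, and five-bin reporting are not reproduced here.

```python
KG_CO2_PER_GALLON_DIESEL = 10.21  # assumed factor, not SmartWay's exact value

def carrier_ton_mile_score(total_fuel_gallons, revenue_miles, avg_payload_tons):
    """Carrier-side score: kg CO2 per ton-mile from reported fuel and activity.

    Revenue miles and average payload are used, so empty/out-of-route miles
    and actual utilization are baked into the score.
    """
    total_co2_kg = total_fuel_gallons * KG_CO2_PER_GALLON_DIESEL
    ton_miles = revenue_miles * avg_payload_tons
    return total_co2_kg / ton_miles

def shipper_estimate(shipment_ton_miles, carrier_score):
    """Shipper-side estimate: activity data multiplied by the carrier's score."""
    return shipment_ton_miles * carrier_score

# A carrier burning 1.0M gallons over 9.0M revenue miles at 14 tons average:
score = carrier_ton_mile_score(1_000_000, 9_000_000, 14)
# Emissions estimate for a 500-mile, 14-ton shipment with that carrier:
print(round(shipper_estimate(500 * 14, score)))  # kg CO2
```

Because the score is built from revenue miles and average payload, the shipper-side estimate inherits the carrier's real operating performance without the shipper needing to know anything beyond distance and shipment weight.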
Because the carrier's score is based on revenue miles rather than total miles, the contribution of empty and out-of-route miles to overall efficiency is accounted for in the estimated emissions. Further, the use of average payload means the ton-mile score represents the carrier's actual average utilization. By knowing just the distance between origin and destination, and the weight of the shipment if using a ton-mile score, the shipper is able to get an estimate of emissions that reflects the carrier's actual average operating performance.

This is in contrast to the BSR CCWG methodology. Ocean carriers are asked to provide data on total fuel consumption, total distance sailed, nominal ship capacity in TEUs, and number of reefer plugs. In a similar manner to the SmartWay approach, the total CO2 is calculated from fuel data, and this is divided by the total TEU-km, calculated by multiplying the nominal capacity by the total distance sailed, to produce a performance metric in terms of CO2 per TEU-km. This can also be calculated for specific trade lanes and for reefer containers. Because nominal TEU capacity is used, the emissions per TEU-km are underestimated, as vessels are not at 100% utilization at all times. The use of total distance sailed also creates complications for shippers who wish to use the performance metrics to
estimate the CO2 of ocean shipments. In order to accurately calculate emissions, the shipper must know the actual sailing distance between the origin and destination, but this depends on any intermediate ports that may have been visited. The extra sailing distance is essentially out-of-route distance for a shipper trying to move goods directly between the origin and destination, and it will not be accounted for if the shipper uses the direct sailing distance between origin and destination.

The differences between the two methodologies mean that the results are not directly comparable with one another. Using nominal capacity as opposed to actual utilization will tend to underestimate emissions for an ocean shipment in comparison to trucking. The shipper must also account for the out-of-route distance introduced by intermediate ports when estimating emissions from ocean shipments, which are further underestimated if this is not done. The lack of comparable standards between modes may not necessarily impact the preference for multiple tools, as the relative carbon efficiency of each mode is generally consistent. However, the challenge for future development is to create a tool that offers the level of comparability provided by mode-specific tools, while also providing a consistent basis for comparison between modes. As of yet no similar tool has been created for the airfreight industry, and shippers may not want to manage using multiple tools. Given the global scope of most supply chains, future tools should be capable of providing multi-modal calculations while delivering the benefits of current mode-specific tools.

FUTURE TOOL DEVELOPMENT

A tool that provided a consistent set of well-to-wheel emissions factors across all four major modes would achieve a score of medium for both breadth and depth.
If the tool were part of an overall program that required a consistent system boundary, gave guidance on which transportation activities are to be included, and provided a set of performance indicators measuring both total emissions (effectiveness) and relative emissions (efficiency), a score of high could also be achieved for comparability. The tool could be based on transparent, open data and methods that make use of average levels of performance for different fuels, vehicles, and mode types. This tool would receive a score of low for precision and medium for verifiability, for an overall score of 0.74.

Alternatively, the tool could follow a similar path to the SmartWay and CCWG tools and collect data from specific carriers and routes. This could be used to achieve a score of medium (for carrier-specific emissions factors) or high (for route-level emissions factors) in the precision criterion. This would come at the cost of some transparency, due to the private nature of the information supplied by the carriers, reducing the verifiability score to low. A tool based on this design would achieve a score of 0.78 for providing carrier-level emissions factors or 0.81 for route-specific emissions factors. In the next chapter we discuss developing a work plan for a tool that would be capable of providing these capabilities.
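As a check on the arithmetic, the three design variants score out as follows under the Figure 15 weights and Table 6 scores (the rating assignments simply restate the ones just described):

```python
# Criterion weights from Figure 15.
w = {"Comparability": 0.39, "Breadth": 0.19, "Verifiability": 0.18,
     "Precision": 0.13, "Depth": 0.11}

# All three variants share medium breadth (0.88), medium depth (0.83),
# and high comparability (1.00); scores are from Table 6.
shared = w["Breadth"] * 0.88 + w["Depth"] * 0.83 + w["Comparability"] * 1.00

# Open average data: low precision (0.14), medium verifiability (0.40).
open_data = shared + w["Precision"] * 0.14 + w["Verifiability"] * 0.40
# Carrier-level data: medium precision (0.71), low verifiability (0.20).
carrier_level = shared + w["Precision"] * 0.71 + w["Verifiability"] * 0.20
# Route-level data: high precision (1.00), low verifiability (0.20).
route_level = shared + w["Precision"] * 1.00 + w["Verifiability"] * 0.20

print(round(open_data, 2), round(carrier_level, 2), round(route_level, 2))
# 0.74 0.78 0.81
```

The computed 0.74, 0.78, and 0.81 match the scores quoted for the three variants, all above the highest-scoring existing tools.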
[91] Saaty, T. L. (1990). "How to make a decision: the analytic hierarchy process." European Journal of Operational Research 48(1): 9-26.
[92] Dyer, R. F. and E. H. Forman (1992). "Group decision support with the analytic hierarchy process." Decision Support Systems 8(2): 99-124.
[93] Saaty, T. L. (1986). "Absolute and relative measurement with the AHP. The most livable cities in the United States." Socio-Economic Planning Sciences 20(6): 327-331.
[94] http://www.ecotransit.org/calculation.en.html
[95] http://www.ntmcalc.org/index.html