National Academies Press: OpenBook

Guidebook for Developing Subnational Commodity Flow Data (2013)

Chapter: Chapter 5.0 - Developing Subnational Commodity Flow Data Using Disaggregation

Suggested Citation: "Chapter 5.0 - Developing Subnational Commodity Flow Data Using Disaggregation." National Academies of Sciences, Engineering, and Medicine. 2013. Guidebook for Developing Subnational Commodity Flow Data. Washington, DC: The National Academies Press. doi: 10.17226/22523.


5.1 Introduction

This section provides an examination of how to develop subnational commodity flow data using data disaggregation. The disaggregation of freight flow data is the process of taking a preexisting freight flow database and dividing it into further detail to generate a more refined freight flow database. Disaggregation can occur across any dimension of the database (e.g., commodity, geography, or season). For freight flow databases, disaggregation most commonly occurs for geography. Transportation planners often take national-level freight flow data and generate county-level or more refined freight flow data to assist in freight planning efforts. Another common occurrence is travel demand modelers purchasing county-level proprietary freight flow data and disaggregating them to the TAZ level as part of the development of truck trip tables.

The primary pieces of information needed to conduct disaggregation of commodity flow data are an aggregated commodity flow database (e.g., FAF or TRANSEARCH), defined boundaries of the disaggregation region, and a disaggregation variable that is defined at the level of the disaggregated region. The same process would be used for disaggregating both FAF and TRANSEARCH.

This Guidebook identifies the following five steps for administering a commodity flow data disaggregation technique:

• Step 1: Identify aggregated commodity flow database
• Step 2: Determine geographic boundaries for disaggregated database
• Step 3: Specify time dimension for disaggregated database
• Step 4: Select disaggregation variable
• Step 5: Select and implement disaggregation technique

Many of these steps are interrelated, but the Guidebook discussion of each step is ordered as shown in the above bulleted list. The description of each step is structured to focus on the following four key elements described in the Playbook (Chapter 6.0):

1. Key Considerations: A brief description of the main issues encountered and tradeoffs that will need to be made for the step.
2. Implementation Process: A detailed description of how to implement the step.
3. Example: An example of how this step has been implemented in other studies. Note that Steps 1, 2, and 4 are sufficiently straightforward such that no examples are needed. Step 3 shows an example for disaggregating commodity flow data along a time dimension. Examples of the full disaggregation technique are provided in Step 5.
4. User’s Guide Worksheet Punch List: Simple bulleted instructions that Guidebook users can check off to ensure that they have implemented each of the major steps involved in conducting a commodity flow data disaggregation.

Each of these four elements focuses on different aspects of data disaggregation. For transportation agencies that are considering hiring a contractor to disaggregate FAF or TRANSEARCH data, reading the “Key Considerations” section of each step will likely provide enough information for the generation of an RFP on the topic. Transportation agencies that want to understand the details of how to conduct a commodity flow data disaggregation should focus on the “Implementation Process” sections in addition to the “Example” section. The “Example” section also will provide specific reference to efforts that have been conducted in other regions that can be compared to what already has been done in the agency’s region and to responses to an agency’s RFP. After transportation agencies have a sufficient background in all of the aspects related to conducting a commodity flow disaggregation, the “User’s Guide Worksheet Punch List” sections can be used to walk the agency through all of the specific steps that need to be done to disaggregate a database. This section also can be compared to responses to an agency’s RFP.

5.2 Step-by-Step Process for Disaggregating Commodity Flow Data

This section provides a comprehensive examination of the steps involved in developing subnational commodity flow data by disaggregating a preexisting commodity flow database. This section provides a detailed description of all of the necessary steps, including addressing relevant implementation issues that a typical regional planning office, local agency, or state department of transportation may experience when considering and implementing a data disaggregation.

Step 1: Identify Aggregated Commodity Flow Database

Key Considerations

The databases that most often serve as a starting point for disaggregation are the FHWA FAF database and the Global Insight TRANSEARCH database. The FAF database is publicly available at no cost.
Its primary limitation tends to be that it has 114 predefined geographic regions that are not refined enough to match with the full spectrum of many transportation agencies’ freight planning applications. TRANSEARCH is a proprietary commodity flow database that is often purchased at the county level, which provides much more refined geographic data than FAF. However, the process for developing TRANSEARCH is not as replicable as FAF, and TRANSEARCH’s estimates do not include statistically validated confidence intervals.

Implementation Process

The primary action in this step is to review the features of the two most commonly used commodity flow databases: the FHWA FAF database and TRANSEARCH. FAF includes estimates of the weight and value of commodity flows by origin, destination, commodity, and mode for a base year of 2007 and a forecast year of 2040. FAF data are a prime candidate for disaggregation because this database provides comprehensive geographic and commodity coverage of commodity flows and is publicly available. The currently available version of FAF covers all flows for 123 regions (major metropolitan areas and balances of states), 17 additional metropolitan areas that serve as major international gateways, and 7 international regions. Disaggregation of FAF data often includes allocating freight flows from these default regions to regions that are compatible with state and metropolitan-level planning regions such as counties, zip codes, or TAZs. FAF data can be obtained at http://ops.fhwa.dot.gov/freight/freight_analysis/faf/index.htm.

The FAF 2007 base year database is built entirely from public data sources. The starting point for FAF is the BTS CFS. The CFS is an establishment survey of 100,000 businesses in the United States that is then statistically expanded to estimate freight flows across 43 commodities and all freight modes. This survey is conducted every 5 years with the most recently released data being from 2007. The 2002 survey included only 50,000 businesses.

As discussed in Chapter 1.0 of the Guidebook, while the CFS is the most comprehensive establishment survey conducted in the United States, there are known sources of inaccuracy when considering the CFS as a comprehensive commodity flow database. These inaccuracies include business categories that are not surveyed, shipment types that are not included, and cells that are estimated.

Business types that are not included in the CFS sample are construction, service establishments, retail, farm-based businesses, logging, printing and publishing, and household and business moves. Not including these establishments leads to an undercounting of commodity flows in the CFS. A study by the Oak Ridge National Laboratory (ORNL) completed in 2000 estimated that the 1997 CFS captured only 75 percent of total U.S. freight shipments measured in tons, 74 percent when measured in ton-miles, and 81 percent when measured in value. The 2002 CFS was estimated to have captured only 54 percent of total U.S. freight shipments when measured in tons, 67 percent in ton-miles, and 63 percent in value. FAF adjusts for this undercounting by developing estimates of these other sources through other means.
Table 5.1 shows the process used for estimating the freight flows for each of the sectors missing from the CFS. Note that VIUS refers to the U.S. Census Bureau Vehicle Inventory and Use Survey. This database includes information on commercial vehicles registered by state along with truck type and operating characteristics for each state. VIUS was discontinued after the 2002 publication.

Table 5.1. Variables for state and county allocations for CFS out-of-scope shipments.

Out-of-Scope CFS Business Sector | National to State Allocations | Commodity Used for State Allocation | State to County Allocation (Origin) | State to County Allocation (Destination)
Construction | VIUS Industry Sector | All | CBP Sector Employment | CBP Sector Employment
Services | VIUS Industry Sector | All | CBP Sector Employment | CBP Sector Employment
Retail | VIUS Industry Sector | All | CBP Sector Employment | Population
Farm-Based | VIUS Commodity | Animals | Value in Farm Sales (USDA) | CBP Animal Slaughtering and Processing Employment
Farm-Based | VIUS Commodity | Cereal | Value in Farm Sales (USDA) | CBP Grain and Oil Seed Milling Employment
Farm-Based | VIUS Commodity | Other Agriculture | Value in Farm Sales (USDA) | CBP Food Mfg Employment
Logging | VIUS Commodity | Logs and Other Wood | Round Wood Production (NFS) (a) | CBP Wood Products Employment
Printing | CBP Industry Employment | Printed Materials | CBP Industry Employment | Population
Fisheries | CBP Industry Employment | Live Fish | CBP Industry Employment | CBP Seafood Products Employment

(a) NFS = National Forest Service.
Source: “Out-of-Scope” reports developed for FAF by Macrosys and ORNL.

The CFS also only requests information on outbound shipments. This is sufficient for capturing domestic flows, since both the origin and destination occur within the CFS survey area. However, there is no survey coverage of import flows, because these shipments originate in foreign countries outside of the sample area covered by the CFS.

Additionally, there are several origin-destination-commodity-mode combinations that are not captured in the CFS, which is typically due to relatively smaller amounts of freight being moved and the randomness of the collected data. FAF compensates for these “zero cells” by estimating these combinations using other techniques. For example, although CFS information is not available for fertilizer shipments from Iowa to Memphis, Tennessee, CFS information is available on total fertilizer shipments from Iowa to all other CFS regions and for all commodities shipments from Iowa to Memphis, Tennessee. Using these broader shipment patterns, the fertilizer shipments from Iowa to Memphis are interpolated, and this estimate is provided within the FAF database.

Each of these estimation procedures has its own set of impacts on disaggregating FAF data. FAF data that are developed based on the expanded CFS sample data are the ideal type of commodity flow data to disaggregate as there are actual raw shipment data to support the estimates. FAF data that are estimated to fill in missing cells in the CFS survey may be problematic if they are used as the starting point of additional disaggregation, because there are no raw data to support the estimates. These estimated cells tend to have small shipment volumes and further disaggregating them can lead to potentially large errors.
FAF data that are the result of supplementing missing sectors in the CFS through the use of publicly available sources also present issues. FAF estimates these missing sectors through a combination of economic data and allocation to smaller geographies using employment data. Further disaggregating these estimated data intensifies the potential for errors.

Many transportation agencies acquire commodity flow data by purchasing the Global Insight TRANSEARCH database. The methodology for creating the TRANSEARCH database is constantly evolving, but it has typically involved a mix of extrapolated survey data, economic output analysis, supply chain analysis, and geographic allocation. The database is typically purchased using county-level geographies, but also can be purchased at smaller levels such as zip codes or Census blocks. To provide subcounty-level commodity flows, TRANSEARCH tends to rely heavily on disaggregation. Therefore, the issues described related to FAF disaggregation also are applicable to TRANSEARCH subcounty commodity flows.

Example

No example is provided for this step. A full example of commodity flow disaggregation is provided in Step 5. Note that snippets of the FAF database can be found in Section 1.3. The full database can be found at http://ops.fhwa.dot.gov/freight/freight_analysis/faf/index.htm. Examples of the TRANSEARCH database can be found by contacting Global Insight.

User’s Guide Worksheet Punch List

• Review key features of the FAF and TRANSEARCH databases. Consider tradeoffs inherent in both databases.
• Review any other available commodity flow databases for the region of interest.
• Select aggregated commodity flow database.
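As a rough illustration of the zero-cell issue discussed in this step, the sketch below estimates a missing origin-destination-commodity cell from the two broader totals mentioned in the fertilizer example. FAF's actual estimation procedure is not described here, so the simple proportional rule, the function name, and every number are assumptions made only for illustration.

```python
def estimate_zero_cell(commodity_tons_from_origin, all_tons_origin_to_dest, all_tons_from_origin):
    """Estimate a missing commodity flow for one origin-destination pair.

    Assumed rule (not FAF's documented method): the destination receives the
    same share of the commodity as it receives of all commodities shipped
    from the origin.
    """
    if all_tons_from_origin <= 0:
        raise ValueError("total shipments from the origin must be positive")
    return commodity_tons_from_origin * all_tons_origin_to_dest / all_tons_from_origin


# Hypothetical totals, in thousand tons.
fertilizer_from_iowa = 2000.0      # fertilizer from Iowa to all regions
iowa_to_memphis_all = 150.0        # all commodities, Iowa to Memphis
iowa_all = 10000.0                 # all commodities from Iowa

estimate = estimate_zero_cell(fertilizer_from_iowa, iowa_to_memphis_all, iowa_all)
print(f"Estimated fertilizer, Iowa to Memphis: {estimate:.1f} thousand tons")
```

Because an estimate like this rests on no raw shipment records, any error in it is carried into every smaller geography it is later split across, which is the caution raised above.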

Step 2: Determine Geographic Boundaries

Key Considerations

There are two geographies to consider when conducting a commodity flow disaggregation. The first is the desired geographic unit, that is, the unit most conducive to the freight planning activities being supported. For a truck-rail diversion analysis, county-level freight flows are likely to be sufficient. However, for travel demand modeling, the desired geographic unit is often the TAZ. Other potential geographic units include zip codes, BEA regions, or entire MPO regions. However, geographies that change over time, such as MPO boundaries, create additional estimation challenges.

The second type of geographic consideration is the geography for which disaggregation variables are available. In some instances, the size of the geographic area can be used as a disaggregation variable. Typically, the disaggregation variable is some type of socioeconomic data or vehicle activity data. Regardless of which disaggregation technique is selected, the accuracy of each technique decreases as the geographic scale decreases.

Implementation Process

To determine which geographic area should be used for a data disaggregation, consideration must be given to the types of freight planning efforts that are likely to be supported by the disaggregated data. Table 5.2 provides a list of freight planning activities and commonly used levels of detail needed in disaggregated commodity flow data to support the planning activity. However, consideration should be given to the specific nature of the freight planning activity conducted in each region. In most cases, the geographical context also is constrained by existing governmental or administrative boundaries such as state, metropolitan region, urban area, or local planning region.

Disaggregation requires some relationship to be developed between the aggregate data and the subnational data desired.
Establishing this relationship typically involves identification of one or more activity variables, which limits to some extent the geographical disaggregation to those levels for which the activity variable is reported or available. Disaggregation to geographic areas with limited or no information is difficult and therefore constrains the degree to which an accurate and reasonable disaggregation may occur.

Table 5.2. Level of commodity flow disaggregation commonly used for freight planning activities.

Freight Planning Activity | Level of Commodity Flow Disaggregation Commonly Used
Travel Demand Modeling | TAZ
Trading Partner Analysis | State, MPO, or metropolitan region
Truck-Rail Diversion Analysis | County
Long-Haul Corridor Analysis | County
Short-Haul Corridor Analysis | TAZ or other subcity zone system such as zip codes
State-Level Freight Plan | County
MPO or County-Level Freight Plan | TAZ or other subcity zone system such as zip codes
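The relationship between aggregate and subnational data described in this step is often expressed as a proportional share of an activity variable. The sketch below is a generic illustration under that assumption, not the Guidebook's specific technique (the full techniques are covered in Step 5). The function name, the county names, and all numbers are hypothetical.

```python
def disaggregate_flow(region_tons, activity_by_zone):
    """Split one regional flow across zones in proportion to an activity variable."""
    total_activity = sum(activity_by_zone.values())
    if total_activity <= 0:
        raise ValueError("activity variable must sum to a positive value")
    return {
        zone: region_tons * activity / total_activity
        for zone, activity in activity_by_zone.items()
    }


# Hypothetical inputs: 1,000 tons originating in one aggregate region and
# employment (the assumed activity variable) in each of its three counties.
employment = {"County A": 5000, "County B": 3000, "County C": 2000}
county_tons = disaggregate_flow(1000.0, employment)

for county, tons in county_tons.items():
    print(f"{county}: {tons:.0f} tons")
```

The zone totals always add back to the regional control total, but the split is only as good as the activity variable: a zone for which the variable is missing or unreported cannot receive a meaningful share.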

It is generally recommended that a geographic scale be selected that is as large as possible while still sufficiently refined to meet the planning objectives of the transportation planning agency. This recommendation is made because the accuracy of the data disaggregation decreases as the geographic scale decreases. Additionally, further disaggregation can occur at a later date if future planning applications require more geographic detail.

Example

No example is provided for this step. A full example of commodity flow data disaggregation is provided in Step 5.

User’s Guide Worksheet Punch List

• Identify current freight planning applications for the region of interest.
• Determine the geographic level of data disaggregation needed for each application.
• Select the largest level of geography needed to meet all current freight planning applications.
• Note that this step will be revisited following Steps 3 and 4 related to the time dimension and disaggregation variables needed for this analysis.

Step 3: Specify Time Dimension

Key Considerations

The time dimension is closely related to the geographic scale desired and the freight planning application being considered. The time dimension also influences the accuracy of data disaggregation. Given the dynamic nature of freight flows over time, significant inaccuracies may result if the aggregate data are reported using different time segments than the desired time segmentation of the disaggregated data. A typical example of a time dimension mismatch is when national freight flow data report annual flows, while monthly, weekly, daily, or hourly flows are desired for the disaggregated data. Another aspect of the time dimension is the need to forecast disaggregated commodity flow data for the future. A smaller set of disaggregation variables is available to support forecasting activities.
It also is important to account for any seasonal distortions that may be present in the aggregated data set or disaggregation variable when considering the time dimension. For example, a commodity flow database constructed by expanding establishment survey data that were collected in a single season will have seasonal bias built into the database. The CFS database was developed using 1 week of data in each of the four seasons in an attempt to minimize seasonal bias in the data set.

Implementation Process

For the vast majority of freight planning applications, annual disaggregated freight flow data are sufficient. This is consistent with the annual format of most national freight flow data. One notable exception is truck trip tables for travel demand models, which tend to require average daily or average weekday estimates of truck trips and often even estimates of truck trips by time period. Another potential exception is if peak flows are desired. Freight flows in different regions and different commodities peak at different times of the day, week, and year, so care must be taken in developing conversion factors for moving from an annual database to a more temporally refined database. Potential sources for temporal conversion factors include permanent vehicle classification count data and interviews with local shippers and carriers. The example provided below illustrates how a commodity flow database can be disaggregated along the dimension of time using a combination of classification count data and stakeholder outreach.

Example

As a hypothetical example of disaggregating commodity flow data along the time dimension, consider an MPO that wants to know the peak daily and hourly truck volumes from a recently expanded intermodal railyard in its area. As a first step, the MPO contacts the railroad that operates the railyard for some information, and the MPO is provided with the following:

• Approximately 300,000 containers will be moved at full build-out in 2020
• 90 percent of the containers move during the week
• Monthly fluctuations prior to the intermodal yard expansion are as shown in Table 5.3
• Daily and hourly fluctuations are roughly the same throughout the year

This situation is similar to a very simplified commodity flow database with two regions (the intermodal yard and an external region), one mode (trucking), and one commodity (intermodal goods) and an annual flow between regions of 300,000 units.

Using the information provided by the railroad, the MPO decides to conduct truck counts just outside the intermodal railyard gates for 1 week during the month of April. The hourly flows from these counts are shown in Table 5.4.
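The arithmetic behind this example, which converts the annual total into peak-month, peak-week, peak-day, and peak-hour volumes, can be sketched as follows. The inputs are the railroad's figures and the shares from Tables 5.3 and 5.4. The variable names are illustrative, and the sketch assumes, as the example does, that one container corresponds to one truck and that shares are rounded to whole percentages.

```python
ANNUAL_CONTAINERS = 300_000   # full build-out estimate from the railroad
PEAK_MONTH_SHARE = 0.13       # September share of annual activity (Table 5.3)
WEEKS_PER_MONTH = 4.3         # 52 weeks / 12 months, rounded as in the text
WEEKDAY_SHARE = 0.90          # share of containers moving Monday to Friday

# Counts from the one-week April survey (Table 5.4).
TUESDAY_TRUCKS = 1_070
ALL_WEEKDAY_TRUCKS = 4_635
TUESDAY_PEAK_HOUR_TRUCKS = 145   # Tuesday, 2 p.m. to 3 p.m.

# Shares rounded to whole percentages (assumed, to match the example).
peak_day_share = round(TUESDAY_TRUCKS / ALL_WEEKDAY_TRUCKS, 2)         # 0.23
peak_hour_share = round(TUESDAY_PEAK_HOUR_TRUCKS / TUESDAY_TRUCKS, 2)  # 0.14

peak_month = ANNUAL_CONTAINERS * PEAK_MONTH_SHARE      # about 39,000
peak_week = round(peak_month / WEEKS_PER_MONTH)        # about 9,070
peak_day = peak_week * WEEKDAY_SHARE * peak_day_share  # about 1,877
peak_hour = peak_day * peak_hour_share                 # about 263

print(f"Peak month: {peak_month:,.0f}")
print(f"Peak week:  {peak_week:,.0f}")
print(f"Peak day:   {peak_day:,.0f}")
print(f"Peak hour:  {peak_hour:,.0f}")
```

Keeping each conversion factor as a named constant makes it straightforward to replace the April counts with another survey week or to test a different peak month.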
The first step is to calculate the peak monthly volume. The 300,000 annual container estimate is multiplied by the peak monthly share of 13 percent, which occurs in September (see Table 5.3). This provides an estimate of 39,000 containers during the peak month. This is then divided by 4.3, the average number of weeks in a month (52 weeks divided by 12 months). Therefore, the peak week container volume is 9,070.

Table 5.3. Hypothetical estimate of monthly percentage of activity at intermodal railyard.

Month | Percent of Traffic
January | 4%
February | 4%
March | 6%
April | 8%
May | 9%
June | 10%
July | 10%
August | 11%
September | 13%
October | 11%
November | 9%
December | 5%
Total | 100%

Table 5.4 indicates that the peak day of the week is Tuesday, with 1,070 (23 percent) of the total 4,635 weekday trucks counted during the 5 days of data collection in April. The peak day therefore falls on a Tuesday, and the peak daily flow is 9,070 multiplied by the 90 percent of containers that move during the week and then multiplied by the 23 percent of weekday trucks that typically occur on a Tuesday. This amounts to an estimate of 1,877 trucks during the peak day in September 2020.

The hourly data collection shown in Table 5.4 indicates that peak hourly flows occur between 2:00 p.m. and 3:00 p.m. on Tuesdays, with 14 percent of the daily flows occurring during this hour. Therefore, the peak hourly volume estimate is 1,877 multiplied by 14 percent, or 263 trucks.

Table 5.4. Hypothetical two-way truck count data from intermodal railyard.

Period | Monday | Tuesday | Wednesday | Thursday | Friday | Total
8 a.m. to 9 a.m. | 50 | 85 | 80 | 60 | 90 | 365
9 a.m. to 10 a.m. | 65 | 90 | 95 | 65 | 85 | 400
10 a.m. to 11 a.m. | 75 | 120 | 130 | 95 | 80 | 500
11 a.m. to 12 p.m. | 100 | 125 | 125 | 110 | 95 | 555
12 p.m. to 1 p.m. | 50 | 55 | 45 | 65 | 35 | 250
1 p.m. to 2 p.m. | 125 | 140 | 135 | 135 | 95 | 630
2 p.m. to 3 p.m. | 125 | 145 | 130 | 130 | 100 | 630
3 p.m. to 4 p.m. | 65 | 95 | 75 | 100 | 50 | 385
4 p.m. to 5 p.m. | 65 | 90 | 80 | 95 | 55 | 385
5 p.m. to 6 p.m. | 90 | 125 | 105 | 140 | 75 | 535
6 p.m. to 8 a.m. | 0 | 0 | 0 | 0 | 0 | 0
Total | 810 | 1,070 | 1,000 | 995 | 760 | 4,635

User’s Guide Worksheet Punch List

• Identify current freight planning applications for the region of interest.
• Determine the time dimension requirements for each application.
• Select the smallest unit of time needed to meet all current freight planning applications.
• Note that this step will be revisited following Step 4, which is related to disaggregation variables.

Step 4: Select Disaggregation Variable
Key Considerations

Disaggregation variables are activity variables that are used as a proxy to represent economic activity and freight flows at the desired subnational level. Some of the most common disaggregation variables are employment, output, warehouse space, population, revenue, personal income, and geographic area. When using a disaggregation technique to develop subnational freight forecasts, it also is necessary to have a forecast of the disaggregation variable. For example, if employment is used as the disaggregation variable, then there must be a forecast of employment in the forecast year to develop a forecast of freight flows for the study area. The limited availability of forecasted disaggregation variables is often the primary factor in identifying an appropriate disaggregation variable. For example, the difficulty in developing a long-term forecast for warehousing space at the geographic level consistent with most study areas precludes warehouse square footage from being a useful disaggregation variable.

The selection of the disaggregation variable also significantly impacts the accuracy of the disaggregation, because for each commodity the variable is only partially relevant or there are other influences that are not incorporated into the model and these relationships are likely to change over time. Depending on the availability of activity variables at different geographies, it may be necessary to conduct the disaggregation in multiple steps.

Implementation Process

To select a disaggregation variable, a thorough review of activity variables available at different geographies should be conducted for the region. As mentioned above, some of the most common variables that are used for disaggregation include employment, output, warehouse space, population, revenue, personal income, and geographic area. However, less commonly used variables can be considered depending on the specific commodities being disaggregated. For example, the location and size of landfills can be used to disaggregate commodity flows of waste materials. In this manner, the local data sources that are discussed in Chapter 4.0 can generally be used for disaggregation of national commodity flow data. The most common disaggregation variables are discussed below.

Employment. The number of employees is the variable most commonly used to disaggregate national data, due to its availability at the local, regional, and state level and the availability of forecast employment data in most regions. Total employment is available from Census data, and the number of employees per industrial classification is often available from the state department of revenue at the county or zip code level. Travel demand models typically have estimates of employment at the TAZ level.

Traditionally, the relationship between employment and shipments has been assumed to be linear. However, recent research indicates that for several industries, large companies have fewer shipments per employee than smaller companies (Holguín-Veras and Ban 2010). This may be a result of efficiencies of scale that are achieved at larger companies relative to their logistics practices. There also are several reasons why the number of employees may not be a very accurate predictor of freight flows or shipment activity, including the following:

• There is sometimes a lack of correlation between employment and output. Some companies within the same industry may rely more on equipment than on personnel for their output, which can generate widely different output-per-employee rates. A recent research paper conducted a statistical analysis of employment relative to tonnage output by commodity using the FAF data set. The correlation was found to vary widely, with some industries having a high correlation (as reflected in a high R² value) and others having a low correlation. Table 5.5 shows the correlation by industry.

Table 5.5. Correlation between employment and tonnage output by industry (based on FAF data).

SCTG | Dependent Variable | Explanatory Variable(s) | R²
20-23 | Various (a) | Chemical mfg | 11%
10-15 | Various (b) | Mining (except oil and gas) | 13%
9 | Tobacco Products | Beverage and tobacco product mfg | 15%
1 | Live Animals/Fish | Support activities for agriculture and forestry | 17%
16 | Crude Petroleum | Oil and gas extraction | 21%
38 | Precision Instruments | Miscellaneous mfg | 34%
24 | Plastics/Rubber | Plastics and rubber products mfg | 43%
2 | Cereal Grains | Food mfg, farm acres | 48%
8 | Alcoholic Beverages | Beverage and tobacco product mfg | 50%
39 | Furniture | Furniture and related product mfg | 56%
4 | Animal Feed | Support activities for agriculture and forestry | 60%
31 | Nonmetallic Mineral Products | Nonmetallic mineral product mfg | 61%
6 | Milled Grain Products | Food mfg | 62%
19 | Other Coal and Petroleum Products | Oil and gas extraction, petroleum and coal products mfg | 62%
34 | Machinery | Fabricated metal product mfg, machinery mfg | 63%
40 | Misc. Manufactured Products | Miscellaneous mfg | 64%
3 | Other Agriculture Products | Food mfg, farm acres | 65%
33 | Articles of Base Metals | Fabricated metal product mfg | 65%
25 | Logs | Forestry and logging, support activities for agriculture and forestry, wood product mfg | 70%
35 | Electronic and Electrical | Machinery mfg, computer and electronic product mfg, electrical equip and appliance and component mfg | 70%
27 | Newsprint/Paper | Forestry and logging, printing and related activities | 73%
30 | Textiles/Leather | Textile mills, textile product mills | 73%
36, 37 | Various | Transportation equipment manufacturing | 74%
7 | Other Foodstuff | Food mfg, chemical mfg | 75%
26 | Wood Products | Wood product mfg | 75%
32 | Base Metals | Primary metal mfg, machinery mfg | 75%
18 | Fuel Oils | Petroleum and coal products mfg | 77%
28 | Paper Articles | Paper mfg, printing and related activities | 81%
17 | Gasoline | Petroleum and coal products mfg | 83%
29 | Printed Products | Paper mfg, printing and related activities | 85%
5 | Meat/Seafood | Food mfg | 86%
41 | Waste and Scrap | NAICS 115, 221, 321–327, 331–339 | 86%
43 | Mixed Freight | NAICS 321–327, 481, 483–488, 492–493 | 86%

(a) SCTG 20-23 is Basic Chemicals, Pharmaceutical Products, Fertilizers, and Chemical Products and Preparations n.e.c.
(b) SCTG 10-15 is Monumental or Building Stone, Natural Sands, Gravel and Crushed Stone, and Nonmetallic Minerals n.e.c.

• There also is often a wide variation of companies within a single industry even at refined levels of industry codes. For example, NAICS code 326111 is for Plastic Bag and Pouch Manufacturing, while NAICS code 326122 is for Plastics Pipe and Pipe Fitting Manufacturing and NAICS code 326211 is for Tire Manufacturing. All of these activities are within the same three-digit NAICS code of 326, but are likely to have very different output-per-employee rates.
• There can be a “headquarters” issue, in which employees are recorded in one location while the freight activity associated with their employment occurs in a different location. In the agricultural industry, it is common for employees to be recorded in an urban area where the company’s administrative activities occur while the freight-related aspects of the employee’s employment often occur hundreds of miles away in a rural location.
• At finer geographic levels, there are typically fewer employment categories available than desired. For example, at the TAZ level, employment is frequently at the one-digit SIC level. At the county level, several of the two-digit employment categories that are publicly available through the U.S. Census Bureau are suppressed due to privacy or accuracy concerns. It is often necessary to extrapolate commodities based on limited employment specificity.
• Employment databases are typically industry based, while freight flow databases are typically commodity based. Therefore, companies that produce goods in multiple commodity categories are typically classified as being associated with a single industry. For example, Hewlett-Packard sells a combination of electronic hardware, electronic software, and information technology services. However, if the company is listed as only being in electronics, there is a danger that all of the employees within the company will be misrepresented as being involved in freight-intensive activities. Additionally, the industry codes in employment databases and the commodity codes in commodity flow databases are not always a one-to-one match, thereby generating the need to shift goods between categories that are not ideally compatible.
• There is often a need to consider matching both inputs and outputs to employment categories. For example, a food manufacturing employee can reasonably be associated with food manufacturing output. However, the inputs associated with food manufacturing are a combination of other manufactured foods, raw agricultural products, and other commodities likely in smaller quantities.
This matching of inputs to output categories can be handled through the application of input-output factors to industry categories, as described in Chapter 1.0.
• The number of employees needed 20 years ago to produce a car, computer chip, or potato chip is likely not the same as it is today or will be 20 years in the future. A productivity factor is recommended for employment forecasts to account for additional goods produced per employee in many industries.

Revenue. Revenue data are typically available from the state department of revenue. These data are typically available at the state level and by industry or even establishment type. Sometimes revenue data are available by county. Revenue has many of the same drawbacks as employment as a disaggregation variable. The issues can be further exacerbated when using revenue because there are several bulk commodities that generate low-revenue sales, but have high tonnage amounts.

Population. Population also is a widely used disaggregation variable, due to its availability and accessibility at various geographic scales. The U.S. Census Bureau provides very detailed population estimates for cities, towns, urban areas, counties, and zip codes, making population easily applicable for subnational commodity flow data disaggregation. Travel demand models typically have population estimates at the TAZ level. Population can be an accurate reflection of the consumption of select commodities that are typically sold at the retail level, including gasoline, food products, and other products that are transferred to retail stores through warehouses and distribution centers. However, as with other activity variables, commodity flows are generated and affected by more than population. In particular, there are several production activities that are barely related to population, and the consumption of industrial commodities is typically not well correlated to population. Developing different relationships between population and commodity flows by zoning category (residential, commercial, manufacturing, construction, etc.) may help mitigate these inaccuracies.

Personal Income. Income data also are available from the U.S. Census Bureau. This activity variable is available at various subnational geographies such as state, city, county, BEA zone, zip code, and urban/metropolitan area. Income is an indicator of per capita purchasing power and can impact the specific type of freight activity in a region. Personal income as a disaggregation variable has most of the same drawbacks as population, especially in being an indicator of consumption and much less linked to production.

Output (Gross Domestic Product). Gross domestic product, or output, is another activity variable that may be used to disaggregate national data. It is geographically limited to either state or BEA zone, thus is not applicable where more localized disaggregation is sought. At the county level, many of the output data are suppressed for proprietary reasons. However, while between and within industries there may be significant variation in the relationship of output and freight activity, across all industries increases in output should translate into increases in freight activity. Using output as a disaggregation variable generates similar concerns to those generated using revenue or employment: issues with industry and commodity classification, headquarters issues, and issues with matching inputs to outputs.

Warehouse Space (Square Feet). The amount of warehousing space also may be used as an activity variable for disaggregation of national freight data. This information is often available through the state department of revenue and is used as an indicator of freight activity and intensity. This indicator variable works best for commodities with supply chains that feature warehouse usage. To develop commodity-specific disaggregation factors, it may be necessary to contact warehouse operators to determine the types of goods that are stored at the facility.

One of the more challenging considerations of data disaggregation and, subsequently, the calibration of disaggregated data, is the degree to which data disaggregation accurately represents reality. The degree of accuracy or inaccuracy is often difficult to assess given that a thorough understanding of subnational commodity flows at a local and regional level is often limited or unknown and comparisons to flows from disaggregated national data are difficult in this information vacuum.
The issue of accuracy is further complicated by the fact that the national data source itself is often an estimation with various geographical and spatial limits. As discussed in Chapter 4.0, accuracy can be improved by validating estimates from a variety of published local data or primary data collection efforts.

Example

No example is provided for this step. A full example of commodity flow data disaggregation is provided in Step 5.

User’s Guide Worksheet Punch List

• Review available disaggregation variables in the region of interest.
• Select the appropriate disaggregation variable for each commodity in your desired database.
• Review Steps 1 through 3 to determine whether changes are needed in the aggregated database used, the geographic boundaries of the region of interest, or the time dimension used for the disaggregated database.

Step 5: Select and Implement Data Disaggregation Technique

Key Considerations

There are several disaggregation techniques that are available for consideration. The simplest of the techniques are easy to implement, but do not allow for consideration of all available freight flow and economic information. The more complex methods can require significant resources to implement, but allow for use of a broader range of data sources to shape the disaggregated commodity flow database. From simplest to most complex, the techniques are geographic allocation, regression models, iterative proportional fitting, and input-output methods. When multiple rounds of disaggregation are used to develop a subnational commodity flow database (e.g., FAF region to county, then county to zip code), more than one of these methods may be incorporated. Alternatively, it is possible that different techniques could be used for different commodities within a single round of disaggregation.

Implementation Process

There are four disaggregation techniques that are discussed in this section: (1) geographic allocation, (2) regression methods, (3) iterative proportional fitting, and (4) input-output methods.

Geographic Allocation. This technique involves disaggregating commodity flows to smaller geographies based on features of each of the smaller geographic units. In the simplest sense, one can imagine taking a county-level commodity flow database, dividing up a county within that database into two equally sized halves, and then allocating half of the full county’s commodity flows to each of the half counties. This new allocation represents the subcounty commodity flow database. Other, more complex, examples revolve around this basic concept and can include the following:

• Allocating commodity flows to subnational levels based on the physical size of the disaggregated geographies
• Allocating commodity flows based on some socioeconomic data variable within each of the subregions such as total employment or employment in all freight-related sectors
• Allocating commodity flows to the subregional level based on industry-specific or commodity-specific activity data within each of the subregions

For each of these examples, the commodity flows are allocated to the subregions based on unique characteristics of each subregion.

Regression Models. Regression models are used to establish a statistical relationship between two or more variables. Regression analysis includes a dependent variable that is the function of different levels of independent variables. For the purpose of developing subnational commodity flow databases, the dependent variables are the commodity flows for a subregion, typically defined by commodity type. The independent variables are unique characteristics that are available at the subregional level. The regression analysis develops mathematical equations that define the commodity flows that are likely to occur for varying values of the independent variables.

Using the analysis conducted in a report prepared for FHWA, it is possible to apply regression equations to the FAF database and to establish relationships between commodity flows and several different combinations of independent variables (Cambridge Systematics 2009). In the report, the dependent variables were tonnage output by two-digit SCTG commodity code. The independent variables included industry-specific employment, population, and other industry-specific factors where relevant such as farm acres and oil/gas extraction. The regression was run for each of the 89 domestic zones that are part of FAF. The regression was used to identify the best variable(s) to predict the tonnage flows for each commodity. The best fit variables and level of confidence in the relationship are shown in Table 5.5. As the table demonstrates, for some commodities, a strong statistical relationship between commodity flows and

socioeconomic activity variables was identified. For other commodities, there was no variable that was found to have a strong predictive relationship with commodity flows.

The relationships established through this process can be used to estimate tonnage flows at the subnational level. The estimates can actually be done for any geographic level for which the independent variables can be identified. In the report prepared for FHWA, commodity flow estimates were developed at the county level using the regression results and county-level activity data (Cambridge Systematics 2009).

Regression methods are essentially similar to the methods discussed in the establishment survey section that are used to extrapolate commodity flows from sampled companies to the universe of companies in a region. See Chapter 2.0 for a description and examples of this process.

Iterative Proportional Fitting. The most common approach to disaggregating national freight data is to apply some form of the iterative proportional fitting process, first developed by William Edwards Deming and Frederick F. Stephan (1940). This technique is ideal for two-dimensional tables where the marginal (column and row) totals are known (or estimated through an activity variable), but the distribution throughout the matrix is unknown.

For a commodity flow database, the columns and rows correspond with origins and destinations. The totals of commodity flows for an origin at a large aggregation level are known, while the commodity flows for subregions would not be known. Iterative proportioning can be used to develop estimates of flows at the subregional level. If the subregional commodity flow database has I rows and J columns, then we can create an I × J table.
Assuming independence amongst the origin-destination freight flows at the subregional level and some multinomial distribution of freight flows between subregions, it can be estimated that the flow between a specific origin i and a specific destination j is $m_{ij}$, where $m_{ij} = a_i b_j$ for all i and j. The percentage of the total commodity flow in row i is $a_i$, while the total commodity flow in column j is $b_j$.

$$m_{ij}^{(2n-1)} = \frac{m_{ij}^{(2n-2)}\, x_{i+}}{\sum_{k=1}^{J} m_{ik}^{(2n-2)}} \qquad \text{(Eq. 10)}$$

$$m_{ij}^{(2n)} = \frac{m_{ij}^{(2n-1)}\, x_{+j}}{\sum_{k=1}^{I} m_{kj}^{(2n-1)}} \qquad \text{(Eq. 11)}$$

Notice that the row and column totals are constant, denoted by

$$x_{i+} = \sum_{j} x_{ij} \qquad \text{(Eq. 12)}$$

$$x_{+j} = \sum_{i} x_{ij} \qquad \text{(Eq. 13)}$$

This particular form of the iterative proportional fitting model is often used at the national level for completing unknown cell values for which historical values or values from other cells are available, and $\hat{m}_{ij}$ is estimated using either a maximum likelihood estimation technique or utilizing a log-linear model (Lee and Viele 2001).

Unfortunately, at the subnational level, historical table values are not always available, and estimation using one of these approaches is difficult. Therefore, what is usually applied is a simplified case of the model above assuming quasi independence and resulting in a two-step factor estimation process where the initial $\hat{b}_j^{(0)} = 1$ and $n \ge 1$ and the resulting cells are estimated such that:

$$\hat{a}_i^{(n)} = \frac{x_{i+}}{\sum_{j} \hat{b}_j^{(n-1)}} \qquad \text{(Eq. 14)}$$

$$\hat{b}_j^{(n)} = \frac{x_{+j}}{\sum_{i} \hat{a}_i^{(n)}} \qquad \text{(Eq. 15)}$$

Input-Output Methods. One approach to developing disaggregated data is through input-output analysis. Input-output analysis is the process of relating the quantity and type of products produced at a given location to the quantity and type of products supplied to the location. The location can be a single facility, a collection of unrelated facilities within a region, or a specific industry within a region. The theory of input-output analysis is that the relationship between inputs and outputs is relatively constant across different geographies and time periods, and therefore inferences can be made about both the inputs and outputs if just one of the factors is known at a single location.

Input-output data are publicly available from the U.S. Department of Commerce Bureau of Economic Analysis for the year 2002 at http://www.bea.gov/industry/io_benchmark.htm. The data available include commodities that are generated by specific industries in what are termed “make tables.” It also includes commodities that are consumed by specific industries in what are termed “use tables.”

Make tables tend to be straightforward in that industries tend to produce only one or two commodities that are directly affiliated with their industry. For example, the forestry and logging industry in 2002 was found to produce $18.9 million of crop products and $32.0 billion of forestry and logging products, while producing none of the other 138 commodities that are tracked in the summary database. An industry as diverse as basic chemical manufacturing produces 22 different commodities, but 94 percent of their production by value is in two commodities: basic chemicals (82 percent) and resins/rubber/artificial fibers (12 percent). Note that the detailed input-output database contains thousands of commodities and may be useful for analysis of very specific goods.
Use tables show the commodities consumed by a specific industry and reflect much higher levels of diversity. Table 5.6 shows the use table for three industries: crop production, forestry and logging, and basic chemical manufacturing. The value in the top row of the crop industry column shows that the U.S. crop industry purchased $893 million of basic chemicals in 2002. In total, the crop industry purchased items from the 16 commodities shown in Table 5.6 and from the 64 commodities that have been combined into the “other commodities” category. Four of these commodities represent over 10 percent of the total commodities purchased by the crop industry. The forestry and logging industry has a more consolidated use table than the crop industry, but it still purchased 7 different commodities listed in the table and another 47 that have been combined into the “other commodities” category. The basic chemicals industry purchased 136 different commodities, including over 100 that have been combined in the “other commodities (products and services)” category.

Table 5.6. Use table for three industries (dollars in millions).

Commodity | Crop Production | Percent of Total | Forestry and Logging | Percent of Total | Basic Chemical Manufacturing | Percent of Total
Basic chemicals | 893 | 1% | – | 0% | 28,779 | 34%
Real estate | 14,249 | 19% | 115 | 1% | 242 | 0%
Support activities for agriculture and forestry | 10,761 | 15% | 2,811 | 14% | – | 0%
Forestry and logging products | – | 0% | 12,924 | 62% | 22 | 0%
Petroleum and coal products | 4,476 | 6% | 187 | 1% | 6,233 | 7%
Wholesale trade | 4,491 | 6% | 1,267 | 6% | 5,054 | 6%
Crop products | 8,063 | 11% | 5 | 0% | 886 | 1%
Agricultural chemicals | 7,897 | 11% | 29 | 0% | 581 | 1%
Monetary authorities, credit intermediation and related activities | 6,415 | 9% | 292 | 1% | 344 | 0%
Electric power generation, transmission, and distribution | 2,681 | 4% | 10 | 0% | 3,697 | 4%
Management of companies and enterprises | – | 0% | – | 0% | 6,224 | 7%
Natural gas distribution | 683 | 1% | 1 | 0% | 3,455 | 4%
Truck transportation | 1,725 | 2% | 471 | 2% | 1,369 | 2%
Scientific research and development services | – | 0% | – | 0% | 2,850 | 3%
Plastics and rubber products | 738 | 1% | 21 | 0% | 1,501 | 2%
Rights to nonfinancial intangible assets | 118 | 0% | 3 | 0% | 2,076 | 2%
Other fabricated metal products | 67 | 0% | 19 | 0% | 1,543 | 2%
Maintenance and repair construction | 778 | 1% | 32 | 0% | 759 | 1%
Rail transportation | 451 | 1% | 71 | 0% | 866 | 1%
Other commodities (products and services) | 8,863 | 12% | 2,562 | 12% | 17,556 | 21%
Total intermediate inputs | 73,351 | 100% | 20,819 | 100% | 84,037 | 100%
Compensation of employees | 14,569 | | 4,115 | | 15,324 |
Taxes on production and imports, less subsidies | (6,707) | | 1,135 | | 898 |
Gross operating surplus | 38,308 | | 5,990 | | 5,742 |
Total Value Added | 46,171 | | 11,240 | | 21,964 |
Total Industry Output | 119,522 | | 32,060 | | 106,001 |

Note: Numbers may not add up due to rounding.

The term “commodities” is used to refer to all items that are purchased by companies in these industries. For example, included in commodities is everything from petroleum and coal products to truck transportation to real estate. To use an input-output make or use table for a commodity flow input-output analysis, it is necessary to remove all of the items that would not be listed as a commodity in the SCTG format or other similarly structured commodity codes. Then, the dollar value of commodities used and made by each industry can be converted to tonnages using ton-value ratios from a source such as the BTS CFS or local sources, if available. The tonnage values can then be used to understand the quantities and types of input commodities needed to produce a unit of output of each commodity. As mentioned above, these input-output relationships can then be applied to individual facilities, a group of companies, or an entire industry.

More recent input-output data are available from proprietary sources. The current and most widely used input-output package (IMPLAN Professional Software Version 2) includes nearly 500 industry sectors and allows geographic aggregation at the state, county, subcounty, and zip code level.
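The dollar-to-tonnage conversion described for use tables can be sketched in a few lines. This is only an illustration under stated assumptions: the dollar figures come from the crop production column of Table 5.6, but every value-per-ton ratio, the 50,000-ton facility, and all function and variable names are hypothetical and are not taken from the CFS or any other source.

```python
# Purchases by the crop production industry, millions of 2002 dollars (Table 5.6).
# Only physical goods are kept; services such as real estate are dropped.
CROP_INPUTS_MUSD = {
    "Basic chemicals": 893,
    "Petroleum and coal products": 4476,
    "Crop products": 8063,
    "Agricultural chemicals": 7897,
}
CROP_OUTPUT_MUSD = 119522  # total industry output, Table 5.6

# HYPOTHETICAL dollars-per-ton ratios, for illustration only.
DOLLARS_PER_TON = {
    "Basic chemicals": 1000.0,
    "Petroleum and coal products": 500.0,
    "Crop products": 200.0,
    "Agricultural chemicals": 400.0,
}
OUTPUT_DOLLARS_PER_TON = 200.0  # hypothetical ratio for crop output


def to_tons(millions_of_dollars, dollars_per_ton):
    """Convert a dollar value (in millions) to tons using a value-per-ton ratio."""
    return millions_of_dollars * 1_000_000 / dollars_per_ton


def input_tons_per_output_ton(inputs_musd, output_musd, dollars_per_ton,
                              output_dollars_per_ton):
    """Tons of each input commodity needed per ton of industry output."""
    output_tons = to_tons(output_musd, output_dollars_per_ton)
    return {
        commodity: to_tons(value, dollars_per_ton[commodity]) / output_tons
        for commodity, value in inputs_musd.items()
    }


factors = input_tons_per_output_ton(
    CROP_INPUTS_MUSD, CROP_OUTPUT_MUSD, DOLLARS_PER_TON, OUTPUT_DOLLARS_PER_TON
)

# Inbound tons implied for a hypothetical facility shipping 50,000 tons of crops.
facility_inputs = {c: f * 50_000 for c, f in factors.items()}
```

Because the factors are fixed ratios, the sketch inherits the constant-coefficient limitation of input-output analysis: it cannot reflect firms substituting inputs over time.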
The IMPLAN data consist of (1) a matrix of industry-specific technical coefficients that specifies the quantity of inputs necessary to produce a given unit of output and (2) sector-specific final demand, final payments, industrial output, and employment. This combination of industry-specific activity for a given geographical area allows a more robust estimation of freight activity for that region by industry sector.

Input-output analysis does have limiting issues that are worthy of consideration. The technical coefficients are treated as constants, thus they do not account for the real-life variability in the number of specific inputs per product produced across various firms or regions. In reality,

firms are constantly adjusting and substituting inputs as market conditions change, technologies change, labor productivity changes, prices for labor and equipment change, and the structure of the industry changes. This limitation is especially problematic if this type of modeling approach is applied to longer period forecasting.

Example 1: Geographic Allocation for San Joaquin Valley Truck Trip Table Development

In 2004, the San Joaquin Valley Council of Governments (COG) developed a truck component to its travel demand model. Generating a truck trip table at the TAZ level was a critical step in the development of this model. To develop the intercounty portion of the truck trip table, the San Joaquin Valley COG disaggregated county-level California Intermodal Transportation Management System (ITMS) freight flow data, first to the zip code level, then to the TAZ level, using a series of economic relationships and conversion factors. The ITMS database was a county-level freight flow database managed by the California Department of Transportation generated primarily from TRANSEARCH data. The specific steps of this process are described below.

For intercounty truck trips, the first step in this process converted the truck tons in the ITMS database into truck trips using average payloads for each of the commodities in the ITMS data. The average payload data were developed from the 1997 Vehicle Inventory and Use Survey. Application of the payload matrix to the ITMS data created a county-level truck trip table from the truck tonnage data for the state of California.

The ITMS truck trip data were then grouped geographically to create relevant regions for the truck model. Internal regions were based on the eight counties that constitute the San Joaquin Valley study area. Regions external to the Valley were developed to correspond to each of the external cordons that can be used for trucks exiting the study area.

Next, the county-level ITMS commodity flow truck trip data were allocated to zip codes. This allocation was performed using Dun & Bradstreet employment data from 2000. These data include the number of employees by zip code for each of the eight counties in the San Joaquin Valley for thousands of different employment categories based on the SIC system at a four-digit level.

For the agricultural industry, estimated farm acreage was used rather than employment to allocate the ITMS tons to each zip code. Farm acreage was used because it is more representative of the location where the goods are actually produced than employment data. Employment data in the agricultural industry are often inaccurate due to the seasonality of much of the industry’s employment. Employment data also can be geographically inaccurate due to the tendency to report employment at company headquarters in urban areas rather than at the rural locations where the goods are actually produced.

The allocation from agriculture was then combined with the allocation from other industries to create two zip-code-to-county tonnage tables. One table contained the tonnage originating in each zip code destined for each county, while the other table contained the tonnage destined for each zip code originating in each county.

The zip-code-level tonnage data were then allocated to the TAZs in the truck model. This allocation was done based on employment data from the statewide model combined with the areas of geographic overlap between the zip codes and the TAZs. This process developed the final TAZ-level truck tonnage table for the 1996 ITMS data. This truck trip table was then projected to the year 2000 based on the freight tonnage growth derived from the FHWA FAF data for the state of California.
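The two allocation stages in Example 1 (county to zip code, then zip code to TAZ) both split a parent total in proportion to an activity variable. A minimal sketch of that pattern follows; the zone names, tonnage, employment counts, and overlap shares are hypothetical and are not drawn from the San Joaquin Valley study, and the `allocate` helper is an illustrative name rather than part of any published tool.

```python
def allocate(parent_total, weights):
    """Split parent_total across sub-zones in proportion to their weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return {zone: parent_total * w / total_weight for zone, w in weights.items()}


# Stage 1: county tonnage to zip codes, weighted by industry employment.
# (For agricultural commodities the weights could be farm acreage instead.)
county_tons = 1200.0  # hypothetical tons for one commodity
zip_employment = {"zip_A": 300, "zip_B": 100, "zip_C": 200}  # hypothetical
zip_tons = allocate(county_tons, zip_employment)

# Stage 2: one zip code to its TAZs, weighted by TAZ employment multiplied by
# the share of the TAZ's area that overlaps the zip code (both hypothetical).
taz_weights = {"taz_1": 150 * 1.0, "taz_2": 150 * 0.5}
taz_tons = allocate(zip_tons["zip_A"], taz_weights)
```

Each stage preserves the parent total, so the TAZ-level tonnages still sum to the original county control total.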
Example 2: Using the Regression Method to Disaggregate the FAF Database

For this example, we will consider an MPO whose metropolitan region is not specified as a FHWA FAF region. The MPO wants to develop a county-level flow database for the wood products commodity. We will assume that this region is composed of three counties: Aspen, Birch, and Cedar. The MPO region is bordered by an ocean to the west and mountains to the east, so it is assumed that the vast majority of its wood is going either north or south.

The first step in this process is to review Table 5.5 (provided previously in this chapter) to determine whether there are established variables that have a strong correlation with wood products production. Table 5.5 shows that the correlation (R²) between wood products employment and wood products tonnage produced is 75 percent. This indicates that wood products employment is a good predictor of wood products tonnage at the regional level. It is therefore reasonable to assume that it also is a good predictor at the county level.

The next step is to identify county-level employment for the three counties in the MPO region. The MPO first researched the U.S. Census Bureau web site, but found that employment data in all three counties were suppressed due to proprietary concerns. However, the MPO was able to secure county-level employment data by industry from its state department of commerce, which showed that 2007 employment in wood products was 1,000, 2,000, and 3,000 for Aspen, Birch, and Cedar counties, respectively.

Based on this information, the MPO performed a regression with FAF state-level wood products tonnage as the dependent variable and state-level wood products employment as the independent variable. For simplicity, it is assumed that this regression equation is calculated as the following:

Wood Products Tonnage = 10,000 tons × Wood Products Employment

This regression equation can then be applied to the county-level employment data such that a table can be developed (see Table 5.7). The step shown above is the core of the use of regression in this estimation process.
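The regression step in Example 2 can be sketched as follows. The county employment figures and the 10,000 tons-per-employee coefficient come from the example; the state-level observations are hypothetical values constructed so that the fit reproduces that coefficient, and the no-intercept form is an assumption that mirrors the simplified equation above rather than the specification used in the FHWA report.

```python
def fit_through_origin(x, y):
    """Least-squares slope for the no-intercept model y = slope * x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# HYPOTHETICAL state-level observations (employees, tons), chosen so that
# the fitted slope equals the 10,000 tons per employee assumed in the text.
state_employment = [5000, 12000, 20000]
state_tons = [50_000_000, 120_000_000, 200_000_000]
tons_per_employee = fit_through_origin(state_employment, state_tons)

# County-level wood products employment from Example 2.
county_employment = {"Aspen": 1000, "Birch": 2000, "Cedar": 3000}
county_tons = {
    county: tons_per_employee * employees
    for county, employees in county_employment.items()
}
```

With real FAF data the fit would not be exact, and the R² reported in Table 5.5 gives a sense of how much scatter to expect around the fitted line.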
County    Wood Products Employment    Estimated Wood Products Tonnage Output
Aspen     1,000                       10,000,000 tons
Birch     2,000                       20,000,000 tons
Cedar     3,000                       30,000,000 tons

Table 5.7. Estimating wood products tonnage output for hypothetical MPO.

The remaining work needed to develop a commodity flow database is to determine the origin-destination patterns of the wood products that are produced, along with the modes used. This can be done using one of several techniques discussed in Chapters 2 through 4. These options include the following (generally sorted from the least to the most resource-intensive):

• Leverage FAF origin-destination patterns for the state where the MPO resides.
• Interview local industry experts and adopt their estimates (see Chapter 4 for details on this method).
• Conduct roadside truck surveys on the main roadways used by trucks to exit the MPO (see Chapter 3 for details on this method).
• Conduct an establishment survey of the wood products manufacturing companies in the MPO (see Chapter 2 for details on this method).

It should be noted that a similar process can be used to estimate wood products entering the MPO region. Regression equations can be developed for attractions as well as productions, as demonstrated in a report prepared for FHWA (Cambridge Systematics 2009). Additionally, input-output analysis can be used to determine the full range of products consumed by the wood products industry as a first step in the development of a full supply chain for the wood products industry within the MPO.

Example 3—Using Iterative Proportional Fitting: Limited Prior Information

This section provides an example of how commodity flow data may be disaggregated using the iterative proportional fitting approach where limited or no prior information is available. Table 5.8 shows a four-by-four matrix of hypothetical origins and destinations corresponding to Lewis, Clark, Adams, and Lincoln. These locations could be counties, states, cities, or any other geographical points or regions where aggregate information from a national source is available, but information on the distribution of flows or shipments between origins and destinations is unavailable.

In the initial matrix (see Table 5.8), the lightly shaded areas (the column titled “Marginal Row Totals” and the row titled “Marginal Column Totals”) are known and correspond to the previously presented equations:

$x_{i+} = \sum_j x_{ij}$ and $x_{+j} = \sum_i x_{ij}$ (Eq. 12, 13)

The cells with a light diagonal background in the center of Table 5.8 represent information that is sought. Initially, when there is no information, all cell values are equal to “1.” Then, beginning with the first iteration of factor adjustments, each cell is scaled so that its row sums to the known marginal row total: the cell value is divided by the current row sum and multiplied by the marginal row total. In the row adjustment table (see Table 5.9), this yields 8.50, 15.75, 4.50, and 6.50 in every destination column for the Lewis, Clark, Adams, and Lincoln origins, respectively.
                          Destinations
Origins                   Lewis    Clark    Adams    Lincoln    Marginal Row Totals
Lewis                         1        1        1          1                     34
Clark                         1        1        1          1                     63
Adams                         1        1        1          1                     18
Lincoln                       1        1        1          1                     26
Marginal Column Totals       23       41       62         15                    141

Note: Cells with a light diagonal background are values to be estimated and are initially set to 1. “Marginal Row Totals” and “Marginal Column Totals” (light shading) are the known origin and destination totals for each county.

Table 5.8. Initial origin-destination matrix.

                          Destinations
Origins                   Lewis    Clark    Adams    Lincoln    Marginal Row Totals
Lewis                      8.50     8.50     8.50       8.50                  34.00
Clark                     15.75    15.75    15.75      15.75                  63.00
Adams                      4.50     4.50     4.50       4.50                  18.00
Lincoln                    6.50     6.50     6.50       6.50                  26.00
Marginal Column Totals    35.25    35.25    35.25      35.25                 141.00

Table 5.9. First iteration—row adjustment.
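As a worked check of the row adjustment, the values in Table 5.9 can be written out in the marginal-total notation used for the known row totals. The superscripts marking the starting matrix and the row-adjusted matrix are a convention added here for illustration; they are not the guidebook's notation.

```latex
\begin{align*}
x_{ij}^{(\mathrm{row})} &= x_{ij}^{(0)} \cdot \frac{x_{i+}}{\sum_{k} x_{ik}^{(0)}} \\
\text{Lewis:}\quad   & 1 \times \tfrac{34}{4} = 8.50 \\
\text{Clark:}\quad   & 1 \times \tfrac{63}{4} = 15.75 \\
\text{Adams:}\quad   & 1 \times \tfrac{18}{4} = 4.50 \\
\text{Lincoln:}\quad & 1 \times \tfrac{26}{4} = 6.50
\end{align*}
```

Each row initially sums to 4 because all four cells start at 1, so every cell in a row receives the same share of that row's marginal total.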

Then a proportional column adjustment occurs based on the new cell values derived from the previous row adjustment and relative to each column total. Thus, in order to arrive at the 5.55 value in the first row and column cell of Table 5.10, the row-adjusted value of 8.50 is divided by the column sum (35.25) (see Table 5.9) and then multiplied by the initial column total of 23 (see Table 5.8). The resulting matrix after the column adjustment, now lightly shaded, represents the estimated flows from each origin to each destination, and it still sums to both the column and row totals. This represents one complete iteration. No further iterations are necessary in this example, since they will result in identical cell values. The final origin-destination matrix is shown in Table 5.10.

The iterative proportioning process has been automated within several software systems, including recent travel demand modeling software, so this process can be fully captured within the current set of travel demand models available to most MPOs and state DOTs.

User’s Guide Worksheet Punch List

• Select commodity flow data disaggregation technique.
• Implement commodity flow data disaggregation technique.

5.3 Next Steps

This chapter describes the data disaggregation process. It illustrates that data disaggregation can be done using basic or very complex techniques. The level of complexity will depend on the precision needed in the answer to the question being asked by the transportation agency, along with the data and resources available within the agency. Disaggregation can be considerably powerful because it develops comprehensive data that can cover a wide range of modes, geographies, and commodities within a single analysis.
This process can be combined with the supplemental data collection/assembly techniques described in Chapters 2.0, 3.0, and 4.0 to develop subnational commodity flow data that are comprehensive and accurate for industries and geographies of the most importance to local transportation agencies. Refer back to the Playbook section to identify the next portion of the Guidebook that will be most relevant to where your transportation agency is in the data collection process.

                          Destinations
Origins                   Lewis    Clark    Adams    Lincoln    Marginal Row Totals
Lewis                      5.55     9.89    14.95       3.62                  34.00
Clark                     10.28    18.32    27.70       6.70                  63.00
Adams                      2.94     5.23     7.91       1.91                  18.00
Lincoln                    4.24     7.56    11.43       2.77                  26.00
Marginal Column Totals    23.00    41.00    62.00      15.00                 141.00

Table 5.10. First iteration—column adjustment.
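The row and column adjustments of Example 3 that lead to Table 5.10 can be sketched in plain Python. This is a minimal illustration, not the implementation used by any travel demand modeling package: the function name `ipf`, the `max_iter` and `tol` parameters, and the stopping rule are assumptions made for this sketch. The marginal totals are those of Table 5.8.

```python
def ipf(seed, row_totals, col_totals, max_iter=100, tol=1e-9):
    """Iterative proportional fitting on a list-of-lists matrix.

    Returns the fitted matrix and the number of iterations performed.
    """
    matrix = [row[:] for row in seed]  # work on a copy of the seed
    for iteration in range(1, max_iter + 1):
        # Row adjustment: scale each row to its known marginal row total.
        for i, target in enumerate(row_totals):
            row_sum = sum(matrix[i])
            matrix[i] = [cell * target / row_sum for cell in matrix[i]]
        # Column adjustment: scale each column to its known marginal column total.
        for j, target in enumerate(col_totals):
            col_sum = sum(row[j] for row in matrix)
            for row in matrix:
                row[j] *= target / col_sum
        # Stop once the row sums still match after the column adjustment.
        max_row_error = max(
            abs(sum(row) - target) for row, target in zip(matrix, row_totals)
        )
        if max_row_error < tol:
            return matrix, iteration
    return matrix, max_iter


places = ["Lewis", "Clark", "Adams", "Lincoln"]
row_totals = [34, 63, 18, 26]  # marginal row totals (origins), Table 5.8
col_totals = [23, 41, 62, 15]  # marginal column totals (destinations), Table 5.8
seed = [[1.0] * 4 for _ in range(4)]  # no prior information: every cell is 1

fitted, iterations = ipf(seed, row_totals, col_totals)
for origin, row in zip(places, fitted):
    print(origin, [round(cell, 2) for cell in row])
print("iterations:", iterations)
```

Because every cell of the seed matrix starts at the same value, the loop stops after a single iteration. A seed that carries prior information (unequal starting cells) would generally need more passes before the row and column sums both match.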
