National Academies Press: OpenBook

Guidebook for Developing Subnational Commodity Flow Data (2013)

Chapter: Chapter 2.0 - Collecting Subnational Commodity Flow Data Using Establishment Surveys

Suggested Citation: "Chapter 2.0 - Collecting Subnational Commodity Flow Data Using Establishment Surveys." National Academies of Sciences, Engineering, and Medicine. 2013. Guidebook for Developing Subnational Commodity Flow Data. Washington, DC: The National Academies Press. doi: 10.17226/22523.


2.1 Introduction

This section provides an examination of how to develop subnational commodity flow data using establishment surveys. Establishment surveys are conducted by interviewing representatives of specific businesses and gathering information about freight flows in and out of specific physical establishments. The Guidebook identifies the following 10 general steps that need to be addressed in administering an establishment survey data collection program:

• Step 1: Geographic boundary of concern
• Step 2: Industry/commodity classification scheme
• Step 3: Universe of companies to survey
• Step 4: Determining sample size
• Step 5: Establishing data elements
• Step 6: Survey questionnaire
• Step 7: Conducting the survey
• Step 8: Database assembly
• Step 9: Data expansion
• Step 10: Data accuracy and validation

Many of these steps are interrelated, but the Guidebook discussion of each step is ordered as shown in the list above. The description of each step is structured to focus on the following four key elements described in the Playbook (Chapter 6.0):

• Key Considerations: A brief description of the main issues encountered and tradeoffs that will need to be made for the step.
• Implementation Process: A detailed description of how to implement the step.
• Example: An example of how this step has been implemented in other studies. Many of the examples in this chapter are taken from demonstration establishment surveys conducted in Seattle and Spokane as part of NCFRP Project 20.
• User’s Guide Worksheet Punch List: Simple bulleted instructions that Guidebook users can check off to ensure that they have implemented each of the major steps involved in developing an establishment survey.

Each of these four elements is designed to focus on different aspects of conducting an establishment survey and to reflect the types of activities that might be undertaken by a state or local transportation agency.
For transportation agencies that are considering hiring a contractor to develop an establishment survey, reading the “Key Considerations” section of each step will likely provide enough information to prepare a request for proposals (RFP) on the topic. Transportation agencies that want to understand the details of how to conduct an establishment survey should focus on the “Implementation Process” sections in addition to the “Example” sections. The “Example” sections also provide specific references to efforts in other regions that can be compared with what has already been done in the agency’s region and with responses to RFPs submitted to the agency.

After a transportation agency has sufficient background in all aspects of developing an establishment survey, the “User’s Guide Worksheet Punch List” sections can be used to walk the agency through the specific steps needed to implement the survey. These sections can also be compared against responses to RFPs submitted to the agency to judge the completeness of the submittals.

2.2 Step-by-Step Process for Conducting Establishment Surveys

This section provides a comprehensive examination of the steps involved in developing subnational freight data using establishment surveys. The examples are designed to describe in detail all of the necessary steps, including relevant implementation issues, that a typical regional planning office, local agency, or state department of transportation may encounter when considering and implementing future establishment survey efforts.

As part of the research conducted to develop the Guidebook, two pilot-scale demonstrations of actual establishment surveys were conducted. The surveys took place in the Seattle metropolitan region and in Spokane (both in Washington state). These demonstrations also tested a number of aspects of the methods described in order to advance the state of practice for subnational commodity flow surveys. Several of the examples provided in this chapter are based on the pilot-scale demonstration surveys (also referred to as the demonstration survey in this chapter).

Step 1: Geographic Boundary of Concern

Key Considerations

Typically, the study area of an establishment survey is consistent with the jurisdiction of the agency conducting the survey.
Thus, a state DOT obtains information for an entire state, and an MPO obtains information for its metropolitan region. However, the definition of subregions of concern within the study area, and of the external regions that commodities are shipped to and from, has ramifications throughout the survey process. For example, the definition of these regions becomes one dimension of the sampling matrix. The more geographic zones there are from a sampling perspective, the larger the number of cells that will need to be filled and the more surveys that will need to be collected. However, if the defined geographic subregions are very large, it becomes harder to relate the collected survey data to the local freight network when any application of the data involves routing the flows on that network. It will be especially difficult to determine the network elements used to access other subregions within the geographic area of concern. Similar size considerations apply to the definition of external regions.

An additional consideration is any feature that is unique to the local area of concern. For example, there may be specific subregions that are important either because they are known to be freight hubs or because they include key freight facilities of concern. Ideally, subregions will be defined based on boundaries that delineate areas with distinct land use or population characteristics within a region.
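To illustrate how the number of geographic zones drives survey effort, the following minimal sketch counts the cells in a sampling matrix. It assumes, purely for illustration, that the matrix has two dimensions (geographic zones and industry groups) and that a fixed minimum number of completed surveys is sought in each cell. The function names, the five industry groups, and the target of 10 surveys per cell are hypothetical and are not taken from the Guidebook.

```python
def sampling_cells(n_zones: int, n_industry_groups: int) -> int:
    """Number of cells in a zones-by-industry-groups sampling matrix."""
    return n_zones * n_industry_groups


def min_completed_surveys(n_zones: int, n_industry_groups: int, min_per_cell: int) -> int:
    """Completed surveys needed if every cell must reach min_per_cell responses."""
    return sampling_cells(n_zones, n_industry_groups) * min_per_cell


if __name__ == "__main__":
    INDUSTRY_GROUPS = 5   # hypothetical number of industry groups
    MIN_PER_CELL = 10     # hypothetical target of completed surveys per cell
    for zones in (1, 3, 10):
        cells = sampling_cells(zones, INDUSTRY_GROUPS)
        surveys = min_completed_surveys(zones, INDUSTRY_GROUPS, MIN_PER_CELL)
        print(f"{zones} zone(s): {cells} cells, {surveys} completed surveys")
```

Under these assumptions, moving from 3 zones to 10 zones raises the target from 150 to 500 completed surveys, which is the tradeoff against geographic detail described above.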

Implementation Process

The first step in this process is to define the overall boundary of concern for the establishment survey. Typically, this will coincide with the jurisdiction of the agency conducting the survey. Given the process used for conducting establishment surveys, it will be easier for an MPO region to define its boundaries at the county level, which streamlines the data expansion and validation processes. However, if county-level boundaries are not feasible, or if more geographic focus is required for a particular commodity study, supplemental estimation methods can be incorporated at later steps to allow for subcounty estimates.

The next step is to identify the subregions within the boundary of concern. Some of the subregions to consider include TAZ, city, county, BEA zone, metropolitan statistical area, or BTS CFS zone. The selection of zones should be based on the types of planning activities that are likely to occur in the region, along with the boundaries of other data sets that the transportation agency uses for other activities. If the purpose of the establishment survey is to feed into an MPO-level travel demand model, then it is likely that the smallest feasible geographic unit will be used. Feasibility depends in part on data availability: a substantial amount of county-level employment and sales data, number-of-establishments data, and commodity production data is published nationally or compiled at the MPO level, but subcounty data are more difficult to find. Because these county-level data sets may be used for establishing control totals for databases, as inputs to estimate missing data, and as expansion factors for the survey itself, collecting data at the county level is a very attractive option.

Developing a freight flow database at very refined geographic levels (such as zip code or TAZ) is sometimes desired.
One of the biggest issues with developing these databases is the difficulty of getting responses to survey questions at this level of detail. Often this level of detail in an establishment survey is considered proprietary by the survey respondent or is too time consuming to realistically include in the survey effort.

If the purpose of the establishment survey is to better understand the flows of a specific commodity, then geographic zones should be determined based on previous information regarding production locations of the commodity. In other situations, the subregions will be defined based on regions that are known to be freight hubs within larger geographic areas.

The final step is to define the boundaries for the external regions. Options for defining external regions are numerous. A few of the common approaches include the following:

• A single external region that tracks interregional flows relative to intraregional flows.
• External regions defined in terms of the direction(s) in which freight leaves and arrives in the region: north, south, east, or west. This approach can be complicated because it implies knowledge of the routing patterns of the flows.
• Regions that are mapped to corridors that are thought to capture large portions of external traffic. For example, external regions can be developed based on likely destinations for truck traffic as it leaves the metro area.
• Regions that are mapped to specific clusters of economic activity at the metropolitan, state, or multistate level. This can include regions such as BEA zones, individual states, the Northeastern United States, the Midwest, the West Coast, etc.

Examples

As noted above, there are several options for defining geographic regions for an establishment survey. For the demonstration survey conducted in Seattle and Spokane, the counties that most closely aligned with the MPO boundaries were chosen as the regions of interest.
The study area for intraregional flows was defined as consistent with the MPO boundaries, and the internal zones were defined as counties within the MPO boundary. For the Seattle region, this included King, Pierce, and Snohomish counties. For the Spokane region, Spokane County was the sole internal geographic unit. For both surveys, the external region was defined as a single region (i.e., all flows to/from outside Seattle or Spokane were lumped into a single external geographic zone). This was in large part due to the limited sample that would be taken for the demonstration survey.

In 2005, the Georgia DOT conducted establishment surveys of warehouse and distribution facilities in the Port of Savannah subregion. For these surveys, the study area for intraregional flows was defined as the area bordered by I-95, I-16, and the Savannah intracoastal waterway. This study area was then divided into internal zones based on pods or clusters of container activity. The external regions were defined as north, south, west, and east. These zones were not consistent with any existing set of zonal boundaries, but were created especially for the study. This way of defining external regions required survey respondents to provide street addresses for where shipments were coming from or going to. The Georgia DOT then used a geographic information system (GIS) to place the flows in one of the four external regions. The north and south trucks were assumed to leave the region using I-95. The west trucks were assumed to leave the region on I-16. The trucks going to the northeast were going to the Savannah port terminals. Figure 2.1 shows a general schematic of truck trip directions and how they were assigned to external regions from what was termed the “West Pod” in the Savannah port subregion. This method was successful in estimating the number of trucks on each of the subregion’s major freight highway corridors.

Zones also can be set up at the zip code level. Figure 2.2 shows the zip code zones for Dane County in Wisconsin.
Zones set up at the zip-code level are convenient for measuring freight activity, because this level provides submetro-level detail without the need to populate the hundreds of zones that are often used for passenger vehicle zone systems. Figure 2.3 shows a zone system for the Seattle metropolitan region. This is similar to the zone system used in the detailed example discussed later in this chapter.

As a hypothetical example, external regions could be laid out to be consistent with the BTS CFS. This would allow a comparison of results between the two establishment survey efforts. For example, if the state of Tennessee were to conduct an establishment survey, it could define the state’s counties as internal regions. For external regions, it could use boundaries of aggregated FAF regions, such as those shown in Figure 2.4. Assuming that the four zones within Tennessee are the internal zones and the five zones outside of Tennessee are external zones, the number of origin-destination combinations would be calculated as 4 multiplied by 5, or 20.

Figure 2.1. Matching directions to external regions in Savannah.
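The direction matching illustrated in Figure 2.1 can be sketched in code. The example below is a simplified stand-in for the GIS step, not the Georgia DOT procedure itself. It assumes each reported address has already been geocoded to latitude and longitude, computes the compass bearing from a study-area reference point, and assigns one of four external regions using 90-degree sectors. The reference coordinates, the sector boundaries, and the function names are hypothetical. The corridor lookup follows the text for north, south, and west; the entry for east is an assumption.

```python
import math

# Hypothetical reference point for the study area (latitude, longitude).
STUDY_AREA_REF = (32.0, -81.0)

# Assumed exit corridor for each external region. North, south, and west
# follow the text; the east entry is an illustrative assumption.
CORRIDOR = {"north": "I-95", "south": "I-95", "west": "I-16", "east": "port terminals"}


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360


def external_region(bearing):
    """Map a bearing to one of four 90-degree sectors centered on N, E, S, W."""
    if bearing >= 315 or bearing < 45:
        return "north"
    if bearing < 135:
        return "east"
    if bearing < 225:
        return "south"
    return "west"


def assign_shipment(lat, lon, ref=STUDY_AREA_REF):
    """Return (external region, assumed corridor) for a geocoded shipment endpoint."""
    region = external_region(bearing_deg(ref[0], ref[1], lat, lon))
    return region, CORRIDOR[region]


if __name__ == "__main__":
    # Hypothetical geocoded endpoints, offset one degree from the reference point.
    for lat, lon in [(33.0, -81.0), (31.0, -81.0), (32.0, -82.0)]:
        print((lat, lon), assign_shipment(lat, lon))
```

Fixed 90-degree sectors are only one possible rule. A study could instead draw the external regions as polygons and test which polygon contains each address, which is closer to what a GIS would do.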

Figure 2.2. Zip-code-level map for Dane County (Wisconsin).

Another hypothetical example could be considered for an MPO region that was developing a travel demand model. This region could collect establishment data internal to its region at the zip code level and then make external regions consistent with the corridors that are commonly used for external trips. Using one of the disaggregation methods discussed in Chapter 4.0, the zip-code-level data could later be disaggregated to the TAZ level needed to feed into the travel demand model.

User’s Guide Worksheet Punch List

• Determine your study area for the establishment survey.
• Identify the boundaries for internal regions for the study area.
• Identify the boundaries for the external regions for the study area.
• Calculate the number of unique origin-destination pairs in your region. This is the number of internal regions multiplied by the number of external regions.
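The last punch list item can be checked with a few lines of code. The sketch below enumerates internal-external pairs for the hypothetical Tennessee layout of four internal zones and five external zones; the zone labels and the function name are placeholders invented for illustration.

```python
from itertools import product


def od_pairs(internal_zones, external_zones):
    """All (internal, external) zone pairs, following the punch list definition."""
    return list(product(internal_zones, external_zones))


if __name__ == "__main__":
    # Placeholder labels: four internal zones and five external zones.
    internal = [f"TN-{i}" for i in range(1, 5)]
    external = [f"EXT-{i}" for i in range(1, 6)]
    pairs = od_pairs(internal, external)
    print(len(pairs))  # 4 * 5 = 20
```

This follows the simple definition in the punch list. If inbound and outbound flows, or internal-to-internal flows, are tracked as separate pairs, the count would be larger.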

Figure 2.3. Zone system for Seattle metropolitan region establishment survey.

Figure 2.4. Tennessee example of aggregated FAF regions used to define external regions. Source: (Cambridge Systematics, Inc. 2007)

Step 2: Industry/Commodity Classification Scheme

Key Considerations

The topic of commodity classification schemes and their relationship to industry classification was introduced in Chapter 1.0. For an establishment survey, both an industry classification system and a commodity classification system need to be adopted. The industry classification system will be used to determine which companies to survey and to obtain information about these companies/industries when expanding the survey data (see Step 9). The choice of commodity classification depends on which commodities are of interest, on the specific classification system (selected according to whether the data need to be combined with or compared to another data set that uses a particular scheme), and on how recognizable the commodity classifications are to potential survey respondents. For example, as already noted, it may be useful to classify commodities using the SCTG system adopted by both the CFS and FAF. Doing so makes it possible to use these two databases for control totals at an aggregate level of geographic detail (e.g., FAF data on commodity flows may be available for an entire MPO region while the survey is attempting to collect data for counties within the region) or to fill gaps in the survey data.

To determine how well the SCTG codes will work during data collection, it might be useful to conduct a pilot survey or outreach interviews with representatives of the industries to be surveyed, asking them to identify the products/commodities they ship and receive, and then to see how easily these “self-classifications” can be converted into the chosen commodity classification system.

It also will be important to understand the relationship between commodity classifications and industry classifications. As mentioned in Chapter 1.0, an input-output model’s “make-use” table can be very helpful in making this connection because it shows which industries make and use each commodity. An example of a “make-use” table is shown in Table 2.8. If an establishment survey is going to focus on only a few very important commodities in the region, it will be important to know which industries produce and consume those products so that the proper companies are surveyed. It is useful if commodities are classified in a manner that makes it easy to see which industries produce which commodities. Additionally, it will be necessary to consider how to classify companies that are in multiple industries and/or produce multiple types of goods.

The next consideration is the level of detail to include in the industry/commodity classification scheme. There is a tradeoff between the amount of commodity detail included in the survey responses and the sample size needed to accomplish the survey design, and therefore the resources needed to implement the survey. It also is possible that there are particular industries or commodities of concern to a local area. These can include industries that employ a large number of people in the local area, industries that produce a high dollar value of goods in the local area, or commodities whose production is known to be especially sensitive to the performance of the transportation system (for example, those that are involved in “just-in-time” supply chains). Ideally, these specialized industries or commodities will each be treated as a unique classification to maximize the potential of the establishment survey to estimate their freight flows.
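The way a “make-use” table connects commodities to industries can be sketched as a simple lookup. The example below is a minimal illustration, assuming the table is held as two nested dictionaries keyed by industry and then by commodity. The industry names, commodity names, dollar values, and function names are all made-up placeholders, not figures from Table 2.8 or from BEA.

```python
# Hypothetical "make" table: value of each commodity produced by each industry.
MAKE = {
    "Textile mills": {"textiles": 100.0},
    "Apparel manufacturing": {"apparel": 80.0},
    "Printing": {"printed matter": 40.0},
}

# Hypothetical "use" table: value of each commodity consumed by each industry.
USE = {
    "Apparel manufacturing": {"textiles": 30.0, "printed matter": 2.0},
    "Retail trade": {"apparel": 60.0},
}


def industries_for(table, commodity):
    """Industries with a positive entry for the commodity, sorted by name."""
    return sorted(ind for ind, row in table.items() if row.get(commodity, 0.0) > 0.0)


def industries_to_survey(commodity, make=MAKE, use=USE):
    """Producing and consuming industries for one commodity of interest."""
    return {
        "producers": industries_for(make, commodity),
        "consumers": industries_for(use, commodity),
    }


if __name__ == "__main__":
    print(industries_to_survey("textiles"))
```

A survey focused on one commodity would draw its company list from both the producing and the consuming industries returned by such a lookup.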

Implementation Process

The first step is to review some of the industry and commodity codes that are typically used for establishment surveys. Many freight flow databases, including CFS and FAF, are based on the SCTG at the two-digit level. Other commonly used commodity codes include the Standard Transportation Commodity Code (STCC) and the Harmonized System (HS) code. Commonly used industry codes include the North American Industry Classification System (NAICS) and the Standard Industrial Classification (SIC). SCTG codes and STCCs are listed and described in Tables 2.1 and 2.2, respectively.

The next step is to determine whether the survey will attempt to cover all industries and commodities in the study area or only a subset of industries or commodities. If the survey is industry/commodity specific, then special care should be taken to ensure that the industries and/or commodities are well defined. Additionally, both the industries that produce a commodity and the industries that consume it should be considered for incorporation into the survey. For example, the apparel industry generates several inbound commodity flows because of the commodities that it consumes in its manufacturing process. These include yarn, textiles, apparel, and even printed matter. Table 2.3 shows a range of commodities consumed by select industries based on input-output data developed by BEA.

Following the determination of what industries and commodities will be included in the survey, a classification scheme can be chosen. If the survey covers all industries and commodities, then consistency with national freight flow data sets, business listing databases, and other local freight data sets is a more important consideration.

Note that the selection of an industry/commodity classification scheme will be revisited following consideration of sampling issues in Step 4.

Example

Only a small number of industries were included in the demonstration survey of both the Seattle and Spokane regions.
The two primary criteria for selecting industries were (1) the importance of an industry to each region’s economy and (2) the need to demonstrate commodity shipment patterns that are expected to be important to regional economies but that may not be well captured in the CFS or FAF. To identify the largest industries in each region, the U.S. Census County Business Patterns (CBP) database was used to determine number of employees, annual payroll, and number of establishments by industry sector.

For the Seattle region, the transportation equipment manufacturing industry was found to be the largest manufacturing sector in terms of both employment and economic output measured in dollars. The selection of the second industry to survey in the Seattle region was targeted toward an industry with heavy reliance on the Port of Seattle. This decision was based on the exclusion of imported freight flows from the CFS. Data on containerized import commodities at the Port of Seattle were examined. Additionally, FAF data were reviewed for the Puget Sound region, and the determination was made that apparel is a major commodity moving through the port and also a major consumer retail commodity for the region.

Another consideration is the industry or commodity classification of the database that will be used to identify companies to survey. To the extent that the chosen classification scheme is consistent with the scheme used to categorize companies in this database, it will be easier to select companies that are consistent with the desired sample for each industry or commodity.

| SCTG Code | Description |
|---|---|
| 01 | Live Animals and Fish |
| 02 | Cereal Grains (including seed) |
| 03 | Other Agricultural Products, except for Animal Feed |
| 04 | Animal Feed and Products of Animal Origin, n.e.c. |
| 05 | Meat, Fish, and Seafood, and Their Preparations |
| 06 | Milled Grain Products and Preparations, and Bakery Products |
| 07 | Other Prepared Foodstuffs, and Fats and Oils |
| 08 | Alcoholic Beverages |
| 09 | Tobacco Products |
| 10 | Monumental or Building Stone |
| 11 | Natural Sands |
| 12 | Gravel and Crushed Stone |
| 13 | Nonmetallic Minerals, n.e.c. |
| 14 | Metallic Ores and Concentrates |
| 15 | Coal |
| 16 | Crude Petroleum Oil |
| 17 | Gasoline and Aviation Turbine Fuel |
| 18 | Fuel Oils |
| 19 | Coal and Petroleum Products, n.e.c. |
| 20 | Basic Chemicals |
| 21 | Pharmaceutical Products |
| 22 | Fertilizers |
| 23 | Chemical Products and Preparations, n.e.c. |
| 24 | Plastics and Rubber |
| 25 | Logs and Other Wood in the Rough |
| 26 | Wood Products |
| 27 | Pulp, Newsprint, Paper, and Paperboard |
| 28 | Paper or Paperboard Articles |
| 29 | Printed Products |
| 30 | Textiles, Leather, and Articles of Textiles or Leather |
| 31 | Nonmetallic Mineral Products |
| 32 | Base Metal in Primary or Semifinished Forms and in Finished Basic Shapes |
| 33 | Articles of Base Metal |
| 34 | Machinery |
| 35 | Electronic and Other Electrical Equipment and Components, and Office Equipment |
| 36 | Motorized and Other Vehicles (including parts) |
| 37 | Transportation Equipment, n.e.c. |
| 38 | Precision Instruments and Apparatus |
| 39 | Furniture, Mattresses and Mattress Supports, Lamps, Lighting Fittings, and Illuminated Signs |
| 40 | Miscellaneous Manufactured Products |
| 41 | Waste and Scrap |
| 43 | Mixed Freight |

n.e.c. = not elsewhere classified

Table 2.1. Two-digit SCTG codes and descriptions.

| STCC | Description |
|---|---|
| 01 | Farm Products |
| 08 | Forest Products |
| 09 | Fresh Fish or Other Marine Products |
| 10 | Metallic Ores |
| 11 | Coal |
| 13 | Crude Petroleum, Natural Gas or Gasoline |
| 14 | Nonmetallic Minerals; except Fuels |
| 19 | Ordnance or Accessories |
| 20 | Food or Kindred Products |
| 21 | Tobacco Products; except Insecticides – see Major Industry Group 28 |
| 22 | Textile Mill Products |
| 23 | Apparel, or Other Finished Textile Products or Knit Apparel |
| 24 | Lumber or Wood Products; except Furniture – see Major Industry Group 25 |
| 25 | Furniture or Fixtures |
| 26 | Pulp, Paper or Allied Products |
| 27 | Printed Matter |
| 28 | Chemicals or Allied Products |
| 29 | Petroleum or Coal Products |
| 30 | Rubber or Miscellaneous Plastics Products |
| 31 | Leather or Leather Products |
| 32 | Clay, Concrete, Glass or Stone Products |
| 33 | Primary Metal Products, including Galvanized; except Coating or other Allied Processing |
| 34 | Fabricated Metal Products; except Ordnance – see Major Industry Groups 19, 35, 36 or 37 |
| 35 | Machinery; except Electrical – see Major Industry Group 36 |
| 36 | Electrical Machinery, Equipment or Supplies |
| 37 | Transportation Equipment |
| 04 | Business Services Division |
| 38 | Instruments, Photographic Goods, Optical Goods, Watches or Clocks |
| 39 | Miscellaneous Products of Manufacturing |
| 40 | Waste or Scrap Materials Not Identified by Producing Industry |
| 41 | Miscellaneous Freight Shipments |
| 42 | Containers, Carriers or Devices, Shipping, Returned Empty |
| 43 | Mail, Express or Other Contract Traffic |
| 44 | Freight Forwarder Traffic |
| 45 | Shipper Association or Similar Traffic |
| 46 | Miscellaneous Mixed Shipments |
| 47 | Small Packaged Freight Shipments |
| 48 | Hazardous Wastes |
| 49 | Hazardous Materials |
| 50 | Bulk Commodity Shipments in Boxcars |

Table 2.2. Two-digit STCCs and descriptions.

Distribution of Select Commodities Consumed by Industries

| Commodity | Crop Prod | Food Mfg | Apparel Mfg | Wood Product Mfg |
|---|---|---|---|---|
| Crop products | 8,063.3 | 37,533.7 | 0.0 | 92.0 |
| Animal products | 368.6 | 74,690.5 | 0.0 | 0.0 |
| Forestry and logging products | 0.0 | 239.3 | 5.4 | 16,468.2 |
| Fish and other nonfarm animals | 0.0 | 3,835.3 | 0.0 | 0.0 |
| Support activities for agriculture and forestry | 10,760.9 | 0.0 | 0.0 | 0.0 |
| Oil and gas extraction | 0.0 | 5.1 | 0.0 | 5.1 |
| Coal mining | 0.0 | 279.2 | 1.6 | 2.0 |
| Metal ores mining | 0.0 | 0.0 | 0.0 | 0.0 |
| Nonmetallic mineral mining and quarrying | 506.9 | 0.0 | 0.0 | 0.0 |
| Mining support services | 0.0 | 0.0 | 0.0 | 0.0 |
| Food products | 0.0 | 79,681.4 | 0.0 | 9.5 |
| Beverage products | 0.0 | 101.4 | 0.0 | 0.0 |
| Tobacco products | 0.0 | 0.0 | 0.0 | 0.0 |
| Yarn, fabrics, and other textile mill products | 28.5 | 0.0 | 9,064.3 | 371.7 |
| Nonapparel textile products | 218.2 | 283.2 | 526.0 | 229.9 |
| Apparel | 0.0 | 0.0 | 2,292.0 | 0.0 |
| Leather and allied products | 0.0 | 19.4 | 155.0 | 8.9 |
| Wood products | 440.1 | 72.9 | 0.0 | 18,274.7 |
| Pulp, paper, and paperboard | 0.0 | 555.9 | 20.7 | 113.0 |
| Converted paper products | 300.7 | 16,353.3 | 69.3 | 267.7 |
| Printed products | 23.4 | 248.5 | 245.1 | 26.2 |
| Petroleum and coal products | 4,476.4 | 722.9 | 24.1 | 238.0 |
| Basic chemicals | 892.9 | 1,275.3 | 16.9 | 228.6 |
| Resins, rubber, and artificial fibers | 0.0 | 250.7 | 108.6 | 454.1 |
| Agricultural chemicals | 7,897.3 | 0.0 | 0.0 | 0.0 |
| Pharmaceuticals and medicines | 0.0 | 908.1 | 0.0 | 13.5 |
| Paints, coatings, and adhesives | 5.9 | 25.0 | 0.0 | 409.5 |
| Soaps, cleaning compounds, and toiletries | 4.1 | 391.0 | 0.0 | 86.7 |
| Other chemical products | 40.5 | 1,075.4 | 7.5 | 80.4 |
| Plastics and rubber products | 737.9 | 8,393.9 | 36.1 | 781.6 |
| Nonmetallic mineral products | 14.0 | 1,097.8 | 0.0 | 966.3 |
| Warehousing and storage | 548.5 | 626.7 | 101.8 | 185.8 |

Table 2.3. Use table for select industries.

Due to the existing commodity classification coding options for apparel, apparel manufacturing was the industry that was surveyed in the demonstration survey.
But much of the apparel that is moved into and out of the Seattle region (and all of the apparel imported through the Port of Seattle) is actually flowing to businesses other than apparel manufacturers, such as apparel wholesalers and retailers. It is therefore important to identify these additional industries when sampling inbound commodity flows. The industries that purchase apparel can be identified from the “make-use” table.

For the Spokane region, the largest manufacturing industry in terms of employment and number of establishments was the fabricated metal manufacturing industry (NAICS 332). Another large industry was the food manufacturing industry (NAICS 311). This industry was selected because Spokane is the largest metro area in Eastern Washington, and food manufacturing in an urban area surrounded by rural agricultural regions may exhibit some unique commodity flow patterns. Additionally, the differences in produced and consumed commodities highlight one of the key differences between national and regional establishment surveys.
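The use-table lookup described above can be sketched in a few lines of Python. This is only an illustration: the figures are copied from a handful of rows of Table 2.3, while the function names `top_inputs` and `consuming_industries` are hypothetical helpers introduced here, not part of any BEA tool.

```python
# A few rows copied from Table 2.3 (use table for select industries).
# Keys are commodities; inner keys are the consuming industries.
USE_TABLE = {
    "Yarn, fabrics, and other textile mill products":
        {"Crop Prod": 28.5, "Food Mfg": 0.0, "Apparel Mfg": 9064.3, "Wood Product Mfg": 371.7},
    "Nonapparel textile products":
        {"Crop Prod": 218.2, "Food Mfg": 283.2, "Apparel Mfg": 526.0, "Wood Product Mfg": 229.9},
    "Apparel":
        {"Crop Prod": 0.0, "Food Mfg": 0.0, "Apparel Mfg": 2292.0, "Wood Product Mfg": 0.0},
    "Leather and allied products":
        {"Crop Prod": 0.0, "Food Mfg": 19.4, "Apparel Mfg": 155.0, "Wood Product Mfg": 8.9},
    "Printed products":
        {"Crop Prod": 23.4, "Food Mfg": 248.5, "Apparel Mfg": 245.1, "Wood Product Mfg": 26.2},
    "Wood products":
        {"Crop Prod": 440.1, "Food Mfg": 72.9, "Apparel Mfg": 0.0, "Wood Product Mfg": 18274.7},
}


def top_inputs(use_table, industry, n=3):
    """Return the n commodities that `industry` consumes most, largest first."""
    consumed = [
        (commodity, row[industry])
        for commodity, row in use_table.items()
        if row.get(industry, 0.0) > 0
    ]
    consumed.sort(key=lambda pair: pair[1], reverse=True)
    return consumed[:n]


def consuming_industries(use_table, commodity):
    """Return the industries that consume `commodity`, largest consumer first."""
    row = use_table[commodity]
    users = [industry for industry, value in row.items() if value > 0]
    return sorted(users, key=lambda industry: row[industry], reverse=True)


apparel_inputs = top_inputs(USE_TABLE, "Apparel Mfg")
wood_users = consuming_industries(USE_TABLE, "Wood products")
```

Reading the table by column (`top_inputs`) suggests which inbound commodities to ask an industry about; reading it by row (`consuming_industries`) suggests which industries to add to the sample when a single commodity is of interest.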

The final industries selected for the demonstration survey were the following:

• Transportation Equipment Manufacturing (Seattle). The major commodity produced by this industry is easily classified in the SCTG system as Transportation Equipment (SCTG code 37).
• Apparel Manufacturing (Seattle). Of the commodity classification coding options for apparel, either the STCC classification (STCC 23, Apparel) or SCTG 304 seems to be appropriate.
• Food Manufacturing (Spokane). The STCC classification system provides a simpler bridge of commodity classifications for the outbound shipments of this industry (STCC 20, Food or Kindred Products).
• Fabricated Metal Products Manufacturing (Spokane). The primary commodities shipped outbound by this industry can be classified as SCTG code 33, Articles of Base Metal.

User’s Guide Worksheet Punch List

• Review the most commonly used industry and commodity codes.
• Review any existing state and local freight flow databases. Note the classification schemes that are used.
• Determine if the local establishment survey will develop a final database based on industry or commodity codes.
• Review available economic data for the study area. This should include both employment and output data by industry. The U.S. Economic Census is a commonly used database for this type of data. State and regional economic development agencies often have industry-specific economic data as well.
• Determine if the local establishment survey will focus on a subset of industries and commodities or a full set.
• Determine which classification scheme to utilize for the establishment survey. Utilize two-digit SCTG codes as a default unless the responses to the bullets listed above indicate otherwise.
Step 3—Universe of Companies to Survey

Key Considerations

Identifying the universe of companies to survey can involve use of preexisting business databases, information from local trade associations, and local knowledge of major companies in the region. Business databases are proprietary lists of companies that can be purchased for developing comprehensive lists of businesses. However, these databases are compiled through a variety of different sources with businesses categorized in different fashions. Therefore, it is important to review these databases with other local sources to ensure that at least the major known companies in key industries are included. Trade associations, along with staff at economic development agencies or chambers of commerce, may also be able to assist in this review.

State and federal labor agencies typically maintain a list of firms participating in unemployment insurance programs, which may be a good source of business

data. Licensing agencies may also maintain lists of companies in certain industries. However, this information may not be made available to the agency conducting the survey due to confidentiality restrictions.

It also is important to capture as much information as possible about each company in the list of potential companies to survey. This will assist in stratification, as there is typically a desire to oversample companies that are larger. Larger companies are typically identified through the size of the facility, number of employees, sales at the location, or local knowledge.

Implementation Process

The survey effort will need an accurate list of firms in the region. There are a number of sources for the necessary data elements, including names, addresses, contact person, and phone number, with some sources able to provide additional administrative information, such as on-site activities (e.g., NAICS code designations), number of employees, and value of shipments.

One source may be another local agency. For example, economic development agencies assemble lists of firms, and labor agencies have a list of firms participating in unemployment insurance programs (e.g., “Quarterly Census of Employment and Wages” data) or have assembled a list for a particular project. Trade associations will have a list of their members, and licensing agencies have lists of firms as well. If agencies have sufficient interest in improving and maintaining the integrity of their databases, it may be possible to use their data and update any out-of-date information using a memorandum of understanding (MOU) or data-sharing agreement.

If no local sources are available or no local partners are willing to share their lists, there are vendors who specialize in lists of firms for any metropolitan area.
Contracts regarding the sharing of these data contain some restrictions; however, with special documentation and confidentiality agreements, it is possible for a state DOT to purchase the list of all firms in its state and share these data with MPOs and researchers. Vendors charge varying prices for the fields, including specific contact information and geocoding information. Data elements should include company name, address, industry sector, number of employees, value of shipments, latitude/longitude of physical address, and contact person(s) phone number(s)/e-mail(s).

The CBP database, produced annually by the U.S. Bureau of the Census, is extracted from the Business Register, which has the most complete, current, and consistent data for business establishments, but the CBP database only publishes aggregate statistics about businesses in each U.S. county and does not provide names of individual establishments. InfoUSA and Dun & Bradstreet data are obtained from less systematic information-gathering processes. Sources include business licenses, trade associations, phone book directories, and other proprietary sources that are identified by these companies. In addition, the categorization of businesses may be somewhat different between these two sources and relative to the CBP data.

If the survey budget allows, it is recommended to purchase data from both of these companies to obtain as exhaustive a list of potential companies to survey as possible. This is especially important given the relatively low response rate to establishment surveys.
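Combining lists from more than one source raises the practical problem of recognizing the same establishment under slightly different spellings. The sketch below is one hedged way to do this in Python, assuming each source has already been loaded as a list of dictionaries with `name` and `address` fields. The field names, the suffix list, the matching rule, and the sample records are all hypothetical; real vendor files will need their own cleaning rules.

```python
import re

# Hypothetical list of legal suffixes to ignore when comparing company names.
SUFFIXES = {"inc", "llc", "co", "corp", "company"}


def normalize(text):
    """Lowercase, strip punctuation, and drop legal suffixes so near-identical names match."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(word for word in text.split() if word not in SUFFIXES)


def merge_lists(*sources):
    """Merge establishment records, treating equal (name, address) keys as one firm.

    The first record seen for a firm is kept; later records only fill in
    fields that are still missing (None or empty string).
    """
    merged = {}
    for records in sources:
        for record in records:
            key = (normalize(record["name"]), normalize(record["address"]))
            kept = merged.setdefault(key, dict(record))
            for field, value in record.items():
                if kept.get(field) in (None, ""):
                    kept[field] = value
    return list(merged.values())


# Hypothetical records, for illustration only.
vendor_a = [{"name": "Acme Metal Works, Inc.", "address": "100 Main St", "employees": None}]
vendor_b = [
    {"name": "ACME METAL WORKS INC", "address": "100 Main St.", "employees": 45},
    {"name": "Basin Foods LLC", "address": "22 River Rd", "employees": 12},
]
combined = merge_lists(vendor_a, vendor_b)
```

Exact matching on a normalized key is deliberately conservative. It will miss establishments whose addresses are written very differently, so a manual review of near matches is still advisable.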
If a database like InfoUSA or Dun & Bradstreet is used, it will be useful to compare information such as the number of establishments and the size distribution of establishments as reported in CBP with similar aggregate statistics from the commercial databases to detect any systematic biases in the commercial databases (for example, under-representation of small companies). If these types

of biases are detected, it will be useful to work with trade associations and the local business community to identify firms that are under-represented to include in the sampling frame.

Using geocoded firm locations in a GIS environment makes it possible to build a sampling frame with geographic specificity rather than choosing a random sample of firms. The information developed from the surveying effort is intended to be descriptive statistics rather than inferential statistics. The collection of descriptive statistics is preferred since it is more easily translatable into a quantifiable commodity flow database. The descriptive statistics need to be coupled with geographic targeting to ensure that specific inputs necessary for transportation planning are available. The data will be representative of the freight activities and freight community perspectives on problems located in specific areas.

Example

For the demonstration surveys in Seattle and Spokane, a list of firms was developed in each of the four industries using purchased establishment data from both Dun & Bradstreet and InfoUSA. For the survey purposes, the following data were received:

• Contact name
• Contact phone number
• Company web site
• Estimated revenue of the company
• Size of establishment by square feet
• Size of establishment by number of employees

All of these data items are useful in collecting establishment survey data. These items also are useful for expanding collected data to represent the full population across an entire industry. However, it should be noted that the quantitative data provided in the commercial databases are typically provided in ranges, so estimates of these data were ultimately developed. For example, rather than reporting the actual employment at an establishment, a range of employees is reported. The U.S. Census CBP data also were used to compare to the purchased databases.
Table 2.4 shows the number of firms identified in each industry and geographic region. As shown in Table 2.4, there was wide variability in the number of establishments that were found in each industry by the separate databases. This is likely the result of the different methodologies that are used to identify and define establishments by each of the sources and the frequency and procedures for updating and purging the database of inactive companies. For the vendor-provided data, it also is important to understand how multiple branches, subsidiaries, etc., are captured in the data and how the industry sector is determined (for example, if a company reports itself in multiple industries, how this is reported in the database).

| Region | Industry | Number of Establishments: County Business Patterns (CBP) | Number of Establishments: InfoUSA | Number of Establishments: Dun & Bradstreet |
|---|---|---|---|---|
| Seattle | Transportation Equipment Manufacturing | 229 | 263 | 536 |
| Seattle | Apparel Manufacturing | 59 | 16 | 272 |
| Spokane | Food Manufacturing | 50 | 72 | 75 |
| Spokane | Fabricated Metal Product Manufacturing | 112 | 28 | 125 |

Table 2.4. Number of establishments identified by industry and region.
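The comparison of commercial database counts against CBP counts can be made routine with a short script. The following Python sketch uses the counts from Table 2.4. The 0.5 and 2.0 thresholds are arbitrary assumptions chosen for illustration, not values recommended by this guidebook, and `coverage_flags` is a hypothetical helper name.

```python
# Establishment counts copied from Table 2.4.
COUNTS = {
    ("Seattle", "Transportation Equipment Manufacturing"):
        {"CBP": 229, "InfoUSA": 263, "Dun & Bradstreet": 536},
    ("Seattle", "Apparel Manufacturing"):
        {"CBP": 59, "InfoUSA": 16, "Dun & Bradstreet": 272},
    ("Spokane", "Food Manufacturing"):
        {"CBP": 50, "InfoUSA": 72, "Dun & Bradstreet": 75},
    ("Spokane", "Fabricated Metal Product Manufacturing"):
        {"CBP": 112, "InfoUSA": 28, "Dun & Bradstreet": 125},
}


def coverage_flags(counts, low=0.5, high=2.0):
    """Flag vendor counts that differ sharply from the CBP benchmark.

    `low` and `high` are illustrative cutoffs on the ratio vendor / CBP.
    """
    flags = []
    for (region, industry), row in counts.items():
        for source, n in row.items():
            if source == "CBP":
                continue
            ratio = n / row["CBP"]
            if ratio < low:
                flags.append((region, industry, source, "under", round(ratio, 2)))
            elif ratio > high:
                flags.append((region, industry, source, "over", round(ratio, 2)))
    return flags


flags = coverage_flags(COUNTS)
```

With these assumed thresholds, the InfoUSA counts for apparel manufacturing in Seattle and fabricated metal products in Spokane are flagged as possible under-coverage, which is where help from trade associations would be most useful. A high ratio may instead point to differences in how branches or inactive firms are counted.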

User’s Guide Worksheet Punch List

• Contact the state’s labor agency, licensing authorities, and economic development agencies to determine the availability of business establishment databases in the study area. Capture as much information about individual companies as allowable through these sources.
• Extract establishment count data from U.S. Census Bureau CBP.
• Contact trade associations and chambers of commerce to solicit support for the overall establishment survey and to obtain any available list of companies by industry in the study area.
• Determine whether the list obtained thus far is sufficient for the survey by comparing the CBP count of companies to those identified through the sources listed in the above bullets.
• If needed, purchase a business list database from one of the proprietary sources. Include information on each business to assist in determining the relative size of the business and industry for each business.
• Develop a comprehensive list by combining information from all sources.
• Review the comprehensive list with local specialists at economic development agencies and local chambers of commerce to ensure that there are no major oversights in the list.

Step 4—Determining Sample Size

Key Considerations

There is always a question as to how large the sample size should be. Even in the case of the national CFS, sample size is significantly impacted by budget constraints. In designing a sampling strategy, it is preferred to establish statistical criteria for the data in advance of the survey. There is a tradeoff between the confidence level achieved, the dimensions of the establishment survey, and the number of samples that need to be collected. Additionally, data collection costs tend to increase in proportion to the number of samples desired. The dimensions of the survey include the geographic zones included in the survey, the commodities included, and the modes included.
A survey that intends to cover 40 commodities, 20 geographic zones (a 20 x 20 origin-destination matrix), and 5 modes would have a sampling matrix with 80,000 cells (40 x 400 x 5). While not all cells will have meaningful data, a sample with 10 observations in each cell would require data on 800,000 shipments. It is easy to see how sample size can grow as the dimensions of a sampling matrix increase. Following the data collection effort, it may be necessary to aggregate cells from a desired comprehensive matrix to one that achieves the statistical criteria desired for the survey and can still be accomplished within the available budget.

From a pragmatic perspective, sample size considerations are often impacted by constraints. Oftentimes, the budget for a survey effort will be prescribed prior to the survey development process. In these instances, it will be important to understand the statistical confidence levels achievable with survey design elements that are more likely to be under the surveyor’s control, such as the number of geographic zones and the number of commodities surveyed.

Implementation Process

Sample size is probably the most important determinant of precision for the information collected from the drawn sample, and it is jointly determined by (1) the distribution of a variable in the study population, which is reflected by the variable’s mean and standard deviation, and (2) the desired degree of precision and the statistical confidence level with which the analysis needs to be conducted.

For establishment surveys, the number of surveys required to generate sufficient accuracy at a regional level will be far less than the number that would be required at a subregional level such as zip codes. Similarly, the number of surveys needed is impacted by the number of external regions that are desired for the survey process. Therefore, defining the geographic level of concern for both the internal region and external regions is important prior to commencing an establishment survey effort. Generally, internal and external regions should be defined with only enough detail to match the freight planning activities that are being considered by the transportation agency. Additionally, it should be considered that future freight planning efforts that require more detailed data can be accompanied by smaller data collection efforts designed to validate whether processes such as disaggregation can be used as a surrogate for collecting new establishment surveys. Disaggregation is discussed in greater detail in Chapter 4.0.

Sample size also is an important determinant of costs in most data collection efforts. Given the budget constraints of a study, it is important to recognize the tradeoffs among the selected sampling method, the desired levels of precision and statistical confidence, and the corresponding sample sizes.

Given the distribution of the variable values in the population, there are two ways to approach the analysis of sample size questions.
The analyst could determine (1) the sample size required to achieve a desired level of precision and statistical confidence (statistical confidence is the probability that an estimated value falls within a specific range) for selected variables of interest or (2) the degree of precision and the confidence level that would be expected for each variable of interest under a range of sample sizes.

The approach of identifying the necessary sample size assumes that a reliable estimate of the variance of key variables is available. This estimate can be checked at the conclusion of the survey to confirm its accuracy. With the variance known, Equations 1 through 3 below can be used to arrive at the sample size estimate.

Equation 1 is the standard normal expression for a sample mean (the mean is the average or expected value of a sample), stating that with 95 percent confidence the population mean will lie within two standard errors, $2\sigma/\sqrt{n}$, of the sample mean. (The standard deviation is the measure of the variation or dispersion that exists from the mean. A low standard deviation indicates that data points are located close to the mean, while a high standard deviation indicates that data points are located relatively far from the mean.)

$$\left(\bar{x} - \frac{2\sigma}{\sqrt{n}},\ \bar{x} + \frac{2\sigma}{\sqrt{n}}\right) \qquad \text{(Eq. 1)}$$

Equation 2 is derived from the first equation as the full width of that interval:

$$W = \frac{4\sigma}{\sqrt{n}} \qquad \text{(Eq. 2)}$$

Equation 3 is achieved by solving for the sample size, n:

$$n = \frac{16\sigma^2}{W^2} \qquad \text{(Eq. 3)}$$
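Both ways of approaching the sample size question can be sketched with the relationships n = 16σ²/W² and W = 4σ/√n, where σ is the standard deviation and W is the full width of the 95 percent confidence interval. The Python below is an illustrative sketch only: the function names are hypothetical, and the standard deviation of 10 (for instance, tons per shipment) is an assumed value rather than a survey result.

```python
import math


def sample_size_for_width(sigma, width):
    """Approach 1: sample size needed for a 95 percent interval of full width `width`.

    Uses n = 16 * sigma^2 / W^2 and rounds up to a whole observation.
    The tiny subtraction only guards against floating-point noise.
    """
    if sigma < 0 or width <= 0:
        raise ValueError("sigma must be non-negative and width must be positive")
    return math.ceil(16 * sigma ** 2 / width ** 2 - 1e-9)


def width_for_sample_size(sigma, n):
    """Approach 2: interval width expected from n observations, W = 4 * sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 4 * sigma / math.sqrt(n)


# Assumed standard deviation of 10 units, for illustration.
n_wide = sample_size_for_width(sigma=10, width=4)    # interval of +/- 2 units
n_narrow = sample_size_for_width(sigma=10, width=2)  # interval of +/- 1 unit
achieved = width_for_sample_size(sigma=10, n=100)
```

Halving the interval width quadruples the required sample size, which is why a fixed budget usually translates into a precision target rather than the other way around.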

In Equations 1 through 3, W is the width, in units, of the confidence interval, so the wider the confidence interval, the lower the sample size necessary to maintain the 95 percent confidence level. The mean is represented by $\bar{x}$, and the standard deviation is $\sigma$. It is up to the survey designer to decide the tradeoff between expected variance and width of the confidence interval that is acceptable.

An alternative approach would be to develop the sample size based upon some acceptable threshold of error. In this case, Equations 4 through 6, reflecting the probability that an estimated proportion $\hat{p}$ lies within two standard deviations of the mean, would be used. The value 0.25 is the largest possible variance of a proportion, p(1 − p), which occurs at p = 0.5:

$$\left(\hat{p} - 2\sqrt{\frac{0.25}{n}},\ \hat{p} + 2\sqrt{\frac{0.25}{n}}\right) \qquad \text{(Eq. 4)}$$

$$W = 4\sqrt{\frac{0.25}{n}} \qquad \text{(Eq. 5)}$$

$$n = \frac{4}{W^2} = \frac{1}{B^2} \qquad \text{(Eq. 6)}$$

Again, W refers to the width in units of the confidence interval, and B refers to the allowable error, which is half of that width (B = W/2). Thus, to allow a 10 percent error (B = 0.10) at the 95 percent confidence level, the sample size would have to be at least 1/0.10² = 100. Likewise, allowing only a 1 percent error (B = 0.01) would require a sample size of 10,000.

Example

To illustrate these geographical considerations and others related to industry classification, sample size, data attributes, questionnaire design, and data extrapolation, consider a specific example. The example involves obtaining subnational commodity flow data for the Spokane region at the county level. The general structure of the type of information desired is represented by an origin-destination matrix shown in Table 2.5.

Table 2.5 assumes only 4 geographical units (two origins, two destinations), 10 industry categories, and 5 modes of transportation. In this simple example, the number of cells to populate with freight flow information is 200, which is calculated as

Number of cells = the number of origins (2) * number of destinations (2) * number of industries (10) * number of modes (5).
Unfortunately, not many organizations are interested in so few geographic units. But it is evident that the scale of data collection can be modified through any of the three factors of the matrix (geographical units, industries, and modes). It may be that within certain parts of the county or city, only a small number of industries exist, and therefore including a large number of industrial categories is not necessary. Likewise, the county of Spokane has no water freight transportation facilities and possibly no pipelines, which reduces the magnitude of the origin-destination matrix considerably by lowering the number of modes from five to three. These tradeoffs are illustrated in Table 2.6, which shows the outcomes of changing each of these three variables. Example 4 in Table 2.6 offers a relatively manageable matrix to populate, with 25 origin-destination combinations (5 origins and 5 destinations), 10 industries, and 3 modes of transportation, resulting in 750 cells.

Number of cells = the number of origins (5) * number of destinations (5) * number of industries (10) * number of modes (3).

| Origin | Industry | Mode | Volume of Freight | Destination |
|---|---|---|---|---|
| Origin A and B | Ag | Truck, Rail, Water, Air, Pipeline | | Destination B and A |
| | Mining | Truck, Rail, Water, Air, Pipeline | | |
| | Manufacturing | Truck, Rail, Water, Air, Pipeline | | |
| | XX | Truck, Rail, Water, Air, Pipeline | | |
| | XX | Truck, Rail, Water, Air, Pipeline | | |
| | Ten Industries | Truck, Rail, Water, Air, Pipeline | | |

Table 2.5. Sample origin-destination freight flow matrix.

Table 2.6. Different combinations of origin-destination matrix scales.

| Example | Geographic Units (origins/destinations) | Industries | Modes | Number of Cells to Populate |
|---|---|---|---|---|
| Example 1 | 2 | 10 | 5 | 100 |
| Example 2 | 10 | 3 | 3 | 90 |
| Example 3 | 4 | 5 | 5 | 100 |
| Example 4 | 25 | 10 | 3 | 750 |
| Example 5 | 100 | 10 | 5 | 5,000 |
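The cell counts in Table 2.6 are simple products, and a small Python sketch makes it easy to test other combinations before committing to a survey design. The function names are hypothetical, and the figure of 10 observations per cell in the usage line is only an assumed target.

```python
def cells_to_populate(geographic_units, industries, modes):
    """Number of matrix cells: origin-destination combinations x industries x modes."""
    return geographic_units * industries * modes


def shipments_needed(cells, observations_per_cell):
    """Shipment records needed if every cell is to hold the same number of observations."""
    return cells * observations_per_cell


# (geographic units, industries, modes) copied from Table 2.6.
TABLE_2_6 = {
    "Example 1": (2, 10, 5),
    "Example 2": (10, 3, 3),
    "Example 3": (4, 5, 5),
    "Example 4": (25, 10, 3),
    "Example 5": (100, 10, 5),
}

cell_counts = {name: cells_to_populate(*dims) for name, dims in TABLE_2_6.items()}

# Assumed target of 10 observations per cell for the Example 4 design.
example_4_shipments = shipments_needed(cell_counts["Example 4"], 10)
```

Because the factors multiply, trimming any one of them (for example, dropping modes that do not exist in the study area) reduces the data collection burden in direct proportion.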

It also is evident from Table 2.6 that attempting to obtain detailed information on all three factors can turn the matrix of cells into a massive data collection effort, as would be the case with 100 geographical units, industry information at the four-digit NAICS level (110 industries), and five modes of transportation, which produces a 55,000-cell matrix.

In this hypothetical example, the county of Spokane is interested in commodity flows into and out of the county and has allocated resources to conduct an establishment survey. If the variance associated with each two-digit estimate of the number of establishments (see Table 2.7) is known, the necessary sample size for each industry category within Spokane County can be identified. This calculation is based upon the assumption of a normal distribution and a population that is independent and identically distributed, where the sample size is calculated as:

$$\text{Sample size} = \frac{16 \times \text{variance}}{(\text{Confidence Interval Width})^2}$$

Table 2.7 shows that depending on the confidence interval width (+/- 1, 2, 3 units), the sample size changes, increasing as the confidence interval becomes narrower. In this hypothetical example, only the number of establishments at the different two-digit NAICS level is considered, but it is possible to use additional information regarding the size of each industry (employment, payroll, or output) to develop a stratified sampling design.

The type of geographical unit associated with origin-destination pairs of shipments will not impact the sample size within each industry for the county if it is assumed that industries within the county have similar origin-destination patterns and shipment volumes relative to the indicator variables within the county.
Table 2.7. Calculating sample size for Spokane County industries.

| NAICS Code | Code Description | Employees | Payroll | Total Establishments | Variance | Sample Size, Confidence Interval Width 1 | Sample Size, Confidence Interval Width 2 | Sample Size, Confidence Interval Width 3 |
|---|---|---|---|---|---|---|---|---|
| --- | Total for all sectors | 177,847 | $6,492,586 | 12,515 | 375 | 6,007 | 3,004 | 2,002 |
| 11 | Agriculture, Forestry, Fishing and Hunting | 105 | $5,807 | 23 | 1 | 11 | 6 | 4 |
| 21 | Mining, Quarrying, and Oil and Gas Extraction | | | 28 | 1 | 13 | 7 | 4 |
| 22 | Utilities | | | 15 | 0 | 7 | 4 | 2 |
| 23 | Construction | 10,999 | $547,263 | 1,535 | 46 | 737 | 368 | 246 |
| 31 | Manufacturing | 14,361 | $616,393 | 548 | 16 | 263 | 132 | 88 |
| 42 | Wholesale Trade | 10,545 | $478,700 | 735 | 22 | 353 | 176 | 118 |
| 44 | Retail Trade | 25,492 | $667,276 | 1,646 | 49 | 790 | 395 | 263 |
| 48 | Transportation and Warehousing | 5,100 | $181,864 | 285 | 9 | 137 | 68 | 46 |
| 51 | Information | 4,203 | $208,232 | 217 | 7 | 104 | 52 | 35 |
| 52 | Finance and Insurance | 11,083 | $565,205 | 901 | 27 | 432 | 216 | 144 |
| 53 | Real Estate and Rental and Leasing | 3,525 | $101,663 | 659 | 20 | 316 | 158 | 105 |
| 54 | Professional, Scientific, and Technical Services | 8,782 | $418,127 | 1,282 | 38 | 615 | 308 | 205 |
| 55 | Management of Companies and Enterprises | 2,357 | $162,382 | 89 | 3 | 43 | 21 | 14 |
| 56 | Administrative and Support and Waste Management and Remediation Services | 9,482 | $232,658 | 625 | 19 | 300 | 150 | 100 |
| 61 | Educational Services | 6,397 | $149,144 | 140 | 4 | 67 | 34 | 22 |
| 62 | Health Care and Social Assistance | 35,391 | $1,507,822 | 1,439 | 43 | 691 | 345 | 230 |
| 71 | Arts, Entertainment, and Recreation | 4,168 | $81,750 | 161 | 5 | 77 | 39 | 26 |
| 72 | Accommodation and Food Services | 16,472 | $245,715 | 1,026 | 31 | 492 | 246 | 164 |
| 81 | Other Services (except Public Administration) | 7,657 | $173,849 | 1,142 | 34 | 548 | 274 | 183 |
| 99 | Industries not classified | 17 | $302 | 19 | 1 | 9 | 5 | 3 |

Since the sample is designed to capture enough information to be statistically valid (95 percent confidence interval and assuming normal population properties) for the establishments, wherever they physically exist, increasing or decreasing the geographic scale of the commodity flow activity will only impact the number

Collecting Subnational Commodity Flow Data Using Establishment Surveys 35 of questions to be asked during the establishment survey or allocated after information has been obtained from the survey and processed. The establishments themselves represent the beginning and ending of the freight shipment activity and if the survey captured enough data to be statistically valid relative to the number of establishments within each industry category, then likewise information regarding the origin/destination of flows has been statis- tically represented. However, if it is believed that similar industries within the county may have vastly differ- ent shipment patterns in terms of either volumes relative to an indicator variable or origin- destination patterns, then the number of sample sizes will increase proportionally relative to the number of geographic units that are developed at the subcounty level. Therefore, assumptions about commodity flow patterns within a county are critical to the determination of sample size for a region. For the demonstration surveys conducted in Seattle and Spokane, there were two additional considerations that impacted the survey samples: • Random versus Nonrandom Samples. Using the establishment data from sampling frames, the research team determined the largest companies in each of the industry categories. These establishments were included in the sample to maximize the usefulness of the responses received by ensuring that those industries that represent a disproportionate share of total commodity flows are not excluded from the sample. This is considered a nonrandom sample. The remaining samples were selected randomly. As a rule of thumb, it is recommended that any company that represents 10 percent or more of an industry within the study area should be included in the survey as a nonrandom sample. For comparison purposes, in the CFS, approximately 40 percent of the sample is nonrandom. • Precanvassing. 
This refers to conducting an advance survey of selected companies to finalize the survey questionnaire and provide information on the most effective survey processes. These surveys also can be used to collect field data, but that was not done for this particular demonstration survey effort.

User’s Guide Worksheet Punch List

• Determine the number of cells to populate for the desired survey by multiplying the origins, destinations, commodities, modes, and any other relevant variables together.
• Estimate the mean and variance for each of the variables using existing sources.
• Determine the desired confidence interval for each variable.
• Calculate the number of samples needed for each industry based on Equations 1 through 6 provided above.
• If the number of samples for each industry seems higher than reasonable, aggregate cells until the sample size becomes manageable.
• Determine the need for nonrandom samples based on the distribution of company size within each industry. This will need to be added to the survey sample.
• Review the User’s Guide Worksheet Punch List from Steps 1 through 3 to determine whether the results of the sampling size calculations impact these previous survey design elements.
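Two of the punch list items lend themselves to a short calculation: counting matrix cells and flagging companies for the nonrandom part of the sample. The sketch below is illustrative only. The function names, the use of employment as the size measure, and the company figures are hypothetical. The 10 percent threshold is the rule of thumb given in this step.

```python
from math import prod


def matrix_cell_count(*dimension_sizes: int) -> int:
    """Number of cells obtained by multiplying the size of every dimension."""
    return prod(dimension_sizes)


def nonrandom_establishments(sizes: dict[str, float],
                             share_threshold: float = 0.10) -> list[str]:
    """Names of establishments whose share of the industry total meets the threshold.

    `sizes` maps an establishment name to a size measure (employment here,
    which is an assumption; sales or output could be used instead).
    """
    total = sum(sizes.values())
    if total <= 0:
        return []
    return sorted(name for name, size in sizes.items()
                  if size / total >= share_threshold)


# 100 geographic units x 110 industries x 5 modes, as in the text's example.
print(matrix_cell_count(100, 110, 5))  # 55000

# Hypothetical employment counts for one industry in the study area.
employment = {"Firm A": 500, "Firm B": 120, "Firm C": 60, "Firm D": 320}
print(nonrandom_establishments(employment))  # ['Firm A', 'Firm B', 'Firm D']
```

Establishments returned by the second function would be added to the sample with certainty; the remaining sample would still be drawn at random from the rest of the frame.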

Step 5—Establishing Data Elements

Key Considerations

Data elements can be grouped into categories that are generally consistent across establishment surveys such as information about the establishment, origin-destination information for goods moving in and out of the establishment, and commodity information. However, there are several different ways of requesting these data, and these result in major differences in how the surveys are administered and the type of data that are collected. One of the key differences across establishment surveys is whether information is collected about specific shipments (such as the last 20 shipments) or whether aggregate information is collected across a specific period of time (such as the most recent year). Specific shipment information tends to be more difficult to obtain from companies participating in a survey. A request for aggregate information is likely to result in a higher response rate, but is also likely to be estimated by the individual company representative participating in the survey.

Another key consideration is the collection of inbound survey data. Typically, companies have less information about inbound shipments. However, as the research team has noted, because inbound flows for a region represent a large fraction of the flows, it is important to collect data on inbound shipments for a local commodity flow survey (this is not done in the national CFS). Collecting data on inbound flows is more important at the local level than at the national level because only a relatively small fraction of total commodity flows in and out of the 50 states comes from outside of the United States (although that fraction has been increasing with globalization). At the local level, a substantial portion of commodity flows will be coming from outside of the study area of interest.
There also is the need to consider how many different types of commodities to collect data on at each facility. Some surveys focus on the primary commodity at the location, while others may collect information on all commodities. If a survey can be limited to only the primary commodities, it will take less time for the respondent to provide the data, and response rates may therefore increase. However, it will be difficult to determine how much has been missed. Other specific elements that need to be considered are the units of collected data (e.g., tons, containers, value, shipping units).

The level of detail requested on origin and destination data should be matched to the geographic boundaries of the internal and external regions for the study. However, more detailed geographic information should be obtained from the survey if it can be done without reducing responsiveness or significantly increasing the duration of the survey.

Implementation Process

To identify data elements for the survey, it is useful to divide the data collection process into four components: (1) background information about the facility, (2) freight flow information on outbound shipments, (3) freight flow information on inbound shipments, and (4) open-ended questions.

Background information on the facility is needed to confirm that the company has the operating characteristics that are expected. This information is also helpful in ensuring that there is appropriate representation of different types of firms in the stratified sample. The data elements potentially included in background information include address, revenue or sales information, type of business, and size of facility. Information on the size of the facility is particularly important as it is often used to stratify samples, expand collected data, and determine relationships between company size and shipment volumes. The size of the facility should be requested using multiple measures to give the later statistical processes greater flexibility. The size can be estimated based on the number of employees, square footage of the facility, or sales/revenue volumes. If it is anticipated that a significant proportion of interviewees will be hesitant to provide specific information, then an alternative process would be to have the interviewer accept an answer in predetermined range categories.

Freight flow information on outbound shipments is the core of the survey process. The type of shipment data that will be requested from the participating companies should be determined first. The CFS asks about a fixed number of shipments over a 1-week period during four seasons in the year. The number of shipments requested is based on the size of the company, but is between 20 and 40 for each company surveyed. If this shipment information can be successfully replicated in subnational surveys, then the steps of data expansion, validation, and analysis in the local survey would potentially benefit significantly. However, given the effort required on the part of a company being surveyed, the lack of a mandate for companies to participate in these surveys, and the limited resources available for conducting surveys at the local level, it may be difficult to collect detailed shipment data at the local level. An alternative to the CFS approach would be to ask for data from only 1 week in the most recent season and an estimate of the degree to which shipments fluctuate in other seasons.
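One way such a single-week response might be expanded to an annual figure is sketched below. This is not a procedure given in the Guidebook. It assumes the respondent's estimate of seasonal fluctuation can be expressed as a relative index per season, and that each season lasts 13 weeks. The function name, the index values, and the tonnage are all hypothetical.

```python
def annualize_weekly_volume(observed_weekly_tons: float,
                            observed_season: str,
                            seasonal_index: dict[str, float],
                            weeks_per_season: int = 13) -> dict[str, float]:
    """Estimate tons per season from one observed week and relative seasonal indices.

    seasonal_index gives each season's typical weekly volume relative to a
    common baseline (for example 2.0 means twice the baseline week).
    """
    observed_index = seasonal_index[observed_season]
    if observed_index <= 0:
        raise ValueError("index for the observed season must be positive")
    # Weekly volume in a baseline (index 1.0) week implied by the observation.
    baseline_weekly = observed_weekly_tons / observed_index
    return {season: baseline_weekly * index * weeks_per_season
            for season, index in seasonal_index.items()}


# Hypothetical respondent: 40 tons shipped in the sampled fall week, and fall
# described as about twice a typical spring or summer week.
index = {"winter": 0.5, "spring": 1.0, "summer": 1.0, "fall": 2.0}
by_season = annualize_weekly_volume(40.0, "fall", index)
print(by_season)                # {'winter': 130.0, 'spring': 260.0, 'summer': 260.0, 'fall': 520.0}
print(sum(by_season.values()))  # 1170.0
```

The result is only as good as the respondent's estimate of fluctuation, so figures produced this way are better treated as expert estimates than as measured volumes.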
A closely related alternative is to request estimates of annual volumes and percentages of shipments in each of the four seasons in order to understand seasonal variations in the data. This alternative tends to be the simplest to implement for both the surveyor and the respondent. One challenge with this approach is that origin-destination patterns may also vary during the year and providing separate origin-destination information for each season may prove too burdensome for survey respondents. If the goal of the survey is to understand peak volumes, then it may be necessary to ask directly about volumes and timing during the peak season and then pivot other volume and origin-destination information off of that response.

Another implementation issue to be resolved is whether or not commodities will be preclassified by the survey team or whether survey respondents will define commodities on their own. Preclassifying commodities tends to standardize respondents’ answers. It also increases the likelihood that respondents remember to include all of the major inputs and outputs into their facility. However, preclassification can make the survey longer as there will be a need to explain the different commodity categories and a dialogue may be needed for respondents to match their inputs and outputs to the predetermined commodities.

If preclassifying commodities is selected, then the specific inputs for each industry will need to be identified in advance of the survey so that they can be incorporated into the survey questionnaire. This is done with the assistance of input-output tables for each industry. Input-output tables describe the industries that serve as customers and suppliers for other industries. Table 2.8 shows a BEA input-output table created in 2002. More recent data can be purchased through proprietary economic databases.
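Picking the inputs to preclassify from one column of an input-output table can be sketched as a simple ranking exercise. The example below is an illustration, not the research team's procedure. It uses the nonzero Crop Production column values from the Table 2.8 excerpt, so the shares are relative to the listed commodities only rather than to the industry's full input bill. The function name is hypothetical, and the 0.90 default mirrors the 90 percent coverage guideline stated in this step.

```python
def commodities_to_preclassify(input_values: dict[str, float],
                               coverage: float = 0.90) -> list[str]:
    """Largest inputs, in descending order, until their cumulative share reaches coverage."""
    total = sum(input_values.values())
    if total <= 0:
        return []
    selected: list[str] = []
    running = 0.0
    ranked = sorted(input_values.items(), key=lambda item: item[1], reverse=True)
    for commodity, value in ranked:
        if value <= 0:
            break  # remaining inputs contribute nothing
        selected.append(commodity)
        running += value
        if running / total >= coverage:
            break
    return selected


# Crop Production (1110) column of the Table 2.8 excerpt, dollars in millions.
crop_production_inputs = {
    "Crop products": 8063.0,
    "Animal products": 369.0,
    "Support activities for agriculture and forestry": 10760.0,
    "Nonmetallic mineral mining and quarrying": 507.0,
    "Yarn, fabrics, and other textile mill products": 28.5,
    "Nonapparel textile products": 218.2,
}
print(commodities_to_preclassify(crop_production_inputs))
# ['Support activities for agriculture and forestry', 'Crop products']
```

In this excerpt two inputs account for roughly 94 percent of the listed purchases, so a questionnaire for this industry would name those two and leave room for respondents to add others.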
For each industry, it is typically preferable for the survey to cover 90 percent of the likely inputs and outputs for each facility, which may include between one and four commodities, depending on the industry.

Table 2.8. BEA input-output table, 2002 (select commodities and industries, dollars in millions).

For the distribution of commodities consumed by an industry, read the column for that industry. For the distribution of industries consuming a commodity, read the row for that commodity.

| Commodity Code | Commodity | Crop Production (1110) | Animal Production (1120) | Forestry and Logging (1130) | Fishing, Hunting and Trapping (1140) | Support Activities for Agriculture and Forestry (1150) | Oil and Gas Extraction (2110) |
|---|---|---|---|---|---|---|---|
| 1110 | Crop products | 8,063.0 | 14,996.4 | 5.4 | 6.7 | 239.7 | 0.0 |
| 1120 | Animal products | 369.0 | 19,635.6 | 70.0 | 0.0 | 366.0 | 0.0 |
| 1130 | Forestry and logging products | 0.0 | 0.0 | 12,923.6 | 0.0 | 11.8 | 0.0 |
| 1140 | Fish and other nonfarm animals | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1150 | Support activities for agriculture and forestry | 10,760.0 | 1,748.4 | 2,811.2 | 20.1 | 0.0 | 0.0 |
| 2110 | Oil and gas extraction | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1,989.9 |
| 2121 | Coal mining | 0.0 | 118.4 | 0.4 | 0.0 | 0.0 | 125.4 |
| 2123 | Nonmetallic mineral mining and quarrying | 507.0 | 76.0 | 0.0 | 0.0 | 1.5 | 0.0 |
| 2130 | Mining support services | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1,569.9 |
| 3110 | Food products | 0.0 | 13,817.9 | 43.5 | 22.7 | 169.3 | 0.0 |
| 3121 | Beverage products | 0.0 | 34.8 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3122 | Tobacco products | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3130 | Yarn, fabrics, and other textile mill products | 28.5 | 0.0 | 0.0 | 0.0 | 53.6 | 0.0 |
| 3140 | Nonapparel textile products | 218.2 | 21.5 | 0.0 | 32.0 | 240.1 | 0.0 |
| 3150 | Apparel | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3160 | Leather and allied products | 0.0 | 50.2 | 0.0 | 0.0 | 0.0 | 0.0 |

Source: BEA.

Open-ended questions also can be asked as part of an establishment survey. These can be tailored to current hot-button issues in the region or to issues such as the desire to participate in the long-range transportation planning process. The benefit of asking these types of questions is that they provide respondents with an opportunity to bring their concerns to public officials through the survey process, and this may make them feel that the survey will benefit them. However, because establishment surveys tend to be quite lengthy, it may be desirable to not include open-ended questions in the same effort. Furthermore, if information is collected about issues of concern to freight industries, the agency collecting the information must be prepared to act on this information and ensure that the businesses that provided the information are kept abreast of the actions being taken to address their concerns.

Note that the confidentiality needs of survey participants also are a key consideration for selecting data elements. Typically, state and local agencies, similar to federal agencies, are able to ensure the confidentiality of data provided by individual respondents, even in the case of legal proceedings.

Example

The demonstration surveys conducted in Seattle and Spokane collected data on several data elements from the four industries surveyed. The survey data elements included the following:

• Identifying information for the establishment
• Square footage of the facility
• Number of employees at the facility
• Inbound annual tons
• Inbound annual value
• Inbound seasonal information
• City, state, zip code, and port of entry information for the largest four commodities arriving at the establishment

• Outbound annual tons and value
• Outbound seasonal information
• City, state, zip code, and port of exit information for the largest four commodities leaving the establishment

The response rate for several data elements in this survey was calculated and is shown in Table 2.9. These response rates can be used to estimate response rates for similar questions in other survey efforts.

User’s Guide Worksheet Punch List

• Determine the background information to include in the survey and how the size of establishment will be captured.
• Determine the shipment information to request in the survey.
• Determine whether the survey will focus on outbound goods, inbound goods, or both.
• Determine whether or not to preclassify commodities for the survey.
• Determine the level of geographic specificity at which to survey. Refer to the geographic zones developed in Step 1 and the response rates at different levels of geography shown in Table 2.9.
• List all data elements on which data will be collected.

Step 6—Survey Questionnaire

Key Considerations

It is most effective to design a survey questionnaire that moves from the easiest to hardest questions. With this design, if the respondent decides to stop participating in the middle of the survey, at least some data can be collected. Partial data can still be used in many cases. For this reason, respondents are typically asked to provide general information about the facility first. From there, the survey goes on to request information regarding outbound shipments and inbound shipments, and it concludes with open-ended questions.

Within the portions of the survey addressing outbound and inbound shipments, origin-destination information is generally the most complex. Therefore, it is typical to ask for commodity information first, then information about modal usage, and then origin-destination information for each commodity. Within the origin-destination questions, it is generally easiest to ask for state information, then city information (if easily available), and then about subcity geographic units, if needed.

The questionnaire should be designed so that data are captured in a format that best serves the purpose of the survey. However, this aim must be balanced with the need to capture information in a form that is comfortable for survey respondents. Additionally, there is a tradeoff between the length of the survey and response rates. For industry-specific surveys, it is generally beneficial to have an industry expert review a draft of the survey to confirm that the questions are reasonable given the typical structure and sourcing patterns for each industry.

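Table 2.9. Response rates for individual questions in demonstration surveys.

| Item | Spokane Food | Spokane Fabric. Metal | Seattle Apparel | Seattle Trans. Equip. | Total |
|---|---|---|---|---|---|
| Background Information Response Rates | | | | | |
| Total Establishments Contacted | 41 | 34 | 30 | 42 | 147 |
| Percent Establishments Responding | 25% | 30% | 33% | 24% | 27% |
| Number of Establishments Responding | 10 | 10 | 10 | 10 | 40 |
| Provided Response to Revenue Information | 60% | 90% | 30% | 70% | 63% |
| Provided Response to Background Information | 100% | 100% | 100% | 100% | 100% |
| Provided Response to Size of Facility | 100% | 100% | 90% | 90% | 95% |
| Inbound Shipment Response Rates | | | | | |
| Provided Response to Inbound Annual Tons or Other Shipping Units | 100% | 100% | 100% | 100% | 100% |
| Provided Response to Inbound Value of Shipments | 0% | 0% | 0% | 0% | 0% |
| Provided Response to Inbound Seasonal Question | 30% | 20% | 70% | 90% | 53% |
| Percent of Respondents that Provided Inbound Seasonal Information | 30% | 10% | 70% | 30% | 35% |
| Provided Response to Inbound Distribution of Comm. 1 | 100% | 100% | 70% | 80% | 88% |
| Provided Response to Inbound Distribution of Comm. 2 | 0% | 30% | 50% | 0% | 20% |
| Provided Response to Inbound Distribution of Comm. 3 | 30% | 10% | 20% | 30% | 23% |
| Provided Response to Inbound Distribution of Comm. 4 | 0% | N/A | N/A | 0% | 0% |
| Provided Response to Comm. 1 Origin Attribute: City | 40% | 90% | 70% | 60% | 65% |
| Provided Response to Comm. 1 Origin Attribute: State | 100% | 100% | 80% | 80% | 90% |
| Provided Response to Comm. 1 Origin Attribute: Zip Code | 0% | 0% | 0% | 0% | 0% |
| Provided Response to Comm. 1 Origin Attribute: Country | 100% | 100% | 70% | 80% | 88% |
| Provided Response to Comm. 1 Origin Attribute: Port of Entry | 0% | 0% | 60% | 0% | 15% |
| Provided Response to Comm. 1 Origin Attribute: Mode | 100% | 100% | 80% | 80% | 90% |
| Provided Response to Comm. 2 Origin Attribute: City | 50% | 60% | 10% | 60% | 45% |
| Provided Response to Comm. 2 Origin Attribute: State | 80% | 70% | 10% | 60% | 55% |
| Provided Response to Comm. 2 Origin Attribute: Zip Code | 0% | 0% | 0% | 0% | 0% |
| Provided Response to Comm. 2 Origin Attribute: Country | 80% | 80% | 10% | 70% | 60% |
| Provided Response to Comm. 2 Origin Attribute: Port of Entry | 0% | 0% | 10% | 0% | 3% |
| Provided Response to Comm. 2 Origin Attribute: Mode | 80% | 80% | 10% | 60% | 58% |
| Provided Response to Comm. 3 Origin Attribute: City | 10% | 20% | 0% | 30% | 15% |
| Provided Response to Comm. 3 Origin Attribute: State | 20% | 50% | 0% | 40% | 28% |
| Provided Response to Comm. 3 Origin Attribute: Zip Code | 0% | 0% | 0% | 0% | 0% |
| Provided Response to Comm. 3 Origin Attribute: Country | 30% | 50% | 0% | 20% | 25% |
| Provided Response to Comm. 3 Origin Attribute: Port of Entry | 0% | 0% | 0% | 0% | 0% |
| Provided Response to Comm. 3 Origin Attribute: Mode | 30% | 50% | 0% | 40% | 30% |
| Provided Response to Comm. 4 Origin Attribute: City | 30% | N/A | N/A | 10% | 20% |
| Provided Response to Comm. 4 Origin Attribute: State | 30% | N/A | N/A | 10% | 20% |
| Provided Response to Comm. 4 Origin Attribute: Zip Code | 0% | N/A | N/A | 0% | 0% |
| Provided Response to Comm. 4 Origin Attribute: Country | 30% | N/A | N/A | 30% | 30% |
| Provided Response to Comm. 4 Origin Attribute: Port of Entry | 0% | N/A | N/A | 0% | 0% |
| Provided Response to Comm. 4 Origin Attribute: Mode | 30% | N/A | N/A | 30% | 30% |
| Outbound Shipment Response Rates | | | | | |
| Provided Response to Outbound Annual Tons or Other Shipping Units | 100% | 100% | 100% | 100% | 100% |
| Provided Response to Outbound Value of Shipments | 0% | 0% | 0% | 0% | 0% |
| Provided Response to Outbound Seasonal Question | 0% | 0% | 0% | 0% | 0% |
| Percent of Respondents that Provided Outbound Seasonal Information | N/A | N/A | N/A | N/A | N/A |
| Provided Response to Outbound Distribution of Comm. 1 | 100% | 80% | 90% | 20% | 73% |
| Provided Response to Outbound Distribution of Comm. 2 | 0% | 40% | 30% | 70% | 35% |
| Provided Response to Outbound Distribution of Comm. 3 | 10% | 30% | 0% | 40% | 20% |
| Provided Response to Outbound Distribution of Comm. 4 | N/A | N/A | N/A | N/A | N/A |
| Provided Response to Comm. 1 Destination Attribute: City | 50% | 50% | 30% | 0% | 33% |
| Provided Response to Comm. 1 Destination Attribute: State | 90% | 90% | 70% | 20% | 68% |
| Provided Response to Comm. 1 Destination Attribute: Zip Code | 0% | 0% | 0% | 0% | 0% |
| Provided Response to Comm. 1 Destination Attribute: Country | 80% | 100% | 100% | 20% | 75% |
| Provided Response to Comm. 1 Destination Attribute: Port of Entry | 10% | 0% | 0% | 0% | 3% |
| Provided Response to Comm. 1 Destination Attribute: Mode | 90% | 100% | 100% | 20% | 78% |
| Provided Response to Comm. 2 Destination Attribute: City | 40% | 60% | 10% | 0% | 28% |
| Provided Response to Comm. 2 Destination Attribute: State | 60% | 80% | 60% | 20% | 55% |
| Provided Response to Comm. 2 Destination Attribute: Zip Code | 0% | 0% | 0% | 0% | 0% |
| Provided Response to Comm. 2 Destination Attribute: Country | 70% | 80% | 70% | 20% | 60% |
| Provided Response to Comm. 2 Destination Attribute: Port of Entry | 20% | 20% | 0% | 0% | 10% |
| Provided Response to Comm. 2 Destination Attribute: Mode | 80% | 90% | 70% | 20% | 65% |
| Provided Response to Comm. 3 Destination Attribute: City | 0% | 10% | N/A | 0% | 3% |
| Provided Response to Comm. 3 Destination Attribute: State | 40% | 40% | N/A | 10% | 30% |
| Provided Response to Comm. 3 Destination Attribute: Zip Code | 0% | 0% | N/A | 0% | 0% |
| Provided Response to Comm. 3 Destination Attribute: Country | 60% | 50% | N/A | 10% | 40% |
| Provided Response to Comm. 3 Destination Attribute: Port of Entry | 0% | 10% | N/A | 0% | 3% |
| Provided Response to Comm. 3 Destination Attribute: Mode | 60% | 50% | N/A | 0% | 37% |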
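The response rates in Table 2.9 can be turned into a rough planning figure for a new survey. The sketch below is a hypothetical illustration, assuming that item response is independent of establishment response and that rates from the demonstration surveys carry over to the new effort. Neither assumption is tested in the Guidebook. The function name and the target of 30 usable answers are made up for the example.

```python
import math


def contacts_needed(target_item_responses: int,
                    establishment_response_rate: float,
                    item_response_rate: float) -> int:
    """Establishments to contact so the expected number of usable answers meets the target."""
    usable_share = establishment_response_rate * item_response_rate
    if not 0 < usable_share <= 1:
        raise ValueError("the product of the two rates must be in (0, 1]")
    return math.ceil(target_item_responses / usable_share)


# Table 2.9 totals: 40 of 147 contacted establishments responded.
overall_rate = 40 / 147

# 90% of respondents gave an origin state for their first inbound commodity,
# but only 65% gave an origin city.
print(contacts_needed(30, overall_rate, 0.90))  # 123
print(contacts_needed(30, overall_rate, 0.65))  # 170
```

The gap between the two results shows the cost of asking for finer geography: the same number of usable answers at city level requires contacting noticeably more establishments than at state level.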

Implementation Process

As mentioned in the previous step, the more that state and DOT establishment survey questionnaires can be made similar to the CFS instrument, the more the data collected in the two efforts can be used in a complementary fashion. However, there are significant differences between this national survey and surveys that would be done at the state or metropolitan level. These differences mark the key decision points for developing the questionnaire.

An alternative to using the CFS questionnaire is to consider a survey structure that describes a standard set of 16 questions, based on the work of “Approach to Collecting Local Freight Information” (Thompson et al. 2010). This questionnaire was developed specifically for state and local transportation agencies in the hope of creating a standardized format that also would allow for comparison across subnational establishment surveys.

This survey questionnaire begins by confirming general information about the facility, including the following:

• Company name
• Address of the actual site to be visited
  • Street number
  • Street
  • City
  • State
  • Zip
• Name of person(s) to be interviewed

Table 2.10 contains the description of the 16 standardized questions (based on “Approach to Collecting Local Freight Information” [Thompson et al. 2010]). The questions must be asked and recorded with a level of rigor that allows for more probing at any point in the interview, but all of the questions need to be asked and recorded to complete the local commodity flow database.

We recommend reviewing the survey questionnaire in Table 2.10 to observe the similarities and differences that are possible. The survey questionnaire developed for each region should then be customized based on local freight planning needs, available resources, and current understanding of potential survey respondents’ willingness to answer key questions.
It is recommended that experts within the industries being surveyed review and comment on the data elements that are being considered for the survey as well as on the entire survey questionnaire. These experts can be located through trade associations, chambers of commerce, or previous local knowledge of representative companies in each industry. An alternative to using known experts is to conduct precanvassing of select companies within targeted industries to discuss the data elements and the survey questionnaire that will be used. Feedback from the precanvassing process can be used to revise the survey. This is described in greater detail in Step 7 (Conducting the Survey).

Example

For the demonstration establishment surveys conducted in Seattle and Spokane, the survey questionnaire had four main sections:

1. Background information about the company and surveyor—filled out in advance, but confirmed via the survey process.
2. Size of the facility—measured in terms of square footage and number of employees.
3. Outbound shipments—tonnage and value, timeframe, seasonality, modes, and origins and destinations.
4. Inbound shipments—tonnage and value, timeframe, seasonality, modes, and origins and destinations.

Table 2.10. Questionnaire elements.

| Question Number | Topic | Interviewer Instructions/Explanation |
|---|---|---|
| Q1 | Business Description | Keywords used by interviewee to capture primary business activity(ies), to be converted into single industry sector designation for commodities using the 43 categories available in the SCTG. Note other activities that generate freight. |
| Q2 | Number of Employees | Current number of full- and part-time employees at the site (e.g., head count). If response is only available for multiple locations in the region, note this aggregation and make sure other data elements also are aggregates. |
| Q3 | Shipments by Mode | Capture how the company receives and ships most of its goods, clearly indicating if the goods are being received (inbound) or being shipped (outbound). |
| Q4 | Deliveries Received by Mode | WEEKLY average number of deliveries received. If interviewee can provide only monthly or annual numbers, convert these figures to weekly data in postprocessing procedures. |
| Q5 | Shipments Generated by Mode | WEEKLY average number of shipments generated. If interviewee can only provide monthly or annual numbers, convert these figures to weekly data in postprocessing procedures. |
| Q6 | Origins of Inbound Shipments | Capture major origins where shipments come directly to the site. For example, if interviewee knows origin is California, but last leg is from Dallas, record information as “California through Dallas.” Continue to probe until it is possible to determine the origins by percentages of their total activities (to sum to 100 percent). Probe for the direction for “within an MPO” or “outside the MPO” (use a compass icon to illustrate directions) and indicate any other location information (e.g., ports by name). |
| Q7 | Destinations for Outbound Shipments | Capture major destinations for shipments from the company. For example, if interviewee knows destination is California, but last leg is through Dallas, record information as “California through Dallas.” Continue to probe until it is possible to determine the destinations by percentages of their total activities (to sum to 100 percent). Probe for the direction for “within an MPO” or “outside the MPO” (use a compass icon to illustrate directions) and indicate any other location information (e.g., ports by name). |
| Q8 | Size of Shipment | Check for each mode used for inbound and outbound activities and whether most of these shipments are “less than full load” or “full load.” For containers, indicate size of container (e.g., 20’ or 40’). |
| Q9 | Weight of Shipment | NORMAL weight of a full shipment (not including vehicle weight) inbound and/or outbound by all modes. Verify rail spur if rail is mentioned and connection to port if barge is mentioned. Otherwise, query for shipments actually arriving/leaving by truck. Containers assumed as truck trip, but indicated separately as container on truck. Indicate weight in tons or pounds; however, only enter pounds in database (e.g., tons x 2,000 = pounds). |
| Q10 | Size of Facility | Size in square feet under roof. Indicate outdoor space used as separate information. |
| Q11 | Expansion Plans | Note any comments indicating plans for expansion within the next 5 years. Note the anticipated year and amount of increase in size, converting to square footage if percentage is provided. |
| Q12 | Value of Goods | Total value in dollars of goods received/shipped for most recent year. If unavailable, indicate annual sales amount as proxy for shipment value. |
| Q13 | Annual Volume of Shipments (Actual) | Note year and the total ANNUAL number of shipments inbound/outbound for most recent year. (Check by multiplying Q4 and Q5 by 52 and comparing weekly answers to annual.) |
| Q14 | Annual Volume of Shipments (Forecasted) | Note the year and the ANNUAL total number of shipments the company expects 5 years in the future (or sooner). |
| Q15 | Problems at the Location | Note any location or site-specific problems described by the interviewee. Probe for details, including impacts. When noting comments, be sure not to leave the impression that problems will be fixed, only that their descriptions will be forwarded to the MPO and used for transportation planning infrastructure/operations. |
| Q16 | Problems in the Area | Make note of any route or significant problems in the area. Probe to get enough information to clarify any issues mentioned. When noting comments, be sure not to leave the impression that problems will be fixed, only that their descriptions will be forwarded to the MPO and used for transportation planning infrastructure/operations. |

Source: Thompson et al. 2010

The survey questionnaires and the survey design also allowed for the respondent to provide information at different levels of detail. For example, if the respondent was only willing to provide county detail for origins and destinations, this was collected, but if they were willing to provide city or zip code detail, this was collected. Both inbound and outbound freight flows were covered. The primary outbound and inbound commodities that were expected to be received or shipped by each industry were preclassified, and information about these commodities was requested.
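Several of the interviewer instructions in Table 2.10 describe postprocessing conversions: weekly normalization for Q4 and Q5, entering weights in pounds for Q9, and the annual cross-check in Q13. A minimal sketch of those steps follows. The function names and the 25 percent tolerance used for the cross-check are assumptions made for illustration. The table itself specifies only the 52-week multiplier and the 2,000 pounds per ton conversion.

```python
POUNDS_PER_TON = 2000
WEEKS_PER_YEAR = 52
PERIODS_PER_YEAR = {"weekly": 52, "monthly": 12, "annual": 1}


def to_weekly(count: float, period: str) -> float:
    """Convert a weekly, monthly, or annual shipment count to a weekly average."""
    return count * PERIODS_PER_YEAR[period] / WEEKS_PER_YEAR


def to_pounds(weight: float, unit: str) -> float:
    """Return the weight in pounds, the unit stored in the database."""
    if unit == "pounds":
        return weight
    if unit == "tons":
        return weight * POUNDS_PER_TON
    raise ValueError(f"unsupported unit: {unit}")


def annual_matches_weekly(weekly_count: float, reported_annual: float,
                          tolerance: float = 0.25) -> bool:
    """Q13-style check: weekly count x 52 should be close to the reported annual total."""
    implied_annual = weekly_count * WEEKS_PER_YEAR
    return abs(implied_annual - reported_annual) <= tolerance * reported_annual


print(to_weekly(26, "monthly"))         # 6.0
print(to_pounds(22, "tons"))            # 44000
print(annual_matches_weekly(6, 300))    # True  (6 x 52 = 312)
print(annual_matches_weekly(6, 1000))   # False
```

A failed cross-check does not say which answer is wrong, so in practice it would flag the record for a follow-up question rather than trigger an automatic correction.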
Respondents also were given the opportunity to identify other commodities that they shipped or received.

While the identification of outbound commodities was generally fairly straightforward in the cases that were tested, these industries may purchase supplies that represent a broad range of different commodities. For the design of the survey, an input-output table (otherwise known as the “make-use” table of an input-output model) was used to identify the principal commodities purchased by each of the industries that were surveyed, and this information was used in the survey design for inbound shipments. Thus, the questionnaires were customized for each industry at least in the naming of commodities in the inbound and outbound shipments section of the questionnaire. The final survey questionnaires used are located in a subtask report associated with the development of the Guidebook titled “Demonstration of Application of Establishment Survey,” which is available at www.trb.org/Main/Blurbs/169330.aspx.

Precanvassing was conducted to finalize the survey questionnaire for the demonstration surveys. The precanvassing was done through in-person interviews by a team member from Washington State University. A total of 11 firms were included in the precanvassing (see Table 2.11). Five of the firms were located in the Seattle region and six were located in the Spokane region. Three of the firms in Seattle were in the transportation equipment industry, and two of the Seattle firms were in the apparel industry. Four of the Spokane firms were in the food manufacturing industry, and two were in the fabricated metal product manufacturing industry. These firms were identified based on previous research team contacts.

The precanvassing process led to several key conclusions regarding the full demonstration survey. All 11 interviewees were interested in the goal of the study and felt it was a worthwhile endeavor. This seemed to be borne out by response rates in the final survey. This indicates that if the survey is relatively short and focused, it may not be necessary to expend significant time and resources convincing interviewees that surveys are important or that survey data can be used for transportation planning.

Interviewees in the precanvassing process thought that commodity information would be available from the demonstration survey respondents. Some of the interviewees thought that the companies should provide commodity information rather than having it identified beforehand. The interviewees were of the opinion that the identification of subregional flows could be accomplished using the questionnaire. However, they also believed that survey respondents would provide varying levels of detail. The interviewees doubted that revenue or value data would be provided. They felt that providing shipment value information would be equivalent to providing proprietary information about the company such as rate schedules and profit margins.

Nine of the 11 interviewees were somewhat concerned that too much detail was being requested. They believed that survey respondents would not have detailed data available and that only general responses would be received. This was seen as an issue particularly for the origin and destination data for both inbound and outbound shipments.
Giving respondents the ability to respond for “typical” patterns and allowing them to adjust the level of geographic detail at which they were willing to respond were approaches recommended by the precanvassed interviewees. The interviewees consistently suggested that the questionnaire script should be rephrased to request “estimates of data,” “typical years,” or “last year” shipments or attributes. Such an approach was expected to generate more complete responses, even if these would need to be considered expert estimates rather than precise data amounts.

Overall, the interviewees suggested that the conversation with the phone interviewer be “dynamic” rather than scripted. This would allow for a dialogue to unfold that would generate information on the origin-destination patterns of shipments and the percent allocation by location and temporal factors.

Table 2.11. Number of companies included in precanvassing by industry.

Region | Industry | Number of Companies Interviewed In Person
Seattle | Transportation Equipment Manufacturing | 3
Seattle | Apparel Manufacturing | 2
Spokane | Food Manufacturing | 4
Spokane | Fabricated Metal Product Manufacturing | 2
Total | | 11

These results led to some changes in the questionnaire. Specifically, different or fewer categories were offered in the revised questionnaire. Additionally, the spacing of the physical questionnaire was changed to allow the interviewer to capture information that was being offered. Also, information that was known prior to the survey (e.g., name and location of firm) was moved to an earlier part of the instrument. The final survey questionnaires used are located in a task report associated with the development of the Guidebook titled “Demonstration of Application of Establishment Survey,” which is available at www.trb.org/Main/Blurbs/169330.aspx.

The determination of whether or not precanvassing is needed in a local survey effort depends on two factors: (1) the degree of familiarity with the industry being surveyed and (2) the types of questions that are being included in the establishment survey. If transportation agencies are comfortable with their familiarity with an industry and they are asking standard questions, then precanvassing is generally not needed. However, if “out-of-the-box” questions are being considered, then a precanvassing survey would be recommended. Additionally, if a survey is being conducted on an industry that the transportation agency is not familiar with, then precanvassing also would be recommended.

Precanvassing surveys such as the surveys conducted in this process are most effective when they are implemented using companies that already are familiar with the local MPOs or DOTs. These companies are most likely to take the time to assist the transportation agency, and they are more likely to provide thoughtful, complete answers than companies that are unknown prior to the survey effort. Alternatively, companies also can be identified by using state or local chambers of commerce, industry associations, and establishment information provided from companies such as InfoUSA or Dun & Bradstreet.
This implies that conducting an establishment survey should be one component of a broader private-sector freight stakeholder effort that DOTs or MPOs operate. This ongoing outreach activity will improve precanvassing efforts, provide a sounding board to confirm reasonableness of full survey efforts, and provide guidance on freight planning decisions that are made based on collected data.

User’s Guide Worksheet Punch List
• Develop a full survey questionnaire for the region using the data elements identified in Step 5.
• Categorize these questions into general information, outbound flows, inbound flows, open-ended questions, and other relevant categories for the questionnaire.
• Sort the categories of questions from easiest to hardest to respond to, based on general knowledge of the interviewees.
• Develop a draft questionnaire based on fine-tuning the sorted list for any needed practical considerations.
• Provide the draft questionnaire to industry experts to confirm the reasonableness of the data elements and the survey process. Use a formal precanvassing effort, if needed.
• Edit and finalize the survey based on feedback from industry experts.
• Document potential changes in the region’s freight stakeholder outreach based on the experience of developing the survey.

Step 7—Conducting the Survey

Key Considerations

With the survey questionnaire finalized and potential companies to interview determined, the next step is conducting the survey. One of the key considerations in this process is the considerable effort involved in identifying and confirming a time to interview a specific individual. Another key consideration is that surveyors should have some background in freight transportation to successfully implement the survey. This familiarity assists in the dynamic conversation that is needed to extract information that will support the statistical analysis of survey data and generalized freight planning efforts. Surveyors should briefly research each company prior to the survey to assist in this effort.

Implementation Process

The first step is to identify the correct individual at each company to survey. This can sometimes be determined from a vendor list or recommendations from trade associations. Often, they can provide an initial point of contact. Large firms may have a logistics specialist, while smaller firms may have plant managers with sufficient knowledge. Surveyors should maintain a log of each of the companies contacted and individuals spoken to, with summaries of each conversation and next steps. All of these items should be date and time stamped, so that follow-up can be done in an appropriate amount of time. It is not uncommon for establishment surveys to have response rates of 10 to 25 percent, so many potential surveys will turn out to be dead ends.

Campaigns to encourage participation carried out with local trade associations and the Chamber of Commerce could help improve response rates. If the state or MPO has a freight advisory council, then it also can be used to spread the word about the survey and encourage participation by target companies. Another option would be taking a freight neighborhood approach. This kind of approach would make it possible to let freight “neighbors” know who else has already participated. In this approach, those who have completed the process are asked if they would be willing to encourage neighbors to participate by allowing their participation to be part of the initial contact information. This strategy may work best with a mix of industry types to reduce any concerns with competition among similar firms.

At least two hard copies of the questionnaire should be available, one for the interviewer and one for the survey participant (provided only if requested, however, as it could be a distraction). At the beginning of the interview, the interviewer should reconfirm the time needed to complete the data collection and remind the participant that all the information will be held in confidence. Business cards should be offered to persons met during the interviewer’s visit. It is not advisable for an interviewer to leave partially completed forms and expect firm employees to fill in and return information.

Due to the flexibility required to capture information from establishment surveys, they are typically done with pen and paper. However, even if the survey is conducted electronically, hard copies of the questionnaire should still be made available to the survey participant.

The interviewing process is intended to build trust and confidence, but should not result in unrealistic expectations of how the information collected will be used. This can especially be an issue with any open-ended survey questions that ask participants to list problems and issues. The interviewer needs to confirm that this information will be submitted as input to the long-range transportation planning process for future consideration.

The interviewer should thank the person(s) participating in the interview and encourage them to call at any time if they want to offer additional information. As soon as possible, review the data collected to ensure that the information was correctly written/typed and the appropriate units were recorded (e.g., weekly or annual totals). As a final outreach, staff at the transportation agency should either mail a note of thanks for participation or send an e-mail within a week of the interview.

Example

Graduate students at Washington State University were trained to administer the demonstration survey in Seattle and Spokane. Surveyors briefly summarized the purpose of the survey, the transportation agency that was implementing the survey, what the data would be used for, and the privacy controls on the data that would be collected. Additionally, surveyors were physically located such that respondents could call them back through a general number at the university to confirm that the survey was being conducted by the party stated. When surveys are conducted by outside consulting firms, the sponsoring transportation agency contact and contact information should be readily provided to the respondent.

The surveyors were encouraged to check the web sites of the selected firms before making their initial call. Interviewer knowledge of the firm’s characteristics prior to the phone call was expected to make the phone survey more effective and efficient in terms of increasing response rates and receiving more precise responses.

The contact information provided in the two sampling frames (Dun & Bradstreet and InfoUSA) for establishments was found to be very accurate. Only 2 out of roughly 120 contacts had incorrect phone numbers.
However, there were some cases in which area codes were not provided and had to be researched online. It was generally found that each company’s receptionist was able to identify the correct person to respond to the survey, even though it often took multiple attempts to speak to this individual. Tracking down and receiving approval from this individual is the key determinant in the response rate for the overall survey.

The introductory conversations generally began with the following:

    Hello, my name is ____________ and I’m a student at Washington State University, in Pullman, WA. We are working with the U.S. Department of Transportation to survey businesses in the Pacific NW in regards to their use of the transportation infrastructure. In particular, we are interested in your company’s annual shipments into and out of facilities and the locations where shipments originate and final destinations. Is there someone within your business that I can talk to in regards to this study?

Initially, no time length was mentioned in the introduction; however, once the survey began it became clear that without some stated bounds on the time required to complete the survey, respondents were not willing to participate. Thus, the students began stating that the survey would take 5 minutes or less. Approximately 15 percent of survey respondents ended up taking more than 5 minutes, but respondents were relieved if the 5-minute time limit was maintained.

Given that the proprietary establishment lists also contained information on company web sites, the students were asked to spend a few minutes browsing the company web site prior to calling to familiarize themselves with the company and the freight activities that this type of business was likely to conduct. This additional preparation increased survey labor time, but it also improved the students’ ability to understand who they were calling and target their questions toward collecting the desired information.

In all cases, during the survey, respondents were asked if they shipped or received any of the preclassified commodities, but the respondents were also given the option to name commodities they ship and receive using their own terminology.

In this demonstration survey, there was a trend of smaller companies being more responsive to the survey process both in terms of overall responsiveness and specific responses to individual questions. Many of the larger firms stated that they were too busy to respond to the survey or that they needed approval from staff at corporate headquarters who were not colocated with the establishment. The bureaucracy of larger organizations caused them to have a lower response rate than the smaller firms. At smaller firms, individuals felt more empowered to provide this information to the surveyor.

The smaller companies also seemed more confident in the responses that they did provide. This is probably because it is easier for a single person to understand shipping practice in full at a smaller firm than at a larger firm. At larger firms, shipments are likely to be managed by a team of people with expert knowledge of only the shipment types under their purview.

The lower response rate of larger firms underscores the need to incorporate an establishment survey process as part of a larger freight stakeholder outreach effort. The larger companies will likely need more time to approve survey participation, and this approval is more likely to occur if they have worked with the transportation agency extensively in previous efforts and if their participation is likely to impact actual decisions that are made by the agency. Having senior-level executives serve on an ongoing freight advisory council can accelerate this approval process and allow for transmittal of more detailed shipment information.
It is difficult to obtain detailed origin/destination information from larger companies. Many of the larger firm respondents indicated they ship or receive from “all over the United States,” and attempting to ask for their top three was problematic. Smaller firms had origin-destination information more readily available and seemed to be more confident in the accuracy of the information that they were providing.

The students’ affiliation with the local university seemed to make respondents much more willing to cooperate, since many expressed some association themselves (either as graduates or through children attending there). A private phone survey company may have had lower participation and response rates.

When companies refused to provide a value of shipments, the interviewers were able to adjust and ask for company revenue information. Initially, companies were unwilling to provide this, but by offering a broad range of revenue categories (0 to 5 million dollars, 5 to 10 million dollars, 10 to 20 million dollars . . . ), it was possible to obtain revenue information from approximately 63 percent of respondents.

It can be challenging to provide categories of inbound or outbound shipments for some industries. For the Seattle transportation equipment industry, the outputs were simply too many to mention or categorize. These ultimately were labeled “aerospace parts,” even though they may have been electronic assemblies, mechanical components, commercial, military, etc.

As mentioned above, origin-destination information is most easily obtained at the state level, is somewhat less accessible at the city level, and is nearly impossible to capture at the zip code level using this survey approach.

Outbound data were generally more readily available than inbound data. This confirmed that requests for outbound data should be made first in establishment surveys.
However, the availability of data depends in large part on the structure of the industry and the complexity of individual companies’ supply chains. Therefore, as part of the precanvassing effort, companies should be asked which portion of the supply chain is easier for the industry to provide.

User’s Guide Worksheet Punch List
• Begin conducting the survey. Track the responses of all companies contacted, including failed surveys.
• For successfully scheduled surveys, allow for surveyors to make additional interesting notes that can be reviewed in the near term and long term.
• Following the successful completion of 10 percent of the survey in terms of either desired sample or completed responses, review the collected data to confirm that it includes the information most important to support the region’s freight planning efforts. This includes the need to expand the data to represent the full region.
• Complete survey.

Step 8—Database Assembly

Key Considerations

It should be expected that a significant amount of effort will be needed to process the information that is collected in a single database. The survey can be designed to capture information in a way that is easiest for the surveyor to process. However, many of the respondents will provide information in a fashion that is inconsistent with the survey structure. This information will require postsurvey calculations by the surveyor in order to fit into the survey structure. Expansion variables should be determined prior to survey implementation and postprocessing to ensure that collected data are sufficient to support the data expansion process.

Implementation Process

This implementation consists of entering the collected data into a database format. Ideally, the data entry is done by the staff that conducted the survey to better interpret the results. The field names and data relationships need to follow a formal system to allow for statistical analysis of results in later steps.
However, there should be flexibility regarding some of the entries, including comment fields that allow the database to retain nuances captured by the surveyor that do not easily fit into a quantitative format.

There will likely be several instances of needing to translate units into a consistent format. For example, respondents may respond in terms of tons, containers, or pallets, and this will need to be converted into a consistent format across the entire survey. Additionally, the commodity information is likely to be captured in varying levels of detail, particularly if the survey participant is given the option to self-describe the goods that are shipped to and from the location. The survey database should preserve the original response of each participant, but it should also convert the response into the classification scheme that will be used for future analysis.

It may be possible to develop a “seed” file in the database that could be made available for easy data entry and report production. An example of a similar tool is the Land-Based Classification Standards (LBCS). (See www.planning.org/lbcs/implementation.)
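The unit conversion described above can be illustrated with a minimal Python sketch. The field names and the conversion factors for containers and pallets are placeholders assumed purely for illustration; they are not values from the Guidebook, and realistic factors would depend on the commodity and on follow-up with respondents. The sketch keeps the respondent's original answer next to the converted value, in line with the recommendation to preserve original responses.

```python
from dataclasses import dataclass

# Hypothetical conversion factors (tons per reported unit). The container and
# pallet values are placeholders for illustration, not Guidebook values.
TONS_PER_UNIT = {
    "tons": 1.0,
    "pounds": 1.0 / 2000.0,   # short tons
    "containers": 20.0,       # assumed average payload per container
    "pallets": 1.0,           # assumed average weight per pallet
}


@dataclass
class ShipmentResponse:
    company: str
    raw_quantity: float   # quantity exactly as reported by the respondent
    raw_unit: str         # unit exactly as reported by the respondent
    comment: str = ""     # free-text nuance captured by the surveyor


def to_tons(response: ShipmentResponse) -> float:
    """Convert a reported quantity to tons; raise if the unit is unknown."""
    unit = response.raw_unit.strip().lower()
    if unit not in TONS_PER_UNIT:
        raise ValueError(f"No conversion factor for unit: {response.raw_unit!r}")
    return response.raw_quantity * TONS_PER_UNIT[unit]


def build_record(response: ShipmentResponse) -> dict:
    """Keep the original response and add the standardized value next to it."""
    return {
        "company": response.company,
        "raw_quantity": response.raw_quantity,
        "raw_unit": response.raw_unit,
        "annual_tons": to_tons(response),
        "comment": response.comment,
    }
```

Raising an error on an unrecognized unit, rather than guessing, is one possible design choice; it forces the person entering the data to add an explicit factor or follow up with the respondent.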

There will need to be a quality control check on the data to confirm that the entries are readable and reasonable. This quality control check should include a review of origin-destination information for consistency with logical shipping patterns. Additionally, there should be a check of the shipping units. Problems that can occur in terms of volumes include recording a response in tons that should be in pounds and developing the right conversion factor for containers to a weight-based unit. In select instances, follow-up with survey respondents may be necessary to clarify entries. Alternatively, a small portion of the data may need to be removed from the larger data set if it fails the quality control check.

Example

In the Spokane and Seattle demonstration surveys, responses were recorded using pencil and paper. The collected data were input into a Microsoft Excel spreadsheet. A review of the data identified no issues with reasonableness or consistency. The commodity information was captured at several different levels. The raw responses were captured in the spreadsheet. Additionally, each response was classified into the two-digit SCTG code consistent with the CFS commodity classification.

A snippet of the raw spreadsheet database is shown in Table 2.12. This table shows the data elements that were asked about for general information on the company and inbound shipments. Additional information included in the spreadsheet is the modal percentage for each commodity and the origin information for Commodities 2 to 4. All of this data was also collected for outbound shipments. The full data entry spreadsheet can be found in a subtask report associated with the development of the Guidebook titled “Demonstration of Application of Establishment Survey,” which is available at www.trb.org/Main/Blurbs/169330.aspx.
User’s Guide Worksheet Punch List
• Input raw data into an electronic format.
• Check raw data responses for reasonableness.
• Refine data by making any necessary conversions such that all entries can be analyzed at the same level of commodity and geographic detail.

Table 2.12. Raw survey database for Seattle and Spokane Demonstration Survey (select fields).

Column groups: Background Information; Inbound Shipments
Row labels (Company): Co. 1, Co. 2, Etc…
Column headings, in order: Company Name | Address | City | State | Zip | Name | Date of Survey | Method | Number of Employees | Annual Tons | Other Shipping Units | Value | Seasonal Peaks (Y/N)? | Timing of Peaks | Comm. 1 | Comm. 1 % | Comm. 2 | Comm. 2 % | Comm. 3 | Comm. 3 % | Comm. 4 | Comm. 4 % | C1 – City | C1 – State | C1 – Zip | Etc…

Step 9—Data Expansion

Key Considerations

The key to data expansion is identifying the appropriate indicator variable and estimating the correct control totals. Typically, employment is used as the expansion (or indicator) variable because employment data tend to be relatively easy to collect through surveys. Additionally, employment data are available or can be estimated for various regions or subregions of concern within the study area.

Implementation Process and Example

Since only a sample of establishments is surveyed, it is necessary to develop statistical weights to expand the sample data to reflect the characteristics of the entire population of establishments for each industry. The key step in this process is to determine an appropriate expansion (or indicator) variable. The most straightforward expansion variable is based on information already contained in the establishment database from which the survey sample was drawn. Sample expansion variables can be based on number of employees, amount of output, or square footage of the establishment. Employment is the most commonly used expansion variable due to its ease of collection during a survey and availability for various geographies and industries across the country.

Using data on total employment by industry from sources such as CBP to establish the control total or population estimate, it is possible to estimate the fraction of total employment in an industry the sample represents. The fraction of total employment captured in the establishments surveyed represents the sampling fraction. By taking the reciprocal of the sampling fraction and multiplying this factor by whatever variable is being estimated (for example, total tons shipped), it is possible to get an estimate of the totals for the region.

Employment at individual companies can be inquired about via the survey process.
It can also be cross-referenced with information included in establishment databases. In establishment databases, the information is provided in ranges; the midpoint for each establishment range should be used to estimate the total employment in the industry and the total employment of the companies included in the survey.

Expansion factors can be developed by dividing the control total of the indicator variable by the level of the indicator variable captured in the survey for each of the geographic regions in the survey study area. This needs to be done for each industry that is included in the survey.

For example, if the total employment in the metropolitan Seattle transportation equipment industry was found to be 10,000 and the employment of the companies surveyed in the transportation equipment industry was 1,000, then all of the data collected for this industry would be multiplied by 10 to develop estimates of freight flow patterns for transportation equipment.

It also is possible to use external employment data to conduct an expansion. For example, if the Census Bureau’s Economic Census estimates that there are 20,000 employees in the transportation equipment industry in Seattle, then it may be deemed appropriate to expand the data by multiplying by 20 rather than 10. However, it is generally preferable to utilize an expansion variable that was obtained from the original survey sample data to avoid circumstances where different data estimation processes can generate significantly different estimates.

One method for refining the employee expansion process during the survey is to ask survey respondents how their level of productivity (in terms of revenue or tonnage per employee) compares to other companies in their field and in the survey region. Additionally, survey respondents can be asked what they think the average level of productivity is in their industry. This will enable the development of specific local employment estimates from local sources.
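The employment-based expansion described above can be sketched in a few lines of Python. The function names, the surveyed tonnage, and the employment ranges are hypothetical and serve only as illustration; only the 10,000 and 1,000 employment figures come from the Seattle transportation equipment example.

```python
def midpoint(low: int, high: int) -> float:
    """Midpoint of an employment range reported in an establishment database."""
    return (low + high) / 2.0


def expansion_factor(control_total: float, sample_total: float) -> float:
    """Reciprocal of the sampling fraction (sample_total / control_total)."""
    if sample_total <= 0:
        raise ValueError("sample_total must be positive")
    return control_total / sample_total


# Employment figures from the Seattle transportation equipment example.
factor = expansion_factor(control_total=10_000, sample_total=1_000)

# Hypothetical surveyed outbound tonnage for the industry (not survey data).
surveyed_tons = 2_500.0
expanded_tons = surveyed_tons * factor

# Hypothetical employment ranges for three surveyed establishments, summed
# at their midpoints to estimate the employment covered by the sample.
ranges = [(1, 19), (20, 99), (100, 249)]
estimated_sample_employment = sum(midpoint(lo, hi) for lo, hi in ranges)
```

In practice the factor would be computed separately for each industry and geographic zone, so a lookup keyed by (industry, zone) would be a natural extension of this sketch.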

Another option for survey expansion variables would be the information provided in freight flow databases such as the BTS CFS. The CFS provides shipment values at the metropolitan level across two-digit commodities and several freight modes. In the previous example, the commodity-specific tonnage totals for the Seattle region can be considered the control total. The tonnage collected through the survey can be expanded based on the proportion of tonnage surveyed relative to the CFS total. Therefore, if the CFS estimates 40 million tons of transportation equipment shipped outbound and the survey estimates 1 million tons of transportation equipment shipped outbound, then an expansion factor of 40 can be applied to the surveyed data to estimate freight flow patterns of transportation equipment in the Seattle region.

It should be noted that this process of using indicator variables is limited by three factors:
• The statistical correlation between the indicator variable and the amount of freight used.
• The detail available for indicator variables at the subregional level (e.g., if apparel data is available at the zip code level).
• The accuracy of the indicator variable at the subregional level (e.g., some databases have estimates of these variables that are developed from other estimation processes).

Because of these limitations, the expanded data should be checked to confirm that the expansion process produced results that can be used for planning purposes. The CFS can be considered one source to validate the reasonableness of the results. The freight flows in each of the CFS regions can be compared to the expanded data. Members of trade associations for related industries also can be used to check the results for reasonableness.

User’s Guide Worksheet Punch List
• Determine the indicator variable for the survey.
• Determine the control total for the indicator variable for each geographic zone and each industry within the study area.
• Determine the expansion factors for the study area by dividing the control total by the surveyed value of the indicator variable for each geographic zone.
• Confirm reasonableness of expanded data.

Step 10—Data Validation and Accuracy

Key Considerations

Data validation should be done by comparing the expanded survey data to other freight data sources in the region. Options include the CFS, local trade association data, and economic output data. In select instances, truck count data or roadside intercept survey data may be used to confirm local truck volumes and generalized commodity distributions estimated through the establishment survey process. Discrepancies between the expanded survey and these sources indicate that additional surveys may be needed to fully validate the local survey process.

Data accuracy is measurable through detailed statistical analysis of several of the variables collected in the survey process. There is a tradeoff between the number of samples collected in each geographic zone for each industry and the accuracy of the freight flow estimates.
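The comparison of expanded survey data against another freight data source can be sketched as a simple percent-difference check. Everything specific in this sketch is an assumption made for illustration: the function names, the commodity labels, the tonnage values, and the 25 percent tolerance are hypothetical and are not recommended thresholds or actual CFS or survey results.

```python
def percent_difference(expanded: float, reference: float) -> float:
    """Signed percent difference of the expanded survey value from the reference."""
    if reference == 0:
        raise ValueError("reference value must be nonzero")
    return 100.0 * (expanded - reference) / reference


def flag_discrepancies(expanded_by_commodity, reference_by_commodity, tolerance_pct):
    """Return commodities whose expanded tonnage differs from the reference
    source by more than tolerance_pct in absolute value."""
    flagged = {}
    for commodity, reference in reference_by_commodity.items():
        if commodity not in expanded_by_commodity:
            continue  # nothing to compare for this commodity
        diff = percent_difference(expanded_by_commodity[commodity], reference)
        if abs(diff) > tolerance_pct:
            flagged[commodity] = diff
    return flagged


# Hypothetical values in thousands of tons; not actual CFS or survey results.
expanded = {"Transportation equipment": 36_000, "Apparel": 900}
reference = {"Transportation equipment": 40_000, "Apparel": 600}

# The 25 percent tolerance is an arbitrary illustration.
flagged = flag_discrepancies(expanded, reference, tolerance_pct=25.0)
```

A flagged commodity would not by itself show which source is wrong; it only marks where additional surveys or a review by trade association members might be worthwhile.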

Implementation Process

For establishment surveys using simple random sampling, all potential companies have an equal probability of being selected to be part of the sample. This sampling method creates an element of error due to the very randomness of the way in which the sample is chosen. The selected sample may result in sample data that are not necessarily representative of the whole population of interest, and this, in turn, results in inaccuracy in the estimates developed through the establishment survey process.

Equations 7 and 8 show how to assess the accuracy (or precision) (D) of a given sample size and confidence level using two steps. First, the standard error (SE) of the mean ($\bar{x}$) is calculated as a function of sample size, population, and standard deviation. The relative or absolute accuracy is then calculated as a function of the standard error, and the desired confidence level is reflected in the value of the z-statistic.

Standard Error: $$SE(\bar{x}) = \sqrt{\frac{\sigma^2}{n} \times \frac{N - n}{N}}$$ (Eq. 7)

where $\sigma^2$ = population variance, n = sample size, and N = total population.

Precision: $$D = SE(\bar{x}) \times z$$ (Eq. 8)

Similarly, Equations 7 and 9 show how to calculate in two steps the confidence level that corresponds to a given sample size and the desired degree of variable accuracy. First, the standard error of the mean is calculated as a function of sample size, population, and standard deviation. The confidence level (as reflected in the value of the z-statistic) is then calculated as a function of the relative or absolute precision and the standard error. The confidence level associated with various z-statistics is shown in Table 2.13.

Standard Error: $$SE(\bar{x}) = \sqrt{\frac{\sigma^2}{n} \times \frac{N - n}{N}}$$ (Eq. 7)

z-statistic: $$z = \frac{D}{SE(\bar{x})}$$ (Eq. 9)

Table 2.13. Correlation of z-statistic and confidence levels for a given sample size.

z-statistic | Confidence Level
0.67 | 50%
0.84 | 60%
1.04 | 70%
1.28 | 80%
1.64 | 90%
1.96 | 95%
2.58 | 99%
3.29 | 99.9%
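Equations 7 through 9 can be expressed as a short Python sketch. The function names and the input values (40 responses from 400 establishments, a standard deviation of 1,200 tons) are hypothetical. The conversion from a z-statistic to a confidence level assumes a two-sided interval under a normal distribution, which is consistent with the pairs listed in Table 2.13.

```python
import math
from statistics import NormalDist


def standard_error(sigma: float, n: int, N: int) -> float:
    """Eq. 7: standard error of the mean, including the (N - n) / N term."""
    return math.sqrt((sigma ** 2 / n) * ((N - n) / N))


def precision(se: float, z: float) -> float:
    """Eq. 8: D = SE * z."""
    return se * z


def z_statistic(d: float, se: float) -> float:
    """Eq. 9: z = D / SE."""
    return d / se


def confidence_level(z: float) -> float:
    """Two-sided confidence level implied by z under a normal distribution."""
    return 2.0 * NormalDist().cdf(z) - 1.0


# Hypothetical inputs: 40 completed responses out of 400 establishments,
# with a standard deviation of 1,200 tons per establishment.
se = standard_error(sigma=1200.0, n=40, N=400)
d = precision(se, z=1.96)
```

With these inputs the standard error works out to 180 tons, so the precision at the 95 percent level is 180 times 1.96, or about 353 tons. Calling `confidence_level(z_statistic(d, se))` recovers roughly 0.95.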

For full establishment survey efforts, it is possible to statistically compare the data collected from companies in different strata to determine whether shipping characteristics are related to other variables. For example, the relationship between employment and output for small and large companies in Seattle’s apparel industry could be compared. Similarly, this relationship could be tested in Spokane’s food manufacturing industry in the eastern side of town and the western side of town. These types of comparisons can also provide clues as to the types of future survey efforts that are most critical for a region.

Another type of test that can be done identifies whether different industries have similar truck trip generation characteristics. Industries with similar truck trip generation ratios may be candidates for consolidation. Conversely, within a single industry, if truck trip generation is found to have a large standard deviation, then further examination may be needed to determine whether there are other factors that need to be incorporated into the development of a truck trip generation function.

As sample size increases, the margin of error for collected data decreases and the confidence interval for collected data narrows. The percent margin of error at the 95 percent confidence level can be calculated at various sample sizes. This relationship can be plotted to illustrate how margin of error decreases as the sample size increases. The generalized relationship between sample size and margin of error for variables that are normally distributed is shown in Figure 2.5. At the 95 percent confidence level, there will be a 14 percent margin of error when 50 samples are taken. When 250 samples are taken, the margin of error in the estimate decreases to 6 percent.
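The relationship between sample size and margin of error can be reproduced with a short Python sketch. The Guidebook does not state the formula behind Figure 2.5, so the sketch assumes the standard margin of error for a proportion with p = 0.5 (the most conservative case); under that assumption it yields the 14 percent and 6 percent values quoted above for 50 and 250 samples. The function name and the z-value of 1.645 for the 90 percent level are likewise assumptions.

```python
import math


def margin_of_error(n: int, z: float, p: float = 0.5) -> float:
    """Margin of error for a proportion p estimated from n responses.

    Assumed formula: z * sqrt(p * (1 - p) / n), without a finite
    population correction.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


sample_sizes = [10, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]

# z = 1.96 for the 95 percent level and about 1.645 for the 90 percent level.
moe_95 = [round(100 * margin_of_error(n, z=1.96)) for n in sample_sizes]
moe_90 = [round(100 * margin_of_error(n, z=1.645)) for n in sample_sizes]
```

Because the margin of error falls with the square root of n, halving it requires roughly four times as many completed responses.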
It is important to note that the sample size represents the number of completed survey responses for the specific variable, not for the overall survey. Therefore, different variables that are estimated through the survey will have different levels of accuracy.

Figure 2.5. Relationship between sample size and margin of error. (Horizontal axis: sample size n; vertical axis: margin of error. Plotted values are listed below.)

Sample Size n           10   50   100  150  200  250  300  350  400  450  500
95% Confidence Level    31%  14%  10%  8%   7%   6%   6%   5%   5%   5%   4%
90% Confidence Level    26%  12%  8%   7%   6%   5%   5%   4%   4%   4%   4%

Example

An example of calculating accuracy and sample sizes is useful in illustrating the theory described in the previous section. This Guidebook cannot use the demonstration survey results for this example because of privacy guarantees given to respondents during the survey process. Therefore, a hypothetical establishment survey of the paper products industry in the Seattle region will be used. For purposes of this analysis, Seattle will be divided into two regions: West Seattle and East Seattle. The data collected in this hypothetical example are shown in Table 2.14.

To determine the accuracy of the estimate of the total tons of paper products shipped from the Seattle region, the first step is to calculate the mean and the standard deviation of this value for the data collected, using the data in the far right column of Table 2.14. The mean is estimated using the following formula:

$$ \text{Mean} = \frac{1{,}000 + 1{,}500 + 2{,}000 + \cdots + 7{,}000 + 8{,}000}{10} = 4{,}150 \text{ tons} $$

The standard deviation is the sample standard deviation, which divides the sum of squared deviations by n − 1 = 9:

$$ \text{Standard Deviation} = \sqrt{\frac{(1{,}000 - 4{,}150)^2 + (1{,}500 - 4{,}150)^2 + \cdots + (7{,}000 - 4{,}150)^2 + (8{,}000 - 4{,}150)^2}{10 - 1}} = 2{,}461 \text{ tons} $$

The next step is to calculate the standard error using the following formula. The example does not specify the total population N, so the finite population correction in Equation 7 is omitted:

$$ \text{Standard Error} = \frac{2{,}461}{\sqrt{10}} = 778 \text{ tons} $$

Using a confidence level of 90 percent gives a z-statistic of 1.64. This is used to calculate the precision of the estimate using the following formula:

$$ \text{Precision} = D = 778 \times 1.64 = 1{,}275.9 \text{ tons} $$

Therefore, the average number of tons produced by these 10 firms is 4,150, and the precision of the estimate of this average is ±1,275.9 tons at the 90 percent confidence level. Because the sample total is the mean multiplied by the number of firms, the same relative precision (1,275.9/4,150, or about 31 percent) applies to the total tons estimated from this sample: the total tonnage is 41,500 tons, with a precision of about ±12,759 tons at the 90 percent confidence level. The same relative precision can be applied to the fully expanded data as well.

Determining the accuracy of the estimate of the total tons of paper products shipped from West Seattle to East Seattle in this hypothetical example involves the same formulas, but different values within the formulas. The data are the East Seattle shipments of the five West Seattle companies in Table 2.14. The mean is estimated using the following formula:

$$ \text{Mean} = \frac{100 + 150 + 200 + 250 + 300}{5} = 200 \text{ tons} $$

$$ \text{Standard Deviation} = \sqrt{\frac{(100 - 200)^2 + (150 - 200)^2 + (200 - 200)^2 + (250 - 200)^2 + (300 - 200)^2}{5 - 1}} = 79.06 \approx 79.1 \text{ tons} $$

The next step is to calculate the standard error using the following formula:

$$ \text{Standard Error} = \frac{79.06}{\sqrt{5}} = 35.36 \text{ tons} $$
Table 2.14. Data from hypothetical survey of the paper products industry in Seattle.

Survey ID                 Tons Shipped to    Tons Shipped to    Total Tons
                          West Seattle       East Seattle       Shipped
West Seattle Company 1    10                 100                1,000
West Seattle Company 2    20                 150                1,500
West Seattle Company 3    30                 200                2,000
West Seattle Company 4    40                 250                2,500
West Seattle Company 5    50                 300                3,000
East Seattle Company 1    25                 700                5,000
East Seattle Company 2    35                 750                5,500
East Seattle Company 3    45                 800                6,000
East Seattle Company 4    55                 850                7,000
East Seattle Company 5    65                 900                8,000
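The calculations in this example can be reproduced from the Table 2.14 data with a short Python sketch. It follows the example's approach (sample standard deviation, no finite population correction, z = 1.64 for 90 percent confidence); the variable and function names are illustrative. Because the code carries full precision between steps, its results can differ from the hand calculation in the last digit.

```python
import math
import statistics

# Table 2.14: total tons shipped, all ten surveyed companies.
total_tons = [1000, 1500, 2000, 2500, 3000, 5000, 5500, 6000, 7000, 8000]
# Table 2.14: tons shipped to East Seattle by the five West Seattle companies.
west_to_east = [100, 150, 200, 250, 300]

Z_90 = 1.64  # z-statistic for the 90% confidence level (Table 2.13)


def summarize(values, z):
    """Return (mean, sample standard deviation, standard error, precision)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # divides by n - 1, as in the example
    se = sd / math.sqrt(len(values))   # no finite population correction
    return mean, sd, se, se * z


mean_all, sd_all, se_all, d_all = summarize(total_tons, Z_90)
mean_we, sd_we, se_we, d_we = summarize(west_to_east, Z_90)

print(f"All shipments: mean {mean_all}, SD {sd_all:.0f}, "
      f"SE {se_all:.0f}, precision +/-{d_all:.1f} tons")
print(f"West to East:  mean {mean_we}, SD {sd_we:.1f}, "
      f"SE {se_we:.2f}, precision +/-{d_we:.1f} tons")
```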

Using a confidence level of 90 percent gives a z-statistic of 1.64. This is used to calculate the precision of the estimate using the following formula:

$$ \text{Precision} = D = 35.36 \times 1.64 = 58.0 \text{ tons} $$

Therefore, the average tonnage shipped from West Seattle to East Seattle by these five firms is 200 tons, and the precision of the estimate of this average is ±58.0 tons at the 90 percent confidence level. The total tons estimated from this sample is 1,000, and the same relative precision (58.0/200, or 29 percent) gives a precision of about ±290 tons for the total. The same relative precision also can be applied to the fully expanded data.

The formulas described in this section also can be used on the collected data to determine the sample size needed to achieve a specific confidence level for specific types of origin-destination combinations. Using this process, future surveys can target the specific data elements for which higher confidence levels and tighter precision are desired.

User's Guide Worksheet Punch List

• For each industry, estimate the mean, standard deviation, and standard error of tons for the companies included in the survey.
• Calculate the precision of the estimate of the tons produced based on the z-statistic for the desired confidence level.

2.3 Next Steps

The information presented in this chapter was designed to serve multiple purposes depending on where transportation agencies are in terms of considering development of an establishment survey. To read a description of collecting subnational commodity flow data using roadside surveys, proceed to Chapter 3.0. To identify the best next steps for your specific effort, refer to Chapter 6.0, “the Playbook.”

TRB’s National Cooperative Freight Research Program (NCFRP) Report 26: Guidebook for Developing Subnational Commodity Flow Data explores how state departments of transportation and other subnational agencies can obtain and compile commodity flow data.

The Guidebook contains descriptions of existing public and private commodity flow data; standard procedures for compiling local, regional, state, and corridor databases from these commodity flow data sources; procedures and methodologies for conducting subnational commodity flow surveys and studies; and methods for using commodity flow data in local, regional, state, and corridor practice.

In addition to the Guidebook, two subtask reports from NCFRP Project 20--Review of Subnational Commodity Flow Data Development Efforts and National Freight-Related Data Sets and Demonstration of Application of Establishment Survey--are available only in electronic format.
