**Suggested Citation:**"Chapter 6 - Modeling Methodology." National Academies of Sciences, Engineering, and Medicine. 2016.

*Using Commodity Flow Survey Microdata and Other Establishment Data to Estimate the Generation of Freight, Freight Trips, and Service Trips: Guidebook*. Washington, DC: The National Academies Press. doi: 10.17226/24602.



**Chapter 6 - Modeling Methodology**

This chapter provides a succinct overview of the models used in the guidebook, the data required, and the aggregation procedures that could be used to obtain estimates of FSA for conglomerates of commercial establishments (e.g., city blocks, TAZs). Limitations are also discussed.

**Model Typology**

To facilitate their use in data-constrained application environments, the guidebook models are designed to be as simple as practically possible, using employment as the sole variable. The guidebook models capture a wide range of FSA patterns. The first group of models is econometric in nature; these models were estimated using statistical techniques to ensure that the resulting parameters meet a minimum threshold of significance. The second group of models presents FTG rates as a function of employment and is not statistically estimated.

**Econometric Models**

These models express FSA as a statistical function of employment. The estimation process typically starts with a general form that is sequentially reduced by eliminating the parameters that are not significant or are not conceptually valid. At the end of the process, statistical tests provide assurance that the final model is acceptable. The econometric models in the guidebook are statistically significant and conceptually valid. The models fall into two major families: linear and non-linear. The functional forms of the models are described next.

**Linear Models (Model Types C, ER, and C-ER)**

The linear models used in the guidebook are variants of Equation 1:

$$f_i = \alpha + \beta E_i \tag{1}$$

where:
$f_i$ = FSA metric for establishment $i$, and
$E_i$ = Employment at establishment $i$.

Three distinct types of linear models are used in the guidebook:

• Constant (C): This model type is used for situations in which the FSA metric does not depend on the establishment employment. Model type C is represented by Equation 2:

$$f_i = \alpha \tag{2}$$

• Employment Rate (ER): This model type is used for situations in which the FSA metric is directly proportional to establishment employment. Model type ER is represented by Equation 3:

$$f_i = \beta E_i \tag{3}$$

• Constant and Employment Rate (C-ER): This model type is used for situations in which the industry sectors exhibit FSA with both a constant and an employment term. Model type C-ER is represented by Equation 4:

$$f_i = \alpha + \beta E_i \tag{4}$$

**Non-Linear Models (Model Type P)**

Many potential non-linear forms exist; however, the research team decided to use a power function of the kind shown in Equation 5 because (1) it is very flexible and able to accommodate various data patterns, and (2) it is consistent with the Economic Order Quantity (EOQ) model, which is a good approximation for the ordering behavior of commercial establishments. It is worth mentioning that the EOQ model predicts that, irrespective of transportation and inventory costs, the frequency of orders (essentially, the FTG) is a function of the square root of the demand. This result provides a theoretical value that could be used to check the validity of the models estimated in the guidebook.

• Power function (P): This model type is used for situations in which the FSA in question increases as a power function of the establishment employment.

$$f_i = \phi E_i^{\gamma} \tag{5}$$

Figure 3 provides a graphic representation of the model types discussed in this chapter. As shown in Chapter 7, no single model type is always the best (though model type P most frequently provided the best fit). Use of any specific form could lead to major estimation errors if it is applied to an industry sector with a different FSA pattern. For instance, if an analyst uses an FTG rate per employee to estimate the FTG of establishments for which the actual pattern follows the C pattern, the rate will underestimate the FTG for small establishments and overestimate the FTG for large establishments. (See Holguín-Veras et al. 2011 for an in-depth discussion.)

Figure 3. Econometric forms estimated in the guidebook. (Vertical axis: measure of freight or service activity (FSA); horizontal axis: establishment employment; curves: power function (P), constant and employment rate (C-ER), model type ER, and model type C.)
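The four econometric forms in Equations 2 through 5 can be sketched in Python as follows. This is a minimal illustration, not the guidebook's implementation: the function names and all parameter values (`ALPHA`, `BETA`, `PHI`, `GAMMA`) are hypothetical and were chosen only to keep the arithmetic easy to follow. The last function illustrates the misapplication described above, in which an ER rate is applied to establishments whose actual pattern is constant.

```python
"""Sketch of model types C, ER, C-ER, and P with hypothetical parameters."""

# Hypothetical values for illustration only; these are NOT guidebook estimates.
ALPHA = 2.0   # constant term (e.g., deliveries per day)
BETA = 0.1    # rate per employee
PHI = 1.0     # multiplier of the power function
GAMMA = 0.5   # exponent of the power function


def model_c(employment: float, alpha: float = ALPHA) -> float:
    """Model type C (Equation 2): FSA does not depend on employment."""
    return alpha


def model_er(employment: float, beta: float = BETA) -> float:
    """Model type ER (Equation 3): FSA is proportional to employment."""
    return beta * employment


def model_c_er(employment: float, alpha: float = ALPHA,
               beta: float = BETA) -> float:
    """Model type C-ER (Equation 4): constant plus employment term."""
    return alpha + beta * employment


def model_p(employment: float, phi: float = PHI,
            gamma: float = GAMMA) -> float:
    """Model type P (Equation 5): power function of employment."""
    return phi * employment ** gamma


def er_error_when_truth_is_c(employment: float) -> float:
    """ER estimate minus the true value when the true pattern is type C."""
    return model_er(employment) - model_c(employment)


if __name__ == "__main__":
    for employment in (5, 20, 80):
        print(employment, er_error_when_truth_is_c(employment))
```

With these assumed values, the ER rate reproduces the constant pattern only at 20 employees; the error is negative (underestimation) for smaller establishments and positive (overestimation) for larger ones.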

**FTG Rates by Employment Bin**

A modified version of model type C-ER mitigates the problem caused by the use of a constant FTG rate. This model computes FTG rates by employment bins (EB). The research team chose to label this model type ER-EB. ER-EB models are discontinuous models with bins, which could be linear or non-linear. The example shown is the linear model.

**Model Type ER-EB**

Mathematically, model type ER-EB is expressed by Equation 6:

$$f_i = \beta_{l_i} E_i \tag{6}$$

where the parameter $\beta_{l_i}$ corresponds to the employment bin ($l$) for establishment $i$ and $E_i$ corresponds to employment at establishment $i$.

The computation of the FTG rates by EB leads, however, to the undesirable characteristic of discontinuities in the FSA estimates at the points where the FTG rates change values. Not accounting for these discontinuities could lead to unreasonable results (see Figure 4). At the point of discontinuity between Bin 1 and Bin 2, the FSA can take two values, so the bin to which a boundary point is assigned should be clearly defined.

**Aggregation Procedures**

In most applications, transportation professionals are interested in estimates of FSA for conglomerates of users, such as those within a city block, a corridor, a ZIP code, or a TAZ. For instance, a city engineer may be interested in determining the parking spaces needed to accommodate deliveries to a downtown city block. The engineer needs to estimate the total number of deliveries generated by that city block. Obtaining these aggregate estimates requires the use of an aggregation procedure that uses the disaggregate-level estimates to compute the aggregate values of the FSA metric. This section of the guidebook provides a summary of key aggregation procedures, their advantages and disadvantages, and their recommended uses. (For a complete discussion, see Holguín-Veras et al. 2011.)
The techniques discussed are classified into two major cases: (1) scenarios in which establishment-level data are available, and (2) scenarios in which only aggregate data are available.

**Case 1: Establishment-Level Data Are Available**

Establishment-level data to estimate the models are available, and could be used to estimate the levels of FSA for the establishments in the data.

Figure 4. FTG rates by employment bin. (Vertical axis: measure of freight or service activity (FSA); horizontal axis: establishment employment, divided into Bin 1, Bin 2, and Bin 3.)
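The ER-EB model of Equation 6 and the discontinuities illustrated in Figure 4 can be sketched as follows. The bin edges, the rates, and the function names are hypothetical, and the rule that a boundary value belongs to the upper bin is only one possible convention, stated here as an assumption; the point is that some such rule has to be fixed explicitly.

```python
"""Sketch of an ER-EB lookup with an explicit boundary rule (hypothetical values)."""
from bisect import bisect_right

# Hypothetical bin edges and rates for illustration; NOT guidebook estimates.
# Bin 1: E < 10, Bin 2: 10 <= E < 50, Bin 3: E >= 50.
BIN_EDGES = [10, 50]
BIN_RATES = [0.30, 0.15, 0.08]  # FTG rate per employee in each bin


def bin_index(employment: float) -> int:
    """Return the 0-based bin of an establishment.

    Assumed convention: an establishment exactly on an edge is assigned
    to the upper bin, so every employment value maps to a single rate.
    """
    if employment < 0:
        raise ValueError("employment must be non-negative")
    return bisect_right(BIN_EDGES, employment)


def ftg_er_eb(employment: float) -> float:
    """Equation 6: the rate of the establishment's bin times its employment."""
    return BIN_RATES[bin_index(employment)] * employment


if __name__ == "__main__":
    # The estimate drops when crossing from Bin 1 into Bin 2.
    for employment in (9, 10, 49, 50):
        print(employment, bin_index(employment) + 1, ftg_er_eb(employment))
```

With these assumed rates, an establishment with 9 employees receives a higher estimate than one with 10 employees, which is the kind of discontinuity an analyst needs to check for before using the results.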

**Complete Enumeration**

Recommended use: Complete enumeration is well suited to cases in which data are available for all establishments of interest and the number of establishments is relatively small. This technique is applicable to both linear and non-linear models.

This is probably the simplest of all aggregation procedures. It only requires (1) applying the model that corresponds to the establishment's industry sector to obtain the establishment-level values; and (2) adding up the establishment-level estimates of FSA. Mathematically, the aggregated FSA, $F$, equals the summation of the values for the different establishments, $f_i$, as expressed in Equation 7:

$$F = \sum_{i=1}^{n} f_i \tag{7}$$

where:
$F$ = Aggregate generation of a metric of FSA,
$f_i$ = FSA metric for establishment $i$, and
$n$ = Number of establishments.

These calculations are straightforward and could be performed with a typical spreadsheet.

**Sample Estimation**

Recommended use: Sample estimation is useful when (1) complete data are not available for the establishments in the study area or (2) the number of establishments is very large. This technique is applicable to both linear and non-linear models.

In this technique, a sample of establishments that represents the population of establishments under study is used to estimate the average value of the FSA being studied. Once this average value is determined, the aggregate number is obtained by multiplying the average by the total number of establishments. Mathematically, the sample estimation technique is expressed in Equations 8 and 9:

$$\bar{f} = \frac{\sum_{i=1}^{n} f_i}{n} \tag{8}$$

$$F = N \bar{f} \tag{9}$$

where:
$F$ = Aggregate generation of an FSA metric,
$f_i$ = FSA metric for establishment $i$,
$\bar{f}$ = Average FSA for the small sample,
$n$ = Number of establishments in the sample, and
$N$ = Total number of establishments in the study area.
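Equations 7 through 9 translate directly into code. The sketch below is illustrative only: the function names, the C-ER parameters, and the employment figures are hypothetical, and any establishment-level model (linear or non-linear) could be passed in its place.

```python
"""Sketch of complete enumeration and sample estimation (hypothetical data)."""
from statistics import mean
from typing import Callable, Sequence


def complete_enumeration(employment: Sequence[float],
                         model: Callable[[float], float]) -> float:
    """Equation 7: sum the establishment-level estimates."""
    return sum(model(e) for e in employment)


def sample_estimation(sample_employment: Sequence[float],
                      n_establishments: int,
                      model: Callable[[float], float]) -> float:
    """Equations 8 and 9: sample average times the number of establishments."""
    f_bar = mean([model(e) for e in sample_employment])
    return n_establishments * f_bar


def example_model(employment: float) -> float:
    """Hypothetical C-ER model; the parameters are NOT guidebook estimates."""
    return 1.0 + 0.25 * employment


if __name__ == "__main__":
    population = [4, 8, 12, 16]   # employment of every establishment
    sample = [4, 16]              # employment of the sampled establishments
    print(complete_enumeration(population, example_model))
    print(sample_estimation(sample, len(population), example_model))
```

In this contrived case the sample has the same average employment as the population, so both procedures return the same total; with a less representative sample the two results would differ.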

The accuracy of this procedure depends on how representative the small sample is. If the sample is representative of the population, the procedure will provide solid estimates of the overall level of FSA.

**Case 2: Only Aggregate Data Are Available**

Recommended use: The following techniques are useful when (1) aggregate data are available by industry sector and (2) the establishment-level models are linear. These techniques apply only to linear models.

The aggregation procedures discussed under Case 1 work if establishment-level data are available. In some situations, however, only aggregated data are available. This may occur if the data involve:

• Official employment statistics. Most publicly available economic data are released at an aggregate level for confidentiality reasons. Employment data released by the Census Bureau, for example, are disclosed as aggregated by ZIP code or county (U.S. Census Bureau 2013).

• Planning forecasts at the MPO level. For practical reasons, most forecasts of economic activity are made at the aggregate level (e.g., by TAZ). In most cases, it is not technically possible to produce disaggregated forecasts at the establishment level.

Fortunately, the linear models in this guidebook can be used to compute the aggregate FSA if the aggregate data contain the number of establishments and total employment by industry sector. In this case, the resulting estimates of FSA will be exactly the same as if they were produced with the disaggregated data.

Each model type requires a different aggregation procedure. In all cases, the aggregation procedure must correspond to the chosen disaggregate model (see Holguín-Veras et al. 2011). Because the models in NCFRP Research Report 37 are by industry sector, the aggregation procedure must be performed industry sector by industry sector.
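The exactness claim for linear models, and the reason the shortcut is restricted to them, can be checked numerically. The sketch below is an illustration under assumed values: the parameters, the employment figures, and the function names are hypothetical. For a C-ER model, the total computed from the number of establishments and total employment matches the establishment-by-establishment sum; for a power model, a comparable shortcut based on average employment does not.

```python
"""Sketch: aggregate-data shortcut for linear vs. power models (hypothetical values)."""

# Hypothetical establishment employment for one industry sector.
EMPLOYMENT = [4, 8, 12, 16]


def c_er(employment: float, alpha: float = 1.0, beta: float = 0.25) -> float:
    """Hypothetical C-ER model: alpha + beta * E."""
    return alpha + beta * employment


def power(employment: float, phi: float = 1.0, gamma: float = 0.5) -> float:
    """Hypothetical power model: phi * E ** gamma."""
    return phi * employment ** gamma


def aggregate_c_er(n_establishments: int, total_employment: float,
                   alpha: float = 1.0, beta: float = 0.25) -> float:
    """Total from aggregate data only: n * alpha + beta * total employment."""
    return n_establishments * alpha + beta * total_employment


n = len(EMPLOYMENT)
total_employment = sum(EMPLOYMENT)

# Linear case: the aggregate shortcut equals the disaggregate sum.
linear_disaggregate = sum(c_er(e) for e in EMPLOYMENT)
linear_aggregate = aggregate_c_er(n, total_employment)

# Power case: applying the model to average employment and scaling by n
# does not reproduce the disaggregate sum.
power_disaggregate = sum(power(e) for e in EMPLOYMENT)
power_shortcut = n * power(total_employment / n)

if __name__ == "__main__":
    print(linear_disaggregate, linear_aggregate)
    print(power_disaggregate, power_shortcut)
```

The gap in the power case comes from the curvature of the function, which is why a non-linear model needs establishment-level data (complete enumeration or sample estimation).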
The easiest way to illustrate the relationship of the aggregation procedure to the model type is to derive the aggregation formulas.

**Constant (Model Type C)**

In this case, the metric of FSA at the establishment level is constant. Mathematically, this relationship can be expressed by Equation 10:

$$f_i = \alpha \tag{10}$$

Substituting $\alpha$ from Equation 10 for $f_i$ in Equation 7 and taking $\alpha$ out of the summation term yields Equation 11:

$$F = \sum_{i=1}^{n} \alpha = n\alpha \tag{11}$$

Equation 11 implies that, if the FSA at the establishment level is constant, the correct way to estimate the aggregate FSA metric is to multiply the unit value of FSA, $\alpha$, by the number of establishments.

**Employment Rate (Model Type ER)**

This case can be represented mathematically as Equation 12, where $\beta$ is a constant FTG rate per employee:

$$f_i = \beta E_i \tag{12}$$

Substituting $\beta E_i$ from Equation 12 for $f_i$ in Equation 7 and taking $\beta$ out of the summation term yields Equation 13:

$$F = \sum_{i=1}^{n} \beta E_i = \beta \sum_{i=1}^{n} E_i = \beta E_p \tag{13}$$

Equation 13 shows that, if the underlying FSA pattern is proportional to employment, the total FSA can be obtained by multiplying the total employment ($E_p$) by the FSA rate.

**FTG Rates by Employment Bin (Model Type ER-EB)**

In this case, the aggregation formula is adapted as shown in Equation 14:

$$F = \sum_{l=1}^{L} \beta_l E_{p,l} \tag{14}$$

where:
$\beta_l$ is the ER for employment bin $l$,
$L$ is the number of bins, and
$E_{p,l}$ is the total employment in employment bin $l$.

It should be noted that the total estimate of FSA may be affected by the discontinuities illustrated in Figure 4.

**Constant and Employment Rate (Model Type C-ER)**

In this case, the FSA at the establishment level has an intercept and a term that depends on employment. Mathematically, this relationship can be expressed by Equation 15:

$$f_i = \alpha + \beta E_i \tag{15}$$

The total is then:

$$F = \sum_{i=1}^{n} (\alpha + \beta E_i) = n\alpha + \beta \sum_{i=1}^{n} E_i = n\alpha + \beta E_p \tag{16}$$

The correct way to obtain the total FSA is to add the product of the number of establishments and the constant FSA term to the product of the total employment and the FSA rate. Model type C-ER is thus a mix of model types C and ER.

**Data Used**

The models in NCFRP Research Report 37 enjoy significantly more substantial empirical support than those presented in its predecessor, NCHRP Report 739/NCFRP Report 19 (Holguín-Veras et al. 2012). Since the publication of the earlier report, the research team collected data and secured access to additional data sources, most notably the CFS microdata.

The data used to produce this guidebook come from (1) the 2007 CFS microdata, (2) an establishment survey conducted by the Hartgen Group, and (3) establishment surveys conducted by the research team. The data used are summarized in Table 4.
The 2007 CFS microdata contain only FP data because the survey is shipper-based. The 2008 survey from the Hartgen Group collected data about shipments received and sent out from 1,000 establishments in the United States. Finally, the RPI establishment surveys include data from about 1,400 observations about FA and FP, FTA and FTP, and STA.

**CFS Microdata**

The CFS is the most important source of freight demand data in the United States, and one of the oldest data collection programs in transportation. The CFS collects data on the movement of goods in the 50 states plus the District of Columbia. The establishments selected are asked to provide data on shipments sent during one week for each quarter. The CFS provides information on commodities shipped; their value, weight, and mode of transportation; and the origin and destination of shipments. The main focus is on shipments sent by domestic establishments in manufacturing, wholesale, mining, and other selected industries. The CFS excludes crude petroleum and natural gas extraction, farms, service industries, government establishments, imports (until a shipment reaches the first domestic shipper), and trans-border shipments (Fowler 2001; Bureau of Transportation Statistics 2008).

According to federal law governing Census Bureau reports, the data collected for the CFS cannot be disclosed in any way or form that permits identification of individual firms or establishments. To protect the confidentiality of the data, the NCFRP Project 25(01) research team used the CFS microdata, complemented these with other data sources, and estimated FP models at a secured Census Bureau facility. The guidebook models were subjected to a rigorous disclosure procedure to ensure that no confidential information could be inadvertently disclosed.

**2008 Data and Models**

A separate dataset and set of models were generously provided to the research team by Dr. David T. Hartgen, from the Hartgen Group. The data were collected in 2008 as part of a study to assess the impacts of congestion on employers across the United States (Hartgen et al. 2014). Approximately 1,000 companies were surveyed in all states except Alaska and Hawaii.
The respondents were asked about the number of shipments sent or received per week at their establishments, mode of transport, and percent of deliveries affected by local congestion (Clark & Chase Research, Inc. 2008).

The models used in the Hartgen study differ in several respects from the other models included in this guidebook. First, they estimate the summation of FTA and FTP, whereas the other guidebook models separately estimate FTA and FTP. With the ER-EB models, aggregating FTA and FTP could lead to errors because FTA and FTP do not always follow the same pattern. Also, the models from the 2008 study assume that FTG increases with employment, which is not always the case. Apart from these differences, however, the 2008 models provide a pragmatic way to estimate FTG that is otherwise consistent with the rest of the models in the guidebook.

Table 4. Data sources used in this guidebook.

| Data source | Number of observations |
|---|---|
| 2007 Commodity Flow Survey | 100,000 |
| 2008 Hartgen Group Survey | 1,000 |
| RPI Establishment Survey (2006, NYSDOT) | 691 |
| RPI Establishment Survey (2011, USDOT) | 263 |
| RPI Establishment Survey (2015, NCFRP/SHRP C-20) | 450 |

The rows under "Description of the Data Collected" in Table 4 list the following metrics: Freight Attraction (FA), Freight Production (FP), Freight Generation (FG = FA + FP), Freight Trip Attraction (FTA), Freight Trip Production (FTP), Freight Trip Generation (FTG = FTA + FTP), Service Trip Attraction (STA), Service Trip Production (STP), and Service Trip Generation (STG = STA + STP). The per-source entries for these rows are not preserved in this text.

**Establishment Survey Data**

The final data source was a series of three datasets collected by RPI using establishment surveys. The first dataset, assembled in 2006 as a part of a project conducted for the New York State Department of Transportation (New York State DOT), was initially used to estimate the models in NCHRP Report 739/NCFRP Report 19: Freight Trip Generation and Land Use (Holguín-Veras et al. 2012). In that effort, disaggregated data were collected at the establishment level through two surveys targeting carriers and receivers. The questionnaire inquired about company attributes and operational and FTG patterns. The receiver sample was selected from receivers in Manhattan with more than five employees. Carriers selected had at least 25 employees and were based in New York and New Jersey. The data collection process resulted in a sample for the Manhattan and Brooklyn receivers with 362 observations and a sample of New York and New Jersey carriers comprising 339 observations (Holguín-Veras et al. 2012).

The second dataset came from an establishment survey that was part of a project funded by the U.S. DOT. Conducted in 2011, the survey included sections that inquired about deliveries and shipments received, and current operations. These sections included questions concerning the number of deliveries received, shipment size, type of good(s) received, number of vendors, and number of employees. Data were collected from 263 receivers in Manhattan (Holguín-Veras et al. 2013b).

The third and most recent dataset is based on an establishment survey, specifically designed to collect data about FSA, conducted in 2015 using a modified version of the survey instrument from Holguín-Veras et al. (2012). This data collection effort was co-funded by NCFRP and the second Strategic Highway Research Program (SHRP2 Project C-20, "Freight Demand Modeling and Data Improvement").
The survey targeted establishments in the New York City metropolitan area and the New York State Capital Region. Three sections contained questions to collect data on deliveries received and shipments sent out, service trips, and current operations and flexibility. The deliveries and shipments section included questions pertaining to number of deliveries received, number of shipments sent out, typical size and weight of deliveries and shipments, vehicle type used for both deliveries and shipments, type of products received and shipped, and who transports the deliveries and shipments. The current operations and flexibility section surveyed the respondents on the numbers of full-time and part-time employees, the fleet owned, and other operations-related questions. The service trip section inquired about the number of service trips received, type of vehicle used, most common types of planned and emergency service trips, and percentages of planned and emergency service trips that occur during both regular business hours and off-hours (7:00 p.m. to 6:00 a.m.).

It is important to note that the NCFRP/SHRP2 Project C-20 effort originally did not plan to collect data about service trips. Recognizing the importance of these data, however, the research team decided to include questions about service trip attraction (STA). Regrettably, budget constraints prevented collecting data about STP. Data were collected from 450 respondents; 280 were from the New York City metropolitan area, and 170 were from the New York State Capital Region.

The research approach adopted to produce the guidebook has produced remarkably consistent results. Figure 5 shows a comparison of the "all sectors" models estimated by the Hartgen Group (labeled FTG-H in the figure) and the models estimated for the guidebook (labeled FTG-RPI). Because the former models use FTG from a nationwide survey, they provide an external test to the models estimated for the guidebook.
These "all-sectors" models can be interpreted as representing a "generic" establishment. They could be very useful to produce rough estimates of FTG.

As Figure 5 shows, the Hartgen and guidebook models agree very well throughout the entire range of employment. As highlighted earlier in this chapter, however, significant discontinuities exist in the estimates of the ER-EB model that must be properly accounted for to ensure sensible results. The largest difference occurs for the highest values of employment, which may be due to the presence of outliers in the data used to estimate the ER-EB models.

**Limitations**

The guidebook models are a significant step forward, offering some of the best models available, but they have some limitations. Among their limitations are the following:

• Lack of geographic diversity in the estimation data. The data used to estimate FTG and STG come primarily from the Northeast United States (the exception is the ER-EB models). Although the "all-sectors" models produce results that are consistent with each other, no data could be used to assess the transferability of the industry sector models.

• Type of data collected. Only the formal 2015 FSA generation survey collected a full complement of data. The other projects collected only basic data. Because the models were estimated with the pooled data, only the basic variables common to all datasets could be used. Consequently, the full potential of the 2015 data could not be realized.

• Lack of data about STP and FA. As shown in Table 4, no data exist for FA and STP. Collecting data to fill these gaps would complement the data collected by the CFS and this project, enabling a more comprehensive modeling of FSA.

• Sparsity of data from large establishments. The data collected by the research team include information from a very small number of establishments with more than 100 employees. This could be an issue because the FSA patterns at these establishments may differ from those at smaller establishments.

• Low explanatory power. The decision to use relatively simple models based on employment lowers the explanatory power of the models because other relevant variables are left out. Although the guidebook models are practical and adequate for most applications, they are not necessarily the best ones for every application. For applications where more accurate estimates are needed, models like the ones estimated by Sánchez-Díaz et al. (2014) are better.

Some of these limitations reflect the constraints of the data collection process used in NCFRP Project 25(01). For this project, the research team repurposed data from surveys originally designed for different purposes. The approach enabled the team to collect data that otherwise would not have been collected. Although pragmatic, such an ad hoc process cannot replace a comprehensive data collection program. The latter is needed to improve the empirical foundations of FSA modeling throughout the country.

Figure 5. Comparison of FTG estimates (all sectors). (Vertical axis: FTG in shipments (in + out) per day; horizontal axis: establishment employment in FTE. FTG-H = based on 2008 data from the Hartgen study; FTG-RPI = based on establishment survey data.)