CHAPTER 6
Data Preparation

This chapter explains the data preparation process that preceded the estimation of freight mode choice models using the 2012 CFS confidential microdata. Estimating shipment-level freight mode choice models required data on the shipment characteristics, e.g., shipment size and commodity type; the shipper attributes, e.g., establishment size and industry; and the modal attributes of all modes to be analyzed, e.g., transit times and freight rates.

A major challenge was that the CFS data do not include some of the data required to estimate freight mode choice models. The missing data are (1) shippers' characteristics, such as location, industry sector, and employment and (2) attributes of the modes to be analyzed (truck and rail), such as distance, transit time, cost, number of transfers, and drayage distance. As a result, the research team had to complement the CFS data with other datasets containing shippers' characteristics and modal attributes. The datasets used were the LBD, the confidential version of the Waybill sample, HERE data, and rail network data from the FRA. Because the CFS has very few observations for the water and air freight modes, these modes could not be included in the freight mode choice estimation process.

Table 18 shows the data assembled for the freight mode choice analysis, the datasets from which the respective variables were derived, and the sources of the data. In summary, the shipment data are obtained from the 2012 confidential CFS microdata; the shippers' attribute data are obtained from the 2012 LBD data from the Center for Economic Studies; the rail attribute data are derived from the 2012 Waybill data and rail network data from the FRA; and the truck attribute data are derived from HERE data. To obtain modal attributes (costs and transit times) for each shipment in the CFS microdata, statistical inference techniques were applied to the HERE and Waybill datasets. The sources of the data, relevant variables, challenges, and limitations are explained in the sections that follow.

Shipment Data: 2012 CFS Microdata

The CFS is a database obtained from a shipper-based survey conducted every 5 years as part of the Economic Census. The survey is conducted as a partnership between the U.S. Census Bureau and the BTS and is the primary source of data on freight shipments at the national and state level (BTS and U.S. Census Bureau 2015). For each shipment, the CFS includes shipment ID; Federal Information Processing Standard Publication 6-4 (FIPS) state code for shipment origin and destination state; metropolitan area; CFS area; quarter of the year; commodity types as designated by SCTG; mode of transportation; shipment value in dollars; shipment weight in pounds; great circle distance (GCD) in miles; routed distance in miles (for the mode used to send the shipment); hazardous material code; weighting factors; and binary variables indicating whether the shipment is an export (and, if so, the export country) and whether it is temperature controlled.

It is worth mentioning that the Public Use Microdata Sample (PUMS) released by the BTS, with information on about 4.5 million shipments from the 2012 CFS (U.S. Census Bureau 2017), although generally useful, cannot be used to estimate freight mode choice models because it does not contain information about the shipper, particularly its location.

Establishment Data: 2012 LBD

The LBD covers all non-farm sectors listed in the Standard Statistical Establishment List, also known as the Business Register. The LBD has been collected and maintained by the Center for Economic Studies at the Census Bureau every year since 1976. The LBD contains information from the Standard Statistical Establishment List, Economic Censuses, and surveys. The longitudinal nature of the LBD allows the study of an establishment or firm over a period of time, including the year of entry and exit. Since the LBD also contains confidential data, such as age of the firm, employment, and payroll, obtaining access to LBD microdata requires a special sworn status (Jarmin and Miranda 2002, U.S. Census Bureau 2015). The LBD provides the shippers' attributes, mainly industry sector in the North American Industry Classification System, employment, location, and payroll. The CFS and LBD are merged to form a dataset composed of shipment and shipper attributes.

Modal Data

This section presents the various modal data used in the preparation of the freight mode choice dataset.

Truck Data

To obtain the distances and transit times by truck for each shipment in the CFS data, the research team obtained the HERE data, which were generously provided by the Caliper Corporation. The HERE data contain layers with distances and transit times for the entire U.S.
highway and street network, along with additional geographic information such as ZIP codes, census tracts, counties, cities, and states (HERE 2018). The final maps from the HERE data were processed by the Caliper Corporation and provided to the research team in GIS format. The team post-processed the HERE data to obtain distances and transit times between all ZIP codes in the entire United States. Truck distances and transit times were calculated assuming each shipment by truck follows the shortest path from origin ZIP code to destination ZIP code. The output included origin ZIP code, destination ZIP code, truck transit time (minutes), and truck distance (miles). Truck rates were calculated based on an updated version of the model presented in Holguín-Veras and Brom (2008), which estimates truck direct cost in dollars for a shipment as a function of the distance and time traveled. The finalized truck distances, transit times, and costs were merged with the CFS-LBD dataset by matching the ZIP-ZIP origin-destination pairs between the CFS-LBD data and the transit time and distance matrices.

Table 18. Summary of datasets.

Shipment Data
  Data items: Mode(s) of transport; Export mode; Shipment ID; Date; Value; Weight in lb; Commodity (SCTG); Great circle distance; Temperature controlled; Hazardous cargo; Destination in U.S.; Export destination
  Dataset: Confidential Commodity Flow Survey (CFS) Micro-data
  Source: Census Bureau Center for Economic Studies (CES)

Establishment Data
  Data items: Number of employees; Industry (NAICS); Location; Payroll
  Dataset: Confidential Longitudinal Business Database (LBD)
  Source: Census Bureau Center for Economic Studies (CES)

Modal Data: Rail
  Data items: Time; Cost; Generalized cost
  Dataset: CFS + Waybill; Confidential Waybill, Rail Network Data
  Source: Surface Transportation Board (STB)

Modal Data: Truck
  Data items: Time; Cost; Generalized cost
  Dataset: CFS + HERE; HERE Speed/Truck Data
  Source: NAVTEQ (processed by Caliper Corporation)

NAICS = North American Industry Classification System

Rail Data

Thanks to the assistance of the FRA, the research team secured access to the confidential 2012 Waybill Sample, a stratified sample of carload waybills collected by the Surface Transportation Board. This database contains information on commodity type in Standard Transportation Commodity Code (STCC) codes, shipment size, types of car used, origin-destination information ranging from country and state up to ZIP code, Freight Station Accounting Codes, number of cars, revenue, shipment rate, variable rates, distance (routed, shortest path), and number of transfers. In addition, the team secured FRA rail network data, which contain geocoded information about rail nodes, rail stations (Freight Station Accounting Codes), and link distances.
The network contains information on privately owned freight rail lines in the actual freight network at county and city levels; each rail line is labeled with its primary owner. The network data were updated in 2010. The research team used these data to produce rail distances between all ZIP codes (nearly 40,000) in the United States, assuming that rail shipments follow only the shortest paths, with drayage by road to the closest rail station. Transit times were estimated for various commodities assuming an average waiting time of 24 hours for each transfer. These rates, distances, and transit times by rail between all ZIP codes in the United States were incorporated into the freight mode choice modeling process. Because commodity types are defined differently in the CFS (using SCTG) and in the Waybill Sample (using STCC), a conversion matrix between SCTG and STCC was used to obtain the rail distances, transit times, and rates for each shipment in the CFS data.
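
The rail attribute lookup described above can be illustrated with a minimal Python sketch. It is not the research team's implementation. Only the 24-hour wait per transfer comes from the text. The line-haul speed, the commodity codes, the ZIP codes, and the function and table names are hypothetical placeholders. A real SCTG-STCC conversion matrix may also map one code to several, whereas this sketch assumes a one-to-one mapping for brevity.

```python
# Average waiting time per transfer, as stated in the text.
TRANSFER_WAIT_HOURS = 24.0

# ASSUMPTION: the text does not give a line-haul speed; this value is
# a placeholder chosen only to make the example computable.
ASSUMED_LINE_HAUL_MPH = 20.0

# ASSUMPTION: placeholder crosswalk with made-up codes, standing in for
# the SCTG-to-STCC conversion matrix (simplified to one-to-one).
SCTG_TO_STCC = {"SCTG_A": "STCC_X", "SCTG_B": "STCC_Y"}

# ASSUMPTION: made-up rail attributes keyed by
# (origin ZIP, destination ZIP, STCC code).
RAIL_ATTRIBUTES = {
    ("12180", "60601", "STCC_X"): {"distance_miles": 800.0, "n_transfers": 2},
}


def rail_transit_hours(distance_miles, n_transfers,
                       speed_mph=ASSUMED_LINE_HAUL_MPH):
    """Line-haul time plus a fixed wait for every transfer."""
    if distance_miles < 0 or n_transfers < 0:
        raise ValueError("distance and transfers must be non-negative")
    if speed_mph <= 0:
        raise ValueError("speed must be positive")
    return distance_miles / speed_mph + TRANSFER_WAIT_HOURS * n_transfers


def rail_attributes_for_shipment(origin_zip, dest_zip, sctg):
    """Convert the shipment's SCTG code to STCC, then look up rail attributes.

    Returns None when the commodity or the ZIP pair has no match.
    """
    stcc = SCTG_TO_STCC.get(sctg)
    if stcc is None:
        return None
    record = RAIL_ATTRIBUTES.get((origin_zip, dest_zip, stcc))
    if record is None:
        return None
    hours = rail_transit_hours(record["distance_miles"], record["n_transfers"])
    return {**record, "stcc": stcc, "transit_hours": hours}
```

Under these placeholder values, an 800-mile movement with two transfers takes 40 hours of line haul plus 48 hours of waiting. Returning None for unmatched shipments keeps them visible, so they are not silently given default attributes.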