National Academies Press: OpenBook

Applying GPS Data to Understand Travel Behavior, Volume I: Background, Methods, and Tests (2014)

Chapter: Chapter 1 - Literature Review and Industry Assessment

Suggested Citation:"Chapter 1 - Literature Review and Industry Assessment." Transportation Research Board. 2014. Applying GPS Data to Understand Travel Behavior, Volume I: Background, Methods, and Tests. Washington, DC: The National Academies Press. doi: 10.17226/22370.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

Transportation professionals have been enamored with the potential uses of Global Positioning System (GPS) data ever since GPS became fully operational in 1995. Early GPS-enhanced household travel surveys, such as the 1996 FHWA Lexington Pilot Study and the 1997 Austin Household Travel Survey, have led the way in evaluating GPS use in travel surveys (Battelle Memorial Institute 1997; Murakami and Wagner 1999; Casas and Arce 1999). These initial studies were hindered by the U.S. government’s intentional degradation of GPS’s positional accuracy (known as selective availability). Selective availability was eliminated in early 2000, thereby accelerating the rapid development and implementation of a wide range of commercial, consumer-oriented, location-based services (LBS) and supporting GPS devices. Over the past 11 years, more than 25 household travel surveys (HTSs) conducted within the U.S. have used GPS augments to help assess the level, breadth, and magnitude of travel underreporting or misreporting by the large diary-based reporting sample. And, with each survey, GPS sample sizes have steadily increased, with some of the most recent surveys involving the deployment of GPS data loggers to thousands of households, either with large subsamples (e.g., New York City, Atlanta, and California) or with the entire surveyed population (e.g., Cincinnati, Cleveland, and Jerusalem). Over this same time frame, consumer-based GPS products, such as stand-alone or in-dash personal navigation devices (PNDs), GPS-enabled smartphones, and fleet tracking systems [e.g., automatic vehicle location (AVL) systems] have led to the creation of large-scale GPS data sets that can be mined or translated into detailed travel behavior information.
In addition, other fixed-location approaches to tracking personal travel, such as those supported by Bluetooth, radio frequency identification (RFID), and mobile phone tower technologies, offer alternative methods for providing some level of travel behavior information. The combination of these large-scale GPS travel survey data collection events, the increasing availability of large consumer-based GPS data sets, and ongoing studies evaluating the use and benefits of fixed-location sensors has led to many discussions within the transportation community about the roles, advantages, and disadvantages of various GPS data sources for transportation planning and modeling, as well as for other travel behavior research initiatives. Given the need for more data to support a wide range of transportation planning and modeling activities, combined with ongoing budgetary constraints, the time has come to clearly and objectively evaluate the multiple sources of GPS data that could be leveraged and used for transportation planning beyond the traditional application area of travel time and speed studies.

Overview of Literature Review and Industry Interview Process

The increasing availability of travel data collected from location-aware technology, such as GPS devices, combined with the availability of open application programming interfaces (APIs) and open-source software (OSS), has piqued interest in the application of GPS data for use in travel forecasting, planning analysis, and transportation system management. Frequently, however, the initial attraction by public agencies to these detailed travel data has met with roadblocks related to cost, challenges with integration into existing modeling paradigms, concerns about data privacy, sample bias, and data management difficulties.
GPS and other tracking technologies can provide a depth of insight into travel behavior and activity patterns that exceeds traditional modeling data needs (such as trip rates and travel times) and that complements standard system performance metrics (e.g., average speed and congestion identification). Realization of these potential benefits will require an objective assessment of these various data sources along with guidance to assist transportation data users in decision making and data management.

In 2011, the Transportation Research Board initiated a study to evaluate these GPS data sources and to provide guidance on the use of these sources by transportation planners, travel modelers, and travel survey practitioners; this study is NCHRP Project 8-89, “Applying GPS Data to Understand Travel Behavior.” This chapter reports on a broad literature review conducted on GPS data sources, actual and potential uses of these data in the field of transportation, standards for GPS data collection and storage, and concerns about the various sources with respect to coverage, bias, accuracy, and privacy.

To supplement this literature review, comprehensive questionnaires were sent to industry experts in the areas of travel surveys, travel behavior research, travel demand modeling, and traffic data provision. The responses from these questionnaires provided additional direction to the literature review process and have been summarized at the end of this chapter to provide both state-of-the-practice and state-of-the-art confirmation of industry uses and plans for GPS data. Hereafter, information gathered from these industry responses is referred to as the 2012 Industry Survey.

For the purpose of this report, it is important to clarify the scope of this research initiative. The prioritized GPS data sources evaluated were:

1. GPS data loggers and GPS-equipped smartphones deployed to households recruited within a household travel survey;
2. Passive GPS or cell phone data collected from devices purchased by consumers, such as mobile phones and PNDs;
3. Other GPS and location-based data sources that have been used for understanding various aspects of travel behavior (i.e., transit surveys, transit AVL data, private fleet tracking systems, and probe vehicle studies).
In addition, other fixed-location sensors, such as mobile phone towers, RFID readers, Bluetooth sensors, and Wi-Fi sensors, that have known locations and can detect when relevant devices pass by can also provide useful information about transportation system performance as well as travel behavior. Although not GPS technology as defined in the previous list, these technologies are also discussed in this chapter.

GPS-Based Travel Behavior Data Collection and Uses

The following subsections discuss the use of GPS technology to enhance HTSs, provide notable examples of GPS-augmented HTSs, present ways in which GPS data have been used to improve travel demand models (TDMs), and conclude by identifying other travel behavior study types that have benefited from GPS technology.

Overview of GPS-Enhanced Travel Surveys

This section presents an overview of the use of GPS in travel surveys and its evolution over the past two decades. It concludes with a discussion on the emerging use of smartphones as GPS data collection alternatives for these surveys.

As data needs for developing TDMs have increased and survey participation rates have generally fallen over the past several decades, more sophisticated methods of data collection have been developed by the travel survey community in an effort to address these problems. There was a shift first from traditional travel diaries to activity diaries accompanied with major advances in survey techniques that were generally driven by increases in computing power, portability, and availability along with decreases in cost.

The evolution of travel survey methods has continued with the introduction of GPS-enhanced travel survey techniques. The use of GPS data collection has been found to have many advantages over traditional survey methods.
First, GPS-enhanced surveys provide a more accurate and detailed account of the spatial and temporal aspects of personal travel than what survey respondents are able to recall and report, and GPS data sets have been used to correct significant trip underreporting errors associated with pen-and-paper or phone-based activity surveys (Battelle Memorial Institute 1997; Wolf, Bricka, et al. 2004). GPS-enhanced surveys should have less respondent burden for capturing travel details by leveraging passive GPS data collection while collecting more information and more accurate information. In addition, by further reducing respondent burden through the use of automated activity type, location, timing, and travel mode identification routines, GPS-based prompted-recall surveys allow for more complex questions to be asked. The latest generation of GPS-based surveys includes GPS-only studies, in which basic household information is collected first and then GPS data loggers are used by study participants, with software algorithms and models used to generate all necessary details of travel. Finally, the combination of more accurate spatial–temporal data along with reduced respondent burden allows for multiday data collection, which in turn enables more in-depth aspects of travel behavior to be studied, including variability in travel patterns, route choice, activity location selection, and mode selection. Furthermore, multiday data collection can support reductions in required sample sizes, thereby offsetting some, if not all, of the additional costs inherent in GPS-enhanced and GPS-based travel surveys (Stopher, Kockelman, et al. 2008).

GPS-Based Subsamples for Travel and Activity Surveys

The use of GPS data in activity and travel surveys is a relatively new practice, made possible through improvements

in the technology itself and the demand for more accurate travel data. Initially, GPS data collection was used mostly to provide corrections for trip rates obtained from traditional household travel surveys or to demonstrate the feasibility of doing so. These studies tended to be conducted in conjunction with traditional diary-based household travel surveys. GPS-enhanced surveys of this type primarily used passive GPS data collection systems, where the GPS traces were collected and analyzed without any input from the participants, with a few studies using more active or interactive systems that employed a combination of technologies such as an onboard computer or handheld device combined with a GPS receiver to gain additional input from the participants (Battelle Memorial Institute 1997; Doherty and Oh 2012; Guensler and Wolf 1999). Some of these studies compared the GPS-identified trips with the diary-reported trips from the same GPS subsample as a means to correct larger, traditional activity diary samples. This method is also referred to as the dual-method approach because it requires the GPS subsample to use both GPS devices and diaries, which results in increased burden to these participants.

Several examples of GPS-enhanced surveys that have used the dual-method approach are statewide surveys in California (NuStats 2002) and Ohio (Pierce et al. 2003), and regional studies in Austin (Casas and Arce 1999), Laredo (Forrest and Pearson 2005), Kansas City (NuStats 2004), Seattle (Cambridge Systematics 2007), Chicago [Chicago Metropolitan Agency for Planning (CMAP) 2008], and Denver (NuStats 2010). Similar dual-method studies have recently been completed in California (performed by NuStats and GeoStats) as well as in regional surveys conducted for Philadelphia and Los Angeles by Abt SRBI (2012 Industry Survey). The primary intent of the GPS component in each of these studies was to develop trip rate correction factors.
Figure 1-1 shows GPS data collected during the California HTS pilot study conducted in 2011.

Additional analysis was performed using GPS data collected in the 2001 California statewide (Wolf, Oliveira, and Thompson 2003; Zmud and Wolf 2003), Ohio statewide (Pierce et al. 2003), Kansas City (Wolf, Bricka, et al. 2004; Bricka and Bhat 2007), and Denver (Bachman et al. 2012) surveys, among others, to gain insight into the underreporting phenomenon. Most recently, a study by Bricka and Murakami (2012) used a combined GPS and diary sample from Indianapolis to not only evaluate trip underreporting in traditional surveys, but also to test potential trip reporting errors with the use of GPS-only samples.

These survey and research efforts have led to a large body of knowledge about trip underreporting in household travel surveys as well as the methods for identifying and correcting this problem. The use of a GPS subsample within a larger traditional travel survey for correction factors continues to be an important way for this technology to support travel demand modeling needs. In the 2012 Industry Survey of market research firms that specialize in travel surveys (conducted as part of this research effort), most respondents reported either recently using or continuing to use GPS samples to correct self-reported trip rates (2012 Industry Survey).

Figure 1-1. Example of GPS data collected during 2011 California HTS pilot study.
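As a rough illustration of the dual-method idea, the sketch below computes a single trip rate correction factor as the ratio of GPS-identified trips to diary-reported trips in the GPS subsample, then applies it to a diary-based trip rate from the larger sample. This is a simplified assumption for illustration only: the function names and the household counts are hypothetical, and the studies cited above may well have used stratified or model-based factors rather than one pooled ratio.

```python
def correction_factor(gps_trips, diary_trips):
    """Pooled ratio of GPS-identified to diary-reported trips (illustrative only)."""
    if diary_trips <= 0:
        raise ValueError("diary_trips must be positive")
    return gps_trips / diary_trips


def corrected_trip_rate(diary_rate, factor):
    """Scale a diary-based trip rate by the subsample correction factor."""
    return diary_rate * factor


# Hypothetical dual-method subsample: (GPS trips, diary trips) per household.
subsample = [(10, 8), (6, 6), (12, 9)]
gps_total = sum(gps for gps, _ in subsample)
diary_total = sum(diary for _, diary in subsample)

factor = correction_factor(gps_total, diary_total)

# Apply the factor to a hypothetical diary-only trip rate (trips per household-day).
adjusted = corrected_trip_rate(9.2, factor)
print(f"factor={factor:.3f}, adjusted rate={adjusted:.2f}")
```

A factor above 1.0 indicates that the diary sample reported fewer trips than the GPS devices detected for the same households.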

travel and activity diaries and associated retrieval methods. Efforts in this area have been conducted along two main lines: (1) processing the GPS data into basic trips and attributes and then having the participants confirm, complete, and/or correct these data through a GPS-based prompted-recall interview, and (2) using a GPS-based prompted-recall subsample to calibrate models that are then used to impute details on completely passive GPS data collected by the majority of the sample without further input from survey participants. The following sections discuss these two approaches in more detail.

GPS-Based Prompted-Recall Travel Surveys

The use of passive GPS logging coupled with a follow-up survey that is based on the trips identified within the GPS data is usually referred to as a GPS-based prompted-recall (PR) survey. This is because the GPS data are used to reconstruct the activity-travel pattern of the respondent, with the detected trips and trip attributes then presented to the participant, who is prompted for further responses. This mode combines the automated data processing routines of purely passive surveys with respondent verification of the auto-generated data and, usually, the collection of additional data that may be difficult to extract from GPS traces alone (e.g., trip purpose, vehicle occupancy, and parking cost). Several different types of prompted-recall surveys have been conducted, including both vehicle-based and person-based, that use various strategies for prompting the individual recall of travel patterns. The primary advantages of this survey mode are the collection of detailed information about aspects of travel and activity participation that cannot be automatically deduced (Auld et al. 2009) and the reduced respondent burden during actual travel, which is limited to carrying the device, something that most respondents do not seem bothered by (Lawson, Chen, and Gong 2010).
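To make the step of processing GPS data into basic trips more concrete, the following is a minimal sketch of one common heuristic: a trip starts when the point-to-point speed rises above a threshold and ends once the device has been stationary for a minimum dwell time. The chapter does not specify which algorithm any of the cited surveys used, so the function names, the 0.5 m/s speed threshold, and the 120-second dwell time are all assumptions chosen for illustration.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def segment_trips(points, min_speed_mps=0.5, min_dwell_s=120):
    """Split time-sorted (t_seconds, lat, lon) points into (start_t, end_t) trips.

    Thresholds are illustrative assumptions, not values from the source.
    """
    trips = []
    start = None      # start time of the trip currently being built
    last_move = None  # time of the most recent moving observation
    for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = haversine_m(la0, lo0, la1, lo1) / dt
        if speed >= min_speed_mps:
            if start is None:
                start = t0
            last_move = t1
        elif start is not None and t1 - last_move >= min_dwell_s:
            trips.append((start, last_move))  # dwell long enough: close the trip
            start = None
    if start is not None:
        trips.append((start, last_move))
    return trips


# Synthetic trace: move north, dwell for a few minutes, then move again.
points = [
    (0, 0.0000, 0.0), (10, 0.0001, 0.0), (20, 0.0002, 0.0), (30, 0.0003, 0.0),
    (60, 0.0003, 0.0), (120, 0.0003, 0.0), (180, 0.0003, 0.0),
    (190, 0.0004, 0.0), (200, 0.0005, 0.0),
]
trips = segment_trips(points)
print(trips)
```

In a prompted-recall setting, the resulting trip list would be what is shown back to the participant for confirmation or correction; real traces would also need handling for signal loss and positional noise, which this sketch ignores.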
An early example of a GPS-based prompted-recall survey was implemented by Bachu, Dudala, and Kothuri (2001), who used vehicle-based GPS data to track a sample of 10 households over several days. The results of this study showed that the survey participants could recall the details of trips identified in the GPS traces several days after initial data collection with little loss of recall ability. A small pilot study was also conducted by Stopher, Bullock, and Horst (2002) using prompted-recall survey methods with a similar process of auto-identifying activity-travel episodes with manual adjustment. In this study the travel patterns were shown in maps and in a sequential tabular format, with unknown attributes (such as purpose, travel companions, and costs incurred) left blank for the respondents to fill in. This survey also had the respondents correct the generated travel patterns. A similar mail-out PR follow-up study was conducted for a portion of the Kansas City GPS subsample, in which respondents were

A Move Toward the Replacement of Travel Surveys with GPS

Although the first significant use of GPS in travel surveys was to measure and correct for trip underreporting, it has long been thought that GPS-based surveys could someday completely replace the travel reporting component of household travel surveys (Wolf 2000). The expectation has been that a completely GPS-based survey would significantly lower the respondent burden while increasing the quality and quantity of information captured, specifically in the automatic collection of trips and their attributes, including trip start and end times, activity locations and durations, and route choices (Wolf 2000; Murakami, Morris, and Arce 2003).
Accurate travel reporting has traditionally been a challenge for survey respondents due to limitations in memory recall, the tendency to filter out what is considered by the participant as either unimportant (e.g., an ATM visit or convenience store stop) or confidential, and the inherent complexities of trip reporting methods.

Furthermore, there has been interest from travel demand modelers to extend the reporting period of traditional travel surveys beyond a single day to better measure the variability in day-to-day travel. A few travel surveys conducted outside of the United States have done this; for example, the Mobidrive survey conducted in Germany collected travel information for 6 weeks (Axhausen et al. 2002). However, in the United States there have been few travel surveys that have attempted to collect even 2 days of travel data due to a significant decline in participation rates and trip rates attributed to higher respondent burden (Chicago Metropolitan Agency for Planning 2008; Bricka 2008). Consequently, reducing respondent burden is critical to recruiting and retaining a good, representative sample of the targeted population, and even more so if multiday travel information is desired. It is worth noting that most of the recent GPS-enhanced travel surveys conducted in the United States have collected multiday GPS data ranging from 2 to 7 consecutive days.

During the industry interviews conducted for this project, a leading researcher from the Institute of Transportation and Logistics Studies (ITLS) touched on many of these aspects in his industry survey response, “Accuracy [of GPS] is clearly far greater than in diaries. People are notoriously bad at estimating the times at which they travel, how long they travel, and certainly how far they travel. . . . A huge advantage is the ability to collect multiday data as well as the accuracy and coverage already described.
We believe that personal passive GPS loggers reduce respondent burden substantially” (2012 Industry Survey). With the relative ease and accuracy of collecting travel data through GPS logging established by early studies, subsequent research has looked into using processing techniques on the collected GPS data to completely replace the traditional

was remarkable in its aggressiveness with respect to technology adoption by the planning agency and acceptance by diverse population groups within the region. Furthermore, the survey platform made it possible to conduct PR interviews immediately following GPS data downloads to the laptops in the participants’ homes, with no opportunity for interviewer preprocessing or cleaning of the GPS trip data prior to participant review.

The Ohio Approach to GPS-Based Household Travel Surveys

Since 2009, the Ohio Department of Transportation has initiated two large-scale GPS-based household travel surveys within the state, which advanced the current state of practice in GPS-based travel surveys. The first survey, derived from the work previously implemented by Stopher and Collins (2005), was conducted in the Cincinnati area where a completely GPS-based household travel survey was performed. The travel survey included over 2,000 households, of whom 601 completed a 1-day prompted-recall study (Stopher et al. 2012). The study found that GPS-only travel surveys were feasible, although the authors state that prompted-recall data had limited usefulness because “it was also quite clear from the results of the prompted recall that it does not provide ‘ground truth,’ because people still misunderstand what is required and misremember what they did” (Stopher et al. 2012).

The second GPS-based travel survey in Ohio was conducted in Cleveland by GeoStats (2012 Industry Survey). In this study, more than 4,000 households provided travel information using GPS data loggers, with approximately 1,300 of these households participating in a GPS-based prompted-recall interview using CASI or CATI survey methods. The purpose of the prompted-recall sample was to assist in the calibration and validation of the algorithms and models developed for imputing critical trip attributes such as travel mode, companions, and trip purpose for the remaining GPS-only sample.
Another 453 households composed entirely of persons over the age of 75 reported their travel using travel logs (which seems simpler and more appropriate for this demographic group), yielding a final overall sample size of 4,545 households.

Smartphone Use in Household Travel Surveys

According to the Pew Research Center (Smith 2012), 46% of adults in the United States own a smartphone, with almost three-quarters (74%) of them getting real-time location-based information on their smartphones. These statistics are particularly impressive given that the two most popular smartphone platforms did not exist until the middle of the first decade of the 2000s.

prompted to fill in details of GPS-identified trips that were not recorded in the standard diary the respondents filled out or mentioned by the household during the travel reporting interview (NuStats 2004). As mentioned previously, most of the early prompted-recall studies involved creating maps or other displays, then mailing these to the respondents for completion, which could involve significant delays and, therefore, a potential degradation of recall ability. More recently, GPS-based prompted-recall surveys have been implemented using web-based data collection platforms. The use of web-based prompted recall allows much more detailed information regarding travel behavior to be collected. Yasuo Asakura of Kobe University states, “The combination of GPS [and] web has made [it] possible to obtain whole travel behavior data that were not observed only by the GPS” (2012 Industry Survey). Examples include the collection of detailed travel planning behavior (Auld et al. 2009) and activity rescheduling strategies (Clark and Doherty 2010), among others. Studies by Marca (2002); Stopher and Collins (2005); Lee-Gosselin, Doherty, and Papinski (2006); Li and Shalaby (2008); and Auld et al. (2009) were performed using PR surveys over the Internet and web-based computer-assisted self-interviews (CASIs).
A computer-assisted telephone interview (CATI) GPS prompted-recall component was added to the recent household travel survey for the New York Metropolitan Transportation Council, which also used a web-based CASI PR component (Chiao et al. 2011; Wilhelm, Wolf, and Oliveira 2012).

In each of these PR studies, members of recruited households wore GPS data loggers for one or more days, and the data were later transferred to a central server for processing, either by direct uploading of the data removed from the device after the survey was complete, as in the surveys by Stopher and Collins (2005); Auld et al. (2009); Wilhelm, Wolf, and Oliveira (2012); and Oliveira et al. (2011), or through continuous wireless communication as in Lee-Gosselin, Doherty, and Papinski (2006). Regardless of the data transfer method, the collected raw points were then processed to identify the activities and trips, and the recall survey was built upon the identified activity-travel episodes.

Another variation of a GPS-based prompted-recall survey was implemented in Jerusalem in 2010–2011, where the regional planning agency used an internal team to conduct a 100% GPS-based travel survey. They used laptops to administer face-to-face interviews using commercial off-the-shelf (COTS) computer-assisted personal interview (CAPI) software that was integrated with a custom GPS prompted-recall tool developed by GeoStats. This approach was used to carry out both the initial recruitment and subsequent GPS-based prompted-recall interviews (Oliveira et al. 2011). This survey collected detailed GPS-based prompted-recall travel data from more than 8,800 households located within the Jerusalem region and

can become dominant in household travel surveys; these challenges include:

1. Market fragmentation,
2. Power management,
3. Data plans and associated costs, and
4. Self-selection and capture mode biases.

The first issue is that there are several active smartphone platforms in the United States, each with multiple versions in active use and varying levels of API and technology support. Table 1-1 shows the breakdown of the top five smartphone platforms in the United States and includes the number of different active versions for each. This fragmented reality makes it difficult and costly to develop apps for multiple platforms and operating system (OS) versions to support the majority of participants with smartphones. For example, the iPhone 3G running iOS versions older than 4 does not have the ability to log GPS data in the background while the phone is performing other activities, such as running a different app or during a call. There are also significant API differences between platforms that make it challenging to offer the same features and logic consistently across all apps offered to all participants.

The second issue that needs to be addressed is that of power management when logging trace data, be it from GPS or other sources such as Wi-Fi. A viable travel survey app has to allow a participant to make normal use of his or her smartphone while capturing the required data. Incremental improvements in battery technology, central processing units (CPUs), and GPS chipsets have alleviated this limitation, but it is still the case that continuously logging GPS data will rapidly deplete a smartphone battery.
Researchers have dealt with this limitation through various strategies such as providing external power sources (Doherty and Oh 2012), limiting the use of GPS and relying mostly on Wi-Fi and cell tower data for positioning combined with algorithms for identifying when to start and stop logging (see Quantifiable Traveler

In addition to being capable of running custom software applications (commonly referred to as “apps”), most smartphones integrate multiple technologies, such as keyboard and voice inputs, GPS, accelerometers, gyroscopes, and cameras, most of which are applicable to conducting travel surveys. Modern platforms make the use of these embedded technologies available to software developers, with GPS-based location-referencing services becoming one of the most popular features in smartphones. For example, there are several free apps available for the most popular platforms that allow users to record GPS locations as frequently as one point per second.

The opportunity to leverage smartphones is appealing to travel behavior researchers, and these devices are quickly becoming another method in the travel survey toolbox for collecting GPS-based travel data. According to Murakami and Bricka (2012), the possibility of using devices owned by participants can address common implementation challenges in GPS-based travel surveys by: (a) eliminating the need to ship out and retrieve GPS loggers, (b) shortening the time between travel date collection and data review, and (c) reducing costs associated with equipment loss. When combined, the growing market penetration and technical capabilities of smartphones make them an attractive medium for conducting travel surveys. This has sparked a growing interest in the travel survey community, with several pilot studies having been conducted over the past few years (Bricka and Murakami 2012).

Smartphones have the ability to be used in travel surveys as either active or passive data collection devices.
In the active scenario, respondents would use the phone app to respond to survey questions before (i.e., recruitment questions) and during their travel day, either by confirming stops or explicitly starting and stopping the recording of GPS traces. Early deployments of smartphones to collect GPS data within travel surveys have used this approach, with notable examples being the TRAC-IT research project (Center for Urban Transportation Research, University of South Florida 2012) and the PTV Pacelogger app (Bricka and Murakami 2012).

Passive use of smartphone technology requires participants to download and initialize the app and identify themselves within the household persons roster; from that point on, all recording takes place automatically in the background, with the app detecting when the monitoring period has ended and transmitting the captured data for processing. This passive data collection scenario can also be complemented by a PR interview completed in the same app or via the web. Relevant examples of this approach are the Quantifiable Traveler app developed by UC Berkeley and the Future Mobility Survey conducted in Singapore (Murakami and Bricka 2012).

However, there are still a few technological and methodological challenges to overcome before smartphone solutions

Platform | Market Share (February 2012) | Number of Major OS Versions*
Android (Google) | 50.1% | 8
iOS (Apple) | 30.2% | 6
Blackberry (RIM) | 13.4% | 4
Windows (Microsoft) | 3.9% | 3
Symbian (Nokia) | 1.5% | 4
Others | 0.9% | N/A

*Only includes versions released since 2007.
Source: comScore MobiLens from http://www.comscore.com/Press_Events/Press_Releases/2012/4/comScore_Reports_February_2012_U.S._Mobile_Subscriber_Market_Share, accessed on 08/31/2012.

Table 1-1. Top U.S. smartphone platforms.

The relevant travel demand model’s data requirements have a large influence on the data elements collected in a household travel survey. Simple four-step TDMs may require basic household, person, vehicle, and trip-level information, whereas advanced activity-based models require more precise details about household, person, and vehicle characteristics, as well as expanded information about actual travel behavior, travel options, and costs.

These three surveys have been selected as representative examples of an expansive survey design (whose requirements were driven by many state agency and regional agency stakeholders), a typical survey design (in which the requirements were driven by a regional metropolitan planning organization (MPO) that has both a four-step TDM and an activity-based TDM), and a GPS-/technology-driven design (that was intentionally defined to require the minimum data elements needed to support the current four-step regional TDM), respectively. The purpose of this comparison is to show the range of current data requirements in household travel surveys to be considered when evaluating the reduced respondent burden associated with GPS-based travel surveys as well as when evaluating other data sources to replace travel surveys.

The California Statewide Household Travel Survey: An Expansive Survey

The California Department of Transportation (Caltrans) sponsors the decennial CHTS, the most recent of which began survey data collection in February 2012 and was completed in January 2013. This statewide survey was designed to support the statewide travel demand model. Additionally, an attempt was made to accommodate regional travel demand models by including representatives from the MPOs and councils of governments from across the state in the planning and design process. Other California state agencies, such as the California Energy Commission, were also active participants in the design of the survey to meet their own agency data needs.
By trying to accommodate the data needs of this wide range of users, the CHTS was a significantly longer and more comprehensive survey than typical household travel surveys. The design of the survey also included a long-distance trip diary in addition to the regular single-day travel diary; the purpose of the long-distance trip diary was to collect information about inter-regional travel within the state that is not captured in a typical 1-day survey. The final sample size for the full survey was approximately 42,500 households, and 5,717 of these households also participated in the GPS component. The survey used a dual-method approach (participants receive both diaries and GPS devices) with three GPS subsamples: diary and wearable GPS, diary and vehicle GPS, and diary and vehicle GPS supplemented with an onboard diagnostic (OBD) device (or engine sensor).

and Future Mobility Survey in Chapter 26 of the Travel Survey Manual Update by Murakami and Bricka and also Battelle Memorial Institute 2012), and providing direct control over the logging to participants (Center for Urban Transportation Research, University of South Florida 2012).

The third issue is related to the need to transmit and download data from the app and the fact that participants may have limitations on their data plans. Even after applying data compression, high-resolution GPS traces can get fairly large, and transmitting them back for processing could have considerable cost impacts on an unknowing participant. This can be alleviated by providing materials that explain the expected data transfer demands of the application up front, applying trace simplification algorithms such as SQUISH (Muckell et al. 2011), providing incentives that will offset data transmission costs, or only transmitting minimal information back to a central location (Center for Urban Transportation Research, University of South Florida 2012).
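To make the idea of trace simplification concrete, the following is a minimal sketch only. It does not implement SQUISH; it uses the simpler and widely known Douglas-Peucker line simplification as a stand-in, and it assumes the GPS points have already been projected to planar (x, y) coordinates in meters. The function names and the tolerance value are illustrative assumptions, not part of any surveyed app.

```python
import math


def distance_to_segment(p, a, b):
    """Distance in meters from point p to the segment a-b (planar x, y)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto a-b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify_trace(points, tolerance_m):
    """Douglas-Peucker simplification (a stand-in, not SQUISH).

    Keeps the endpoints and any point farther than tolerance_m from
    the chord joining the endpoints of its sub-trace.
    """
    if len(points) < 3:
        return list(points)
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = distance_to_segment(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= tolerance_m:
        return [points[0], points[-1]]
    left = simplify_trace(points[: index + 1], tolerance_m)
    right = simplify_trace(points[index:], tolerance_m)
    return left[:-1] + right  # avoid duplicating the split point


# Hypothetical trace: a nearly straight leg east, then a turn north.
trace = [(0, 0), (10, 0.5), (20, 0), (30, 0.4), (40, 0), (40, 10), (40, 20)]
reduced = simplify_trace(trace, tolerance_m=2.0)
```

In this hypothetical trace only the two endpoints and the corner point survive, so fewer points would need to be transmitted. A real app would also have to preserve time stamps and choose the tolerance against the accuracy needed for trip and stop detection.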
The fourth challenge has to do with the fact that travel surveys are typically conducted at the household level and that not all adult members will have a compatible smartphone. This means that passive GPS data loggers will still need to be shipped to households even if there is a smartphone owner/user in the household. Of course, households without any smartphone will require one or more passive GPS loggers as well. These mixed GPS methods could be confusing for a survey household. Finally, as seen with other data collection methods and technologies, there are multiple biases (e.g., age, gender, income, and ethnicity) related to smartphones. To mitigate these biases, it is important to provide alternative means of participating and to ensure that the data collected, regardless of survey mode, are properly integrated into the overall survey platform and framework. Developing a comprehensive system to support and integrate multiple survey modes across and within households is not a trivial task, and the costs to develop, maintain, and update this system and all components, as well as to provide technical support to participants, will be incurred on an ongoing basis.

Despite all of these challenges, it should be noted that the widespread availability and use of smartphones are relatively recent phenomena, and the technology as well as its uses are likely to continue changing and evolving at a fast pace over the next several years.

Examples of GPS-Enhanced Household Travel Surveys

This section describes the data requirements of three recent household travel surveys: the 2012–2013 California Household Travel Survey (CHTS), the 2011 Atlanta Regional Travel Survey, and the 2012–2013 Northeast Ohio Regional Travel Survey (covering the greater Cleveland region).

study conducted in Cincinnati and an 8,800-household GPS-based travel survey conducted in Jerusalem.) The Cleveland survey collected detailed socio-demographic and travel data from 4,545 households, including a 30% subset who participated in a GPS-based prompted-recall interview designed to confirm trip details via CATI or CASI survey methods. Trip details for the remaining GPS-only sample were imputed based on land use data, geocoded addresses, GPS data characteristics, and information collected during the recruitment interview. This means that the majority of households in the study completed a recruitment interview, wore a GPS device for 3 or 4 days, completed a record of usage, and sent the device(s) back, thereby concluding their participation. The smaller percentage of the sample used GPS to record their travel while recording a few basic details of the trips made on their assigned travel day to reference during their retrieval interview.

Given the use of GPS as a primary means of data collection and software algorithms for imputing travel details, the survey sponsors (the Northeast Ohio Areawide Coordinating Agency and the Ohio Department of Transportation) agreed that they would also try to minimize the number of data elements required in both the recruitment and retrieval interviews so that only essential variables needed for model development or support were required. Consequently, this survey represents a minimalistic approach to HTS data collection.

Comparison of Data Requirements

Table 1-2 and Table 1-3 provide summaries of the counts of variables for each of the three surveys discussed in this section. The summary includes the count of variables in each of the typical tables provided in the final data set. The two rightmost columns show the difference between the variables collected for the two sample types in Cleveland, with a notable difference in the second table illustrating the reduced burden for the GPS-only participants.
As mentioned previously, the number of variables in these TDM data sets is also a reflection of what might be required when trying to use or reuse existing data sets for travel survey purposes. Appendix A contains the complete listing of all variables delivered by table (household,

The 2011 Atlanta Regional Travel Survey: A Typical Travel Survey

The Atlanta Regional Commission (ARC) conducted its most recent regional travel survey in 2011; this survey had a targeted sample size of 10,000 households with a subset of 1,000 GPS households (PTV NuStats 2011). Recruitment methods offered to participants were telephone (CATI) or web (CASI) interviews, and retrieval methods included CATI, CASI, and diary mail back with data entry into the web-based retrieval system. The purpose of the 10% GPS subsample was to collect detailed information about all trips made to estimate levels of trip underreporting that could be applied to the larger, non-GPS sample. Consequently, the dual GPS and diary method was implemented. A split design was also recommended, with the objective being to obtain 667 complete households with in-vehicle GPS data and the remaining 333 complete households with wearable GPS data. The GPS devices were used for 7 days by the vehicle sample and 3 days by the wearable sample, with the first day coinciding with the assigned diary/travel day. This split technology design allowed for the collection of 7 days of highly accurate vehicle-based data with minimal respondent burden while focusing the use of the wearable GPS device on those households that reported some incidence of transit use for a work or school commute. For households selected for the wearable GPS component, devices were deployed for 3 days, with all household members between the ages of 16 and 75 receiving GPS equipment.
A $25 incentive per instrumented vehicle or person was offered to all recruited GPS households for successfully reporting travel data, using all GPS devices provided, and for returning all devices. The final data sets for the survey contained 10,278 completed households and 1,061 completed GPS households.

The Northeast Ohio Regional Travel Survey: A State-of-the-Art, GPS-Only Survey

The Northeast Ohio Regional Travel Survey, covering the Cleveland metropolitan area, was one of only three large-scale travel surveys to use GPS for nearly 100% of the participating households. (The other two were a smaller 2,583 household

Description | California Statewide | Atlanta | Cleveland PR | Cleveland GPS Only
Household variables | 50 | 38 | 32 | 32
Person variables | 104 | 92 | 93 | 93
Vehicle variables | 29 | 15 | 7 | 7
Location/place/trip/activity variables | 53 | 54 | 43 | 43
Long-distance travel | 51 | 0 | 0 | 0
Totals | 287 | 199 | 175 | 175

Table 1-2. Number of delivered variables.

during model development, the systematic use of these data sets for other purposes is just now beginning to grow. Perhaps most intriguing is the use of passive GPS data collected by survey participants to replace traditional diary-based reporting methods (as discussed in the previous section).

Over the past decade, GPS data have been applied in transportation planning model development to:

• Generate trip rate correction factors,
• Identify activity schedules,
• Explore activity interactions within a household and within larger social networks,
• Identify activity locations,
• Identify route choice and mode choice preferences,
• Explore variability and pattern formation in activity-travel patterns,
• Identify baseline network roads and conditions,
• Evaluate bike/pedestrian travel behavior,
• Validate travel demand models, and
• Identify trip purpose and activity type.

While uses such as trip rate correction factors and the identification of baseline transportation network conditions have been applied in several regional model development efforts to date, other uses are still emerging and are found only in research studies. Table 1-4 lists some of the known uses of GPS data that have been applied in practice.

person, vehicle, location/place/trip, and long-distance travel) and by survey.

Use of GPS Travel Data in the Development of Transportation Models

The decision-making demands on applied transportation models are requiring an ever-increasing level of complexity to estimate transportation policy impacts beyond capacity expansion (Cambridge Systematics, Inc., et al. 2012). The increasing complexity of models and their planning roles require higher-quality data to identify travel behavior and transportation system existing conditions. GPS technology has been targeted as an important tool for collecting the quality data needed in today’s models.
More specifically, GPS-based travel/activity surveys have been implemented with the expressed intent of improving the quality of behavioral data needed for trip, tour, and activity-based models. Data from GPS and consumer technologies are also emerging as a source for identifying baseline network operating conditions and for validating model outputs.

Over the last several years, the primary incentive for regions to invest in a GPS-enhanced travel survey component has been the identification of trip rate correction factors that adjust model trip rates based on unreported travel measured in the GPS subsample for the larger diary-based samples. While GPS survey data have been used for other investigative analyses

Description | California Statewide | Atlanta | Cleveland PR | Cleveland GPS Only
Household variables | 25 | 15 | 8 | 8
Person variables | 97 | 85 | 83 | 83
Vehicle variables | 27 | 13 | 5 | 5
Location/place/trip/activity variables | 45 | 36 | 34 | 0
Long-distance travel | 47 | 0 | 0 | 0
Totals | 241 | 149 | 130 | 96

Table 1-3. Number of questions asked of participants.

Use of GPS Data | Applied in Practice
Trip rate correction factors | Atlanta, California, St. Louis, Kansas City, Washington, D.C., Chicago, Massachusetts, New York City
Activity schedule development | Jerusalem, New York, Cincinnati
Activity interaction analysis | Jerusalem, Cincinnati, New York
Activity/trip end geocoding | Cincinnati, Jerusalem, Cleveland
Baseline network development | Many (GPS probe vehicle data, consumer data)
Route and/or mode choice analysis | Jerusalem, Zurich, Seattle, San Francisco, Portland
Model calibration/validation | Many (GPS probe vehicle data, consumer data)
Bike/pedestrian models | San Francisco, Monterey Bay
Full travel diary replacement | Cincinnati, Jerusalem, Cleveland

Table 1-4. Uses of GPS data in transportation model development.
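As a rough illustration of how a trip rate correction factor might be derived from a dual-method (diary plus GPS) subsample, the sketch below divides GPS-observed trips by diary-reported trips for each trip type and applies the ratio to diary-based trip rates. The record layout, trip-type labels, and all numbers are hypothetical and chosen only to show the arithmetic; practical applications would segment further by tour and socio-demographic characteristics.

```python
from collections import defaultdict


def correction_factors(records):
    """Compute GPS-to-diary trip ratios by trip type.

    records: iterable of (trip_type, diary_trips, gps_trips) tuples,
    one per household-day in the GPS subsample (hypothetical layout).
    """
    diary = defaultdict(int)
    gps = defaultdict(int)
    for trip_type, diary_trips, gps_trips in records:
        diary[trip_type] += diary_trips
        gps[trip_type] += gps_trips
    # Skip trip types with no diary trips to avoid dividing by zero.
    return {t: gps[t] / diary[t] for t in diary if diary[t] > 0}


def apply_factors(reported_rates, factors):
    """Scale diary-based trip rates; types without a factor are unchanged."""
    return {t: rate * factors.get(t, 1.0) for t, rate in reported_rates.items()}


# Hypothetical subsample: diary and GPS trip counts per household-day.
subsample = [
    ("home-based work", 4, 4),
    ("home-based work", 2, 2),
    ("home-based other", 3, 4),
    ("home-based other", 5, 6),
]
factors = correction_factors(subsample)

# Hypothetical diary-only trip rates (trips per household per day).
adjusted = apply_factors(
    {"home-based work": 1.6, "home-based other": 2.0}, factors
)
```

With these made-up counts, home-based work trips need no adjustment (factor 1.0), while home-based other trips are scaled by 1.25, which reflects the general pattern that discretionary trips tend to be underreported more often than commutes.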

to restore the individual disaggregate details that include not only the number of trips by purpose but also their sequence and timing based on a small subsample. This becomes especially important as scheduling models begin to account for intra-household interactions among household members, which require consistent schedules for all household members. Consequently, missing or underreported data for just one household member can invalidate the data for the entire household, increasing the prevalence of unusable data. One advanced surveying approach that fully addresses this issue and minimizes the underreporting biases is a 100% GPS-assisted prompted-recall method (Oliveira et al. 2011).

The recent comparisons between GPS and non-GPS subsamples of the Jerusalem HTS have shown that, all else being equal, a GPS subsample provides trip rates that are 50%–70% higher and tour rates that are 10%–20% higher than diary-only households, with rates varying based on trip/tour purpose. The most frequently underreported travel components are short trips, nonmotorized trips, and intermediate stops on commuting tours. While these trips might not contribute significantly to regional vehicle miles traveled (VMT), they are important for understanding and modeling travel behavior and other (longer) trips. For example, the presence of intermediate stops on commuting trips (like dropping off a child at school on the way to work or visiting a gym or shopping mall on the way from work) can be a major reason for a person’s resistance to switch to transit. A data collection method that systematically simplifies tours may result in an overly optimistic mode choice model that would overpredict the number of transit users for a new service.

Such GPS data collection efforts as the types described previously will be even more vital to emerging travel demand modeling paradigms.
These include advanced activity-based models that focus more on the dynamic behavioral aspects of the traveler, such as in models by Habib and Miller (2008), Nijland et al. (2011), Auld and Mohammadian (2012), and others. These models all have a focus on day-to-day dynamics and choice behavior that cannot be observed in standard travel diary data. This is especially true for several large-scale, next-generation travel demand models in the process of development, including ADAPTS (Auld et al. 2009), SimAGENT (Goulias et al. 2012), and SIMTRAVEL (Pendyala, Konduri, and Chiu 2012), which could greatly benefit from new GPS-based data collection techniques.

Many of these models rely on somewhat esoteric concepts, such as choice set formation, activity time-space constraints, and scheduling flexibility, which individuals rely on when forming activity-travel patterns but are likely to have a fairly limited ability to recall in a survey setting. For example, a concept such as flexibility (e.g., spatial flexibility) can be factored into models when attempting to formulate realistic choice sets from which survey participants can choose. How-

Trip Rate Correction Factors

One of the main attractions of using GPS devices to identify travel behavior is that the data set is observed (passive) instead of reported (active). The basic limitations of reported travel behavior through diaries have been recognized for many years (Stopher and Greaves 2007; Casas and Arce 1999) as respondents frequently forget some trips and interpret the definition of a “trip” differently. When GPS became viable for use as part of a household travel survey, one of the first analyses was the comparison of GPS trips to diary trips. The differences between the observed and reported trips and the need to account for these differences in demand models have justified the inclusion of GPS subsamples in many large-scale travel surveys over the last decade (Wolf, Loechl, et al.
2003; Bradley, Wolf, and Bricka 2005). The extent of underreporting varies by region and demographic profile (Wolf, Loechl, et al. 2003; Bricka and Bhat 2007). A review of five recently completed surveys conducted for Denver, Atlanta, Nashville, Massachusetts, and California revealed overall underreporting levels ranging from 11% to 25%; however, these percentages should neither be interpreted nor applied broadly. Additional analyses are needed to generate appropriate, targeted correction factors based on specific trip, tour, and socio-demographic characteristics.

Generating trip rate correction factors can be accomplished for surveys in which households report travel in addition to GPS travel data, and also for surveys in which some households report travel using diaries only and others use GPS with prompted recall only. (The recent 2010–2011 New York City regional travel survey used this latter approach.) The techniques for either situation are similar in that correction factors are generated for subsets of travel. For trip-based models the corrections should be for specific trip types (e.g., home-based work), and for tour-based models the corrections should be for specific tours (e.g., school tours for children).

Activity Schedules and Interactions

Activity-based models (ABMs) tend to require data on the full activity-travel pattern of individuals and such hard-to-collect information as planning times and flexibility measures. ABMs operate with disaggregate individual daily patterns and schedules. From this point of view, it is essential to collect a full-day list of person trips and activities with no gaps, overlaps, or inconsistencies. If one of the trips or activities of the person is missing, miscoded, or underreported, this essentially makes the entire person-day unusable for some of the ABM components.
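The requirement of a full-day list with no gaps, overlaps, or inconsistencies can be checked mechanically. The following is a minimal sketch under assumed conventions: each episode (trip or activity) is a hypothetical (start, end, label) tuple with times in minutes after midnight, and a complete person-day runs from minute 0 to minute 1440. Real survey data sets will use their own record layouts and day boundaries.

```python
def check_person_day(episodes, day_start=0, day_end=1440):
    """Return a list of problems found in one person-day schedule.

    episodes: list of (start_min, end_min, label) tuples covering trips
    and activities (hypothetical layout, minutes after midnight).
    An empty result means the day is continuous and consistent.
    """
    problems = []
    cursor = day_start  # latest time covered so far
    for start, end, label in sorted(episodes, key=lambda e: e[0]):
        if end < start:
            # Episode ends before it starts: a coding inconsistency.
            problems.append(("inconsistent", label, start, end))
            continue
        if start > cursor:
            problems.append(("gap", label, cursor, start))
        elif start < cursor:
            problems.append(("overlap", label, start, cursor))
        cursor = max(cursor, end)
    if cursor < day_end:
        problems.append(("gap", "end of day", cursor, day_end))
    return problems


# Hypothetical person-day with one unexplained gap and one overlap.
day = [
    (0, 480, "home"),
    (480, 510, "trip"),
    (510, 1020, "work"),
    (1050, 1080, "trip"),   # 30 minutes unaccounted for before this trip
    (1070, 1440, "home"),   # starts before the trip ends
]
issues = check_person_day(day)
```

A person-day that returns an empty list could be passed on to schedule-based model components, while flagged days would need repair (for example, through a prompted-recall follow-up) or exclusion.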
Underreporting in aggregate four-step models can be partially corrected by applying trip rate correction factors derived, for example, from a 10%–15% subsample of GPS-assisted households (Wolf, Bricka, et al. 2004). This approach is less useful for ABMs since it is impossible

lation models and dynamic traffic assignment (DTA) models need detailed speed and condition data for the specific model scope. Many model developers have used GPS-based probe vehicles to collect these data over the last 15 years. Probe-vehicle data are typically collected using a sampling plan to ensure that the roads of concern are covered at designated times of day. GPS tracking has also been used to monitor network performance measures such as travel times, speeds, and delay (Quiroga 2004; Hackney, Marchal, and Axhausen 2005).

More recently, consumer data collected by private companies are being used to identify baseline conditions. Several private data companies have roadway speeds archived over multiple years and can generate custom queries. The original data come from a number of different technologies, including personal navigation devices, commercial vehicle GPS, and smartphones. These technologies are discussed in more detail in a later section.

GPS data that have been collected as part of a household travel survey can also be used to identify new model links that are needed in the network. Since travel demand model networks do not typically include all roads, some collector and local roads that are heavily traveled may not be represented. The GPS travel data can be used to identify these frequently traveled links.

Another related research area is measuring travel behavior changes that result from changes in network conditions. Much work has been completed in Australia measuring travel behavior changes in response to the TravelSmart policy (Stopher, Fitzgerald, and Biddle 2006; Stopher, Swann, and Fitzgerald 2007). These studies use either 1-week or 4-week GPS panels, repeated over a period of years, to extract some basic travel behavior measures, such as the vehicle kilometers traveled and number of trips.
Because GPS data allow these values to be determined much more accurately and are easier to collect, they have proved useful in measuring travel behavior changes.

Route Choice Analysis

Route choice decisions are very difficult for survey respondents to reproduce using any reporting method. This has led to a lack of useful data on route selection behavior outside of simulated experiments. However, once GPS technology started being used in travel surveys, it was realized that the route selection behavior of the travelers would also be captured (Jan, Horowitz, and Peng 2000; Li, Guensler, and Ogle 2005; Papinski, Scott, and Dougherty 2008). In the research of Jan, Horowitz, and Peng, data from the Lexington study were used to form general observations about route selection behavior, comparing variations in path selection and deviations from assumed shortest paths (Jan, Horowitz, and Peng 2000). Georgia Tech researchers used GPS to observe

ever, the individual will often have difficulty articulating the actual constraints underlying a location decision, as was found in the UTRACS survey (Frignani et al. 2010). Rather than rely on individuals to recall their location choice decision-making process, long-term observations can be used to directly observe the variability of location decisions for activities to formulate a more accurate measure for use in model development. This can similarly be done for other factors such as timing flexibility and route choice variability.

Activity and Trip End Locations

Another issue that has plagued HTSs and subsequent travel model development for many years is geocoding of locations (trip origins and destinations) and ensuring proper trip arrival and departure times. For any travel model, whether four step or ABM, a trip record with unknown or incorrect destination location(s) is unusable for most sub-models.
Round- ing and other mistakes in trip departure and arrival times are less critical for four-step models since they operate with broad 3- to 4-hour time periods. However, advanced ABMs are extremely sensitive to both spatial and temporal incon- sistencies. They operate with tours rather than trips, and hav- ing a data item missing on one of the trips frequently results in discarding the entire tour. Differences between CATI- and GPS-collected travel time data and how they compare with modeled estimates based on the same origin–destination (OD) pairs were analyzed by GeoStats (Wolf, Oliveira, and Thompson 2003) for the three regions in California that par- ticipated in the GPS component of the 2001 statewide travel survey. This analysis revealed that CATI tends to significantly overestimate travel time when compared with GPS-derived trips and modeled travel times. GPS technology can be used to ensure a consistent daily chain of trips and activities for a person because both the spa- tial and temporal aspects are present in the GPS stream with a high level of detail for routes and modes. In particular, for auto trips, such data as toll facilities or managed lanes used on the trip can be automatically retrieved. For transit trips, the GPS stream provides information about the sequence of all access and transit line segments (including exact boarding, alighting, and transfer points). The GPS stream also clearly identifies the parking location for both auto and transit trips with auto access or egress. Baseline Transportation Network Conditions All transportation models require some sort of baseline network and measurement of operating conditions. Travel demand model networks typically require estimates of free flow and congested speeds. Activity and tour models require these same speeds but at a more refined temporal scale. Simu-

variations in the chosen morning commute route within the Commute Atlanta project (Li, Guensler, and Ogle 2005). Papinski, Scott, and Dougherty compared route pre-planning to actual morning commute routes and made observations about how routes are planned (Papinski, Scott, and Dougherty 2008).

The recent Traffic Choices Study sponsored by FHWA and conducted in Seattle, WA evaluated the before and after choices (including route) when different corridors were tolled (Puget Sound Regional Council 2008). This type of survey can be combined with any travel demand policy experiment. (Imposing differentiated tolls was the essence of the Seattle study.) In this case, the individual response to the policy can be analyzed through car use. An important additional benefit of this type of study that has not been fully utilized yet is the ability to calculate travel condition measures like travel time reliability in addition to average travel time and cost. Travel time reliability has been widely recognized as a very important characteristic of highway service quality. However, this characteristic has never been included in travel models [and travel behavior analysis in general except for stated preference (SP) studies] because of the absence of network data on travel time variation at the trip origin–destination level. There is a wealth of data on travel time distribution at the highway segment level provided by sensor-based data collection systems (loops, cameras, RFID, etc.). However, a GPS-assisted traffic choices study is a unique way to track travel times and conditions for entire trips (for example, commuting to and from work) implemented by the same individual over a substantial period of time.

The San Francisco County Transportation Authority developed a route choice model based on GPS travel data collected as part of the CycleTracks data collection effort (Hood, Sall, and Charlton 2011). GPS data collected on smartphones using the CycleTracks application were analyzed to identify activities, mode transfers, and network paths. The results were used to create a multinomial logit model to reveal route and condition preferences. A smaller-scale but similar effort was conducted in Portland in 1999 (Broach, Gliebe, and Dill 2009), and recent efforts in Austin, TX and Monterey, CA also used the same approach to route choice modeling.

Model Calibration and Validation

Travel model calibration and validation are conducted by comparing travel time forecasts from the baseline model with some sort of ground-truth data. The ground-truth data in this situation can come from trip origin–destination travel times and traffic volume counts at major screenlines. (Screenlines refer to locations around an urban area where traffic is funneled into a few crossing points. Screenlines typically occur at river crossings, rail crossings, or border crossings.) More recently, GPS origin–destination travel times have been used from probe vehicles, GPS-based travel surveys, or consumer product origin–destination data sets. Trip ends are geocoded and assigned to a traffic analysis zone (TAZ). Trip travel times between TAZs are recorded based on their start times. The resulting table of TAZ-to-TAZ trips is aggregated and averaged into time-of-day bins based on the model design needs. The results can then be used in comparison with a similar output table generated by the model.

In addition to GPS-based household travel survey data, the same information can be tabulated from reported information in travel diaries. A more recent development by a cell phone data provider allows modelers to purchase zone-to-zone travel times for large percentages (30%–70%) of the population where zones are based on census geometry. While still needing some independent evaluation, the approach is palatable to the modeling industry and fits the needs of independent and comprehensive validation data sets.

Additional Modeling Needs

Visitor/Tourist Travel Behavior. There are very few visitor models applied in practice. Most that have been applied are greatly simplified and aggregate in nature. However, for some major cities like New York, visitors represent a very significant travel component. In advanced ABMs, a disaggregate approach for modeling visitors has been considered that is based on the same principle of micro-simulation of individual behavior as the core model for residents. Hotels provide the basis for structural synthesis of the population of visitors. To support such a model, a sample of daily travel diaries of hotel customers similar to individual questionnaires of HTSs can be collected. However, daily travel of visitors might be quite intensive in terms of number of trips and chains of trips (especially for nonbusiness purposes). Further, visitors and tourists are less familiar with the area, and it would be more difficult to retrieve trip end locations with them by address in a conventional non-GPS setting. Tourists and visitors may be even more reluctant to participate in a comprehensive, long survey compared to residents. It is possible that GPS-assisted methods with prompted recall would be attractive for this type of survey. A simplifying aspect of this type of survey is that the sampled unit is a person rather than an entire household (when compared to a household travel survey).

One approach would be to integrate an airport survey with the hotel visitors’ survey. Visitors can be recruited as they arrive at the airport and be equipped with a GPS device for the duration of their stay, with travel information retrieved using a prompted-recall method when they return to the airport for their departing flight. It should be noted that visitors’ trips to and from airports themselves represent an important travel market with unique characteristics such as very high willingness to pay for travel time savings and reliability.
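
As a rough illustration of the TAZ-to-TAZ travel-time tabulation described under Model Calibration and Validation, the following Python sketch averages observed trip times by origin zone, destination zone, and time-of-day bin. It is a minimal example under stated assumptions, not the procedure of any study cited here: the bin boundaries, the zone numbers, the sample trips, and the names `PERIODS`, `period_of`, and `average_od_times` are all hypothetical, and trip ends are assumed to have been geocoded and assigned to TAZs already.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical time-of-day bins as (name, first hour, end hour exclusive).
# A real model would define its own periods.
PERIODS = [("AM", 6, 9), ("MD", 9, 15), ("PM", 15, 19)]


def period_of(start: datetime) -> str:
    """Return the time-of-day bin for a trip, based on its start time."""
    for name, first_hour, end_hour in PERIODS:
        if first_hour <= start.hour < end_hour:
            return name
    return "NT"  # everything outside the listed bins is treated as night


def average_od_times(trips):
    """Average travel time (minutes) per (origin TAZ, destination TAZ, period).

    trips: iterable of (origin_taz, dest_taz, start_datetime, minutes).
    Returns {(origin, dest, period): (mean_minutes, trip_count)}.
    """
    cells = defaultdict(lambda: [0.0, 0])
    for origin, dest, start, minutes in trips:
        cell = cells[(origin, dest, period_of(start))]
        cell[0] += minutes
        cell[1] += 1
    return {key: (total / n, n) for key, (total, n) in cells.items()}


# Illustrative (made-up) trips, already assigned to TAZs.
trips = [
    (101, 205, datetime(2012, 5, 1, 7, 40), 22.0),
    (101, 205, datetime(2012, 5, 2, 8, 5), 26.0),
    (101, 205, datetime(2012, 5, 1, 13, 0), 18.0),
    (205, 101, datetime(2012, 5, 1, 23, 30), 15.0),
]
table = average_od_times(trips)
for key in sorted(table):
    print(key, table[key])
```

The trip count is kept alongside each mean so that sparsely observed zone pairs can be excluded before the table is compared with the corresponding model output.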

Taxi Vehicle Activity Patterns. Taxi as a travel mode has never been fully represented in travel models. In many regional travel models, taxi activity is entirely missing from the forecast. In other models, it is represented in a simplified way in terms of actual availability for the trip and associated travel time (wait and ride) and cost. However, in some major cities like New York, taxis represent over 30% of motorized trips. Making the taxi share more accurate in the mode choice model is not the only task to adequately represent taxis in travel models. Another important issue that has never been fully addressed in operational models is a proper representation of taxi vehicle movements that is much more complicated than for private automobiles. Taxi vehicle movement represents a complicated daily chain of trips with and without passengers that do not directly relate to passenger tours. For advanced travel models, one can envision a special new sub-model being developed for converting passenger taxi trips into vehicle trip chains. This sub-model will require a new data source that could be envisioned as a multiday GPS-assisted survey of taxis (with GPS devices installed on taxis). Some recent and ongoing research has been completed in modeling taxi behavior and taxi demand assisted by GPS data collection. One such study was conducted by Liu, Andris, and Ratti (2010), who used a large database of taxi driver GPS traces to analyze how taxi drivers evolve their travel patterns to increase revenue.

Commercial Delivery Vehicle Activity Patterns. Similar to the additional data recommended for taxi movements, one could consider a GPS-assisted multiday survey for trucks, deliveries, and other non-passenger vehicles contributing to daily circulation in major cities. Behavior of commercial vehicles is very different from passenger travel behavior and, in general, has been less explored and understood. A delivery truck might have 10 to 20 chained trips per day that are very difficult to retrieve reliably in a conventional survey setting. GPS-assisted technology is the only way to retrieve actual truck movements as a basis for a more advanced freight delivery model. A truck GPS study of a major grocery chain in the Chicago region was recently completed by the University of Illinois – Chicago (Mohammadian et al. 2013).

Emissions Modeling. Household travel survey GPS data can be used in emissions modeling to develop and evaluate driving profiles and to generate link-level speed estimates. Emissions models require speed data to generate accurate estimates of vehicle emissions. The driving profiles provide the fraction of time respondents spend at different speed bins. Both Mobile 6.2 and the new MOVES model need this information to generate emissions estimates. Many regions also use a link-based emissions assessment that is based on the average speed of a road network link and the road volume. GPS data can provide the average speeds as well as their variability, thereby giving planners more options for improving emissions estimates. GPS data also provide measurement of time between engine starts. This value is used in estimating the intensity of cold start emission events.

Local road travel is often unknown. A properly designed GPS component can help analysts identify the fraction of VMT and vehicle hours traveled (VHT), two common emissions modeling metrics, that occurs on links outside of the model network. These data can increase the accuracy of emissions estimates, particularly local travel, which may have a lot of stop-and-go activity.

GPS data from household travel surveys have been used to support secondary research into emissions and greenhouse gas formation by the National Renewable Energy Lab in Colorado and the U.S. EPA (Gonder et al. 2007). Emissions researchers are relying less on typical driving cycles for emission rate estimates and instead, with the proliferation of GPS and other research, relying more on actual driving patterns from the general population. This interest is expected to continue with an increased focus in greenhouse gas emissions and tighter emissions standards. Involvement of these researchers in the formation of a possible GPS supplement to a household travel survey could result in additional financial support and increased benefits from the study.

In fact, the California Energy Commission and the California Air Resources Board (CARB) spearheaded a GPS augment to the 2012–2013 CHTS California Statewide Travel Survey in which 1,200 households received both GPS devices and OBD engine sensors to install in up to three household vehicles for 7 days. These data streams were processed, delivered, and used to estimate fuel consumption and vehicle emissions as required by recently passed state laws that require greenhouse gas emissions monitoring.

An early use of GPS in 1999 by CARB evaluated heavy-duty truck activity (Battelle Memorial Institute 1999). This study, and many other special-purpose studies, has evaluated detailed trace data from instrumented vehicles to improve on the understanding of vehicle activity.

Other Types of GPS-Based Travel Behavior Studies

Physical Activity/Health Research

Given that many regions are interested in smarter and more transit-oriented development, many are including a physical activity component in their planned travel surveys. The 2001 SMARTRAQ survey in Atlanta included a component similar to this that was funded by the Centers for Disease Control. Health researchers have been very active in transportation planning in recent years due to the clear impact of travel behavior on physical activity (and obesity). People who

become car-dependent not only have an impact on traffic congestion but may also have limited physical activity. Wearable GPS devices combined with activity monitors (or accelerometers) have been deployed for multiday periods in many studies to quantify the levels of physical activity and travel for different populations of interest. These studies have included before and after mobility and travel evaluations of the cane training rehabilitation program offered for visually impaired veterans, an analysis of travel patterns and environmental exposures of children with asthma, and an examination of the level of physical activity on and off trails by trail users across the state of Massachusetts (Wolf and Lee 2009; Wolf and Trost 2009; Troped et al. 2008). GPS and accelerometer data can be used to answer key questions such as where does most physical activity occur (at home, a non-home location, or as a by-product of travel), how much physical activity occurs on a daily basis, and at what intensity does this activity occur.

Mark Freedman of Westat stated, “We are excited about the potential opportunities to combine the transportation and health sectors in joint data collection efforts where the use of travel diaries, GPS, and accelerometer data can be combined” (2012 Industry Survey). Such a study was recently conducted by Thompson and Kayak (2011) using GPS combined with accelerometer data to quantify individual daily activity levels. A similar study by Lee et al. (2012) used GPS traces from travel surveys to assess the amount of physical activity respondents engage in when choosing active transportation modes (i.e., walking, biking, etc.). Such studies are likely to become more and more important as public health challenges in the developed world, such as obesity and cardiovascular disease, grow more prevalent (Thompson and Kayak 2011; Doherty and Oh 2012).

Inclusion of a physical activity component in a household travel survey can serve health research needs, as well as provide data for transportation modeling and regional planning needs. The relationship between physical activity and transportation planning is clear, and this knowledge is useful in the development of strategies that promote nonmotorized travel. Beyond this, understanding the relationship between physical activity and travel mode must factor into the design of livable communities and other built environment planning exercises.

The most recent Nashville regional travel survey was designed with these joint goals in mind. This survey was branded as the Nashville Transportation and Health Study and included a 10% subsample of households who used GPS devices and accelerometers and also completed an extensive health survey in addition to the household travel survey.

Road Pricing and User Fees

An ongoing study being conducted in Minnesota has looked at the feasibility of road pricing strategies using a GPS-enabled device to assess user fees, either through onboard calculation or communication with a centralized management center (Pierce et al. 2011). The study has looked at the suitability of GPS in terms of its accuracy in being able to distinguish various road segments, privacy issues, and various architectures for collecting the data and assessing the fees. Samsung smartphones were provided to recruited participants, who installed the devices in their vehicles for the duration of the study. Some of the findings are likely applicable to more general GPS survey efforts, such as the trade-offs between thick and thin client data processing strategies (i.e., processing and aggregating data within the device instead of at a centralized location), which could help mitigate some of the privacy issues in other surveys as well (Pierce et al. 2011).

The aforementioned Traffic Choices Study sponsored by FHWA and conducted in Seattle, WA evaluated the before and after behavior when different corridors were tolled at different times of the day (Puget Sound Regional Council 2008). Findings suggested that participants made small-scale adjustments that could, as a whole, have an impact on traffic congestion. Further, the study suggested that open road tolling is technically feasible but will require a more robust business model to achieve the tolling program’s goals.

Transit Passenger Surveying

To comply with Title VI requirements as well as to improve service planning, transit agencies conduct surveys at varying frequencies to gather data regarding travel patterns (which may include origin–destination information), ridership demographics, and customer satisfaction information. The data are a valuable tool for effective transit planning and travel demand modeling.

Traditional surveys collect data on the sequence of the current one-way trip from origin to boarding and then alighting to destination. These surveys also generally ask about payment method, fare subsidies available to the customer, and options for alternative modes, as well as demographic details about the rider, including age, race, income, and frequency of transit use. A 2005 survey of 52 transit agencies found that larger agencies conduct five or more onboard and intercept surveys annually, while smaller agencies conduct surveys every 1 to 3 years (Schaller 2005). Transit agencies are also required to have data collected by a recent onboard survey (within 5 years) to apply to the Federal Transit Administration’s New Starts program.

Prior to the rise of GPS and mobile technology, the latter of which allowed for real-time geocoding, the most common approach to the collection of these data was to have a survey administrator hand out pencil and paper questionnaires on selected bus routes or trains. An advantage of the pencil and paper method is the relative ease of handing out a questionnaire to a large proportion of the riders of a given

route. A disadvantage to this approach is the possibility of a failure by the participant to understand the objective (e.g., participants will report a round trip instead of the requested one-way trip). Additionally, the completeness, precision, or accuracy of reported origin, boarding, alighting, or destination locations are not always known and not easily verifiable.

While this approach is still widely used today, innovative methods are being used to replace or supplement this traditional approach. One such method used to collect and audit responses to transit surveys uses GPS-enabled personal digital assistants (Oliveira and Casas 2010). However, this approach does not fully address concerns about the accuracy of self-reported data.

A method that addresses the concerns about the accuracy of the data, while also leveraging the capabilities of real-time geocoding and GPS data, is the use of tablet devices to conduct face-to-face personal interviews (Atlanta Regional Commission 2012). The use of this combination allows for more accurate, complete, and representative responses and also provides the interviewer with details about the transit system and study area that would otherwise be unavailable. Tablets with cellular connectivity allow the survey managers to adjust goals in real time. While there may be concerns about the costs of the tablets or the labor-intensive nature of face-to-face personal interviews, there is also evidence to suggest that the costs become fairly comparable when calculated using completed, usable surveys.

Automated passenger counter (APC) and AVL systems are used frequently as a means to measure level of service (LOS), manage dispatching and scheduling, and provide feedback to drivers about schedule adherence (Furth et al. 2006). While it is possible that these technologies could be integrated to provide high-quality and accurate passenger movement data, it is not a prevalent option at this time. Challenges to this possibility include the fact that not all vehicles in a fleet are outfitted with APC and AVL devices and that their integration is not always a straightforward task. Other applications of merged AVL/APC data include the estimation of dwell times.

Standards, Guidelines, and Common Practices for Travel Demand Model Data Collection

Household travel surveys constitute one of the most important sources of disaggregate travel behavior data for TDMs. They were initially conducted in the United States in the 1950s, and, during most of the decades since then, little has been done “to standardize the processes or to institute consistent practices of acceptable quality of reliability” (Stopher, Alsnih, et al. 2008).

The U.S. Department of Transportation requires that transportation management areas (TMAs), defined as urbanized areas with populations exceeding 200,000, go through a certification review every 4 years per 23 CFR § 450.334(b) (U.S. Department of Transportation 2012). During this review, a TMA may be required to update the data used in its TDM, especially if the travel survey data used were collected 15 or more years previously (Murakami and Bricka 2012). In fact, for most MPOs, conducting a travel survey constitutes one of the largest routine expenditures made from its planning budgets (Stopher, Alsnih, et al. 2008).

Evolution of Household Travel Survey Standardization

The 1990s saw an increase in the demands placed on TDMs and travel surveys as a consequence of changes put in place by the 1990 Clean Air Act amendments, the 1991 Intermodal Surface Transportation Efficiency Act (ISTEA), and other earlier legislation (Tierney et al. 1996). At the same time, travel surveys went through several technology evolutions throughout the 1980s and 1990s with the introduction of computerized interviewing systems, centralized call-centers, and online address validation and geocoding technology. Guidance on recommended processes and practices in travel surveys was needed to better meet these added demands by improving quality, reliability, and transferability of the collected data.

These initial efforts were crystalized in the FHWA Travel Survey Manual (Tierney et al. 1996). Although this document did not prescribe standards, it did provide significant guidance by compiling material from previous guideline documents as well as technical papers into a single comprehensive source. The 580-page report identified the various types of travel surveys and covered several important subjects, such as management and quality control, precision and accuracy, geocoding, and emerging trends identified at the time (including stated response and longitudinal surveys). In addition, the report dedicated individual chapters to the main types of travel surveys identified by FHWA, which were household travel surveys, vehicle intercept and external station surveys, transit onboard surveys, commercial vehicle surveys, workplace and establishment surveys, visitor surveys, and parking surveys.

A subsequent push for standardization came about in the early 2000s with the publication of the NCHRP Report 571: Standardized Procedures for Travel Surveys (Stopher, Alsnih, et al. 2008). This report was jointly created by a team of travel survey experts from around the world and focused on identifying aspects of personal travel surveys that could be standardized. It also provided recommendations on how to implement the drafted standards, identified areas for future research, and included templates for requests for proposals

(RFPs). The main aspects for which standards were developed and presented in this report were:
• Design of survey instruments,
• Design of data collection procedures,
• Pilot surveys and pre-tests,
• Survey implementation,
• Data coding and geocoding,
• Data analysis and expansion, and
• Assessment of survey quality.

Several of these standards were incorporated into the design of the 2009 National Household Travel Survey (NHTS). More recently, members and friends of the Transportation Research Board’s Travel Survey Methods Committee (ABJ40) have started maintaining an online version of the travel survey manual (http://www.travelsurveymanual.org). Initial content for the website came primarily from NCHRP Report 571 (Stopher, Alsnih, et al. 2008) and FHWA’s 1996 Travel Survey Manual (Tierney et al. 1996). Since its initial release, the content has been updated and expanded by the professional community.

The remaining parts of this section summarize the main aspects of these latest standards with regard to how they affect the collection of travel survey data for supporting GPS data processing and augmentation, present a review of the basic elements of a successful GPS-enhanced travel survey, and present three examples of recent travel surveys that illustrate different design approaches.

Relevant Travel Survey Standards

Guidance from Other U.S. Federal Agencies

Other federal agencies that provide guidance relevant to travel surveys are the Bureau of Transportation Statistics (BTS), the Office of Management and Budget (OMB), and the U.S. Census Bureau. After the passage of ISTEA in 1991, the BTS was created in 1992 for the purpose of administering “data collection, analysis, and reporting and to ensure the most cost-effective use of transportation-monitoring resources.” The stated mission of the BTS is “to create, manage, and share transportation statistical knowledge with public and private transportation communities and the nation” (U.S. Bureau of Transportation Statistics 2005). To that end, the BTS has released a statistical standards manual that provides general guidelines about the planning and design of all types of surveys run by a government agency and includes guidelines for the actual collection of data, the processing of data, and subsequent analysis, dissemination, and evaluations of data quality (U.S. Bureau of Transportation Statistics 2005).

The OMB oversees the Office of Information and Regulatory Affairs, which hosts a document developed to provide guidance on the development and deployment of statistical surveys (U.S. Office of Management and Budget 2006). The document is similar in form and content to the standards and guidelines found in the BTS document and contains the following caveat:

The standards and guidelines are not intended to substitute for the extensive existing literature on statistical and survey theory, methods, and operations. When undertaking a survey, an agency should engage knowledgeable and experienced survey practitioners to effectively achieve the goals of the standards. Persons involved should have knowledge and experience in survey sampling theory, survey design and methodology, field operations, data analysis, and dissemination as well as technological aspects of surveys.

As with the BTS, the OMB has provided a firm general framework for survey design and deployment, and it has stressed the need to conform to sound, proven statistical methods when conducting surveys but resists providing any more concrete requirements, thereby leaving discretion to the survey design process.

The U.S. Census Bureau conducts a decennial census, an economic and government census every 5 years, and the annual American Community Survey (ACS). It is common practice to use ACS question wording and choice lists in travel surveys. The bins used for questions about income, for example, are often identical to the bins found in income questions posed by the Census Bureau. Questions about race/ethnicity, gender, occupation, age, and other socio-demographic attributes may also be based on census structure, wording, or choices. These intentional design consistencies between surveys allow for comparisons at appropriate person/household and geographic aggregation levels, which are an important means of checking and controlling the sample for biases and representativeness during the survey effort.

Guidelines in GPS Data Collection and Basic Processing for Travel Surveys

Although U.S. federal agencies do not provide formal standards or guidance on how to use GPS technology in travel surveys, there exists a considerable body of literature on the topic. There are also standards that are related to how GPS data are collected, archived, and shared, which are applicable to household travel surveys. This section discusses these two topics. The next section of this chapter covers the literature review results in the area of GPS data processing and imputation methods.

Guidance from Technical Literature

Wolf proposed guidance in the form of simple steps that can be used to convert a stream of raw GPS points into trips (Wolf 2000). The process included the filtering of GPS points

17 a serial protocol; parsing these messages yields information that can be used for navigation or logged for later processing. The GPS exchange format (GPX) uses XML (extensible markup language) schema for the lightweight encoding of waypoints, routes, and tracks (Foster 2004). This format is very popular in consumer-grade devices and is supported by several websites and web services. Another XML-like standard that has been used for exchanging GPS data is the Keyhole Markup Language (KML), which can be used by both Google Earth and Google Maps to display GPS data on an image of a globe and map respectively (Google 2012). Since its initial adoption, KML has also become an Open Geospatial Consor- tium (OGC) standard (Open Geospatial Consortium 2008). GeoJSON is a geospatial data interchange format based on JavaScript object notation (JSON) and is commonly used in web development (Butler et al. 2008). The focus of these standards is on mapping and navigation applications, mak- ing them less than suitable to travel survey and transportation planning applications. More recent advances in both the capabilities and usage of web and mobile (i.e., smartphone and tablet devices) map- ping APIs have generated a few de facto standards that the information technology industry follows when processing GPS data, such as: • Point coordinates are stored in decimal degrees using the World Geodetic System (WGS) 84 datum, • Point date and time information are provided in coordi- nated universal time (UTC), • ISO 8601 or NMEA 0183 formats are used when encoding date and time information into text, and • Speed is indicated in meters per second. Imputation and Data Fusion of Travel Behavior Details Travel behavior researchers have recognized that detailed travel data from GPS traces combined with other trip details and geographic information has the potential to provide travel behavior details without participant reporting. 
For example, trip ends derived from passively collected GPS data can logically be combined with geographically referenced land use data to estimate trip purpose (e.g., home-based shopping trips). Several other travel details can be similarly estimated when combined with other measured or reported data elements. These processes of imputation and data fusion are attractive to travel behavior researchers because data generation moves away from participant-reported variables and more toward measured variables, which are deemed more reliable. Further, there is growing interest in using GPS data generated through consumer devices and apps that are archived and resold by private companies. This type of data

based on quality indicators and zero speed followed by the computation of dwell time at each point. It was reported that a dwell time of 120 s worked well for identifying most trips.

Stopher, Jiang, and Fitzgerald provided additional guidance on how to prepare raw GPS point data for processing along with suggested thresholds for GPS data-quality indicators, more detailed procedures for filtering non-movement, and suggestions for how to detect and handle cold starts and signal dropouts (Stopher, Jiang, and Fitzgerald 2005). More specifically, the team proposed that GPS points that have zero speed and show movements of less than 15 m be removed during filtering (Stopher, Jiang, and Fitzgerald 2005).

Tsui and Shalaby (2006) suggested that points with fewer than three satellites in view and with horizontal dilution of precision (HDOP, a measure of the quality of a GPS coordinate solution, where smaller numbers indicate better data) values above 5 be automatically removed, as well as points with zero directional heading and speed values. The resulting points are then reviewed for positional jumps, which tend to occur in urban canyon areas.
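The point-level quality rules attributed above to Tsui and Shalaby (2006) can be illustrated with a minimal Python sketch. The `GpsPoint` record, its field names, and the function names are hypothetical and introduced only for illustration. The sketch also assumes that each condition on its own (too few satellites, a high HDOP, or zero heading together with zero speed) is enough to drop a point, which is one reading of the rule rather than the authors' documented implementation.

```python
from dataclasses import dataclass


@dataclass
class GpsPoint:
    """Hypothetical logged GPS record; field names are illustrative only."""
    lat: float          # decimal degrees, WGS 84
    lon: float          # decimal degrees, WGS 84
    speed_mps: float    # meters per second
    heading_deg: float  # degrees clockwise from north
    satellites: int     # satellites in view
    hdop: float         # horizontal dilution of precision


def passes_quality_filter(p, min_satellites=3, max_hdop=5.0):
    """Return True if the point survives the quality rules described in the text."""
    if p.satellites < min_satellites:
        return False  # fewer than three satellites in view
    if p.hdop > max_hdop:
        return False  # HDOP above 5 indicates poor satellite geometry
    if p.heading_deg == 0.0 and p.speed_mps == 0.0:
        return False  # zero heading and zero speed: treated as non-movement
    return True


def filter_points(points):
    """Keep only the points that pass the quality rules, preserving order."""
    return [p for p in points if passes_quality_filter(p)]
```

The surviving points would still need the separate review for positional jumps mentioned in the text; that step is not covered by this sketch.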
In cases where the collected data do not contain traditional dilution of precision quality indicators, Schüssler and Axhausen (2009b) used altitude instead, removing points that reported unrealistic altitude readings. Schüssler and Axhausen combined this filtering with the examination of points with near-zero speed for a minimum of 120 s to detect trip ends; activity locations were subsequently identified by looking at “bundles of GPS points” consisting of at least 15 points in sequence (Schüssler and Axhausen 2009b). Alvarez-Garcia et al. (2010) used a minimum distance threshold of 30 m between points to filter raw data before identifying stops, while Lawson, Chen, and Gong (2010) employed 50-m buffers to detect trip ends by searching for records outside a point buffer within a 120-second time window.

With regard to device selection, information can be obtained in reports from previous projects as well as from findings in research papers. However, given the rapid evolution and constant change in the consumer market for GPS data loggers, it may be necessary to conduct independent evaluations such as the ones contained in Lawson et al. (2008) and Anderson et al. (2009).

Related GPS Data Standards Applicable to Travel Surveys

The National Marine Electronics Association (NMEA) developed one of the earliest standards for encoding data from GPS receivers, as well as for use with other navigational sensors. This standard, which is named NMEA 0183, is supported by most GPS receivers and has undergone several updates since being released in the 1990s. The NMEA 0183 standard defines a set of text messages that can be sent over

up receiver positional reacquisition. Cold start events can be detected by comparing the end and start locations of adjoining trips and searching for distance gaps. The trip start location and path can then be corrected using information from the previous trip as well as geographic information system (GIS) data.

Signal dropouts occur when the receiver no longer has a lock on the minimum number of satellites needed to compute a positional fix (i.e., three for a two-dimensional solution and four for computing a three-dimensional solution). Typical causes of signal dropouts are urban canyons and overhead blockages such as bridges, tunnels, and tree foliage. Signal dropouts can be detected by computing the speed between the starting and ending points of a data gap (defined as a gap consisting of two points that are separated by a time interval longer than a multiplier of the logging epoch but shorter than the defined trip end criterion) and comparing it with a minimum movement speed. If the computed average speed is greater than the minimum movement speed, the dropout can be ignored. If the computed average speed is less than the minimum movement speed, then the gap should be inspected for possible short stops like those that occur when picking up and dropping off passengers.

These methods are usually combined with analyst follow-up and inspection procedures to ensure that consistent and accurate results are produced. These are especially important when reviewing and tagging short stops that last less than the usual trip end delay criteria of 120 s (Steer Davies Gleave and GeoStats 2003).

Determining Basic Trip Details

Once the end points of trips are identified, it becomes a simple computational problem to derive basic trip attributes.
For example, one can define $P$ as the ordered set of $n$ GPS points ($p_i$) belonging to a trip, $D$ as a function that returns the distance between any two points $p_i$ and $p_{i+1}$, and $T$ as the function that returns the amount of time between any two points. Using these, one can identify the basic trip attributes using the formulae that appear in Table 1-5. Additional filtering and post-processing of the GPS points may be conducted before these formulae are applied. For

does not come with additional travel details other than the GPS trip trace and is limited in usefulness unless additional behavioral details can be imputed. The following sections present how these tasks have been accomplished by researchers and practitioners.

Trip End Identification

Before trip identification can occur, it is often necessary to perform basic formatting of the logged data to have it in a consistent format that is convenient for processing. This includes converting date and time information (which is often provided in UTC) to local date and time, as well as performing unit conversions (e.g., some devices may report speeds in atypical units such as knots). It is also important when performing these initial steps to be aware of the logging rules used to configure the GPS loggers prior to data collection. These rules define when and how often a new GPS position is to be logged. For example, knowing the expected logging interval helps in detecting signal dropouts.

An initial data quality step is often undertaken prior to any analysis to remove data points identified by the device as having too few satellite connections or low precision due to poor satellite configurations.

Next, the remaining points are processed to remove further erroneous observations to produce reasonable traces.
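The basic formatting steps described in this section (converting UTC timestamps to local time and converting speeds reported in knots) might look like the following minimal Python sketch. The function names are hypothetical, and a fixed UTC offset is assumed for simplicity; a production workflow would more likely use a named time zone so that daylight saving time changes are handled.

```python
from datetime import datetime, timedelta, timezone

# One knot is one nautical mile (1,852 m) per hour.
KNOTS_TO_MPS = 1852.0 / 3600.0


def to_local_time(utc_text, utc_offset_hours):
    """Parse an ISO 8601 timestamp and express it at a fixed local UTC offset.

    Assumption: timestamps without an explicit offset are already in UTC.
    """
    utc_time = datetime.fromisoformat(utc_text)
    if utc_time.tzinfo is None:
        utc_time = utc_time.replace(tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return utc_time.astimezone(local_zone)


def knots_to_mps(speed_knots):
    """Convert a speed in knots to meters per second."""
    return speed_knots * KNOTS_TO_MPS


# Example: a point logged at 14:30 UTC by a device five hours behind UTC.
local = to_local_time("2012-03-01T14:30:00+00:00", -5)
speed = knots_to_mps(10.0)
```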
The traditional method involves the filtering of GPS points moving at very slow speed (e.g., less than 1 mph) followed by the computation of time intervals between subsequent points; these time intervals represent time over which the logger did not move. Whenever a gap of 120 s or more is found, a new trip end is placed (Wolf, Guensler, and Bachman 2001). This method has also been extended by looking at distance covered between points, spatial buffers, and heading changes (Stopher, Jiang, and Fitzgerald 2005; Lawson, Chen, and Gong 2010). Other studies have used various clustering algorithms such as k-means clustering (Ashbrook and Starner 2003), spatial density analysis (Flamm, Jemelin, and Kaufmann 2007), and land-use–constrained spatial buffering (Auld et al. 2009).

Once the initial set of trip ends is identified, it is necessary to perform additional processing to deal with the potential presence of cold starts and signal dropouts.

Cold start events occur when the GPS receiver is either powered down or has not acquired satellite signals for an extended period of time (i.e., more than a few hours). Under these circumstances, a receiver may take several minutes to restart acquiring and reporting GPS positions, which results in the start portion (or the entirety) of the trip not being captured. Smartphones and other connected devices can shorten this time through the use of assisted GPS technologies that use the cell-data network to download updated satellite orbit information and time offsets, which can be used to speed

Table 1-5. Basic trip attributes from GPS.

Name                            Formula
Origin location and time        $p_1$
Destination location and time   $p_n$
Trip duration                   $T(p_n, p_1)$
Trip distance                   $\sum_{i=1}^{n-1} D(p_i, p_{i+1})$
Trip path                       $P$
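As a hedged illustration of the gap-based trip end rule and the attributes in Table 1-5, the following Python sketch splits a time-ordered list of points wherever consecutive points are 120 s or more apart and then derives the basic attributes for a trip. The tuple layout `(time, lat, lon)`, the function names, and the use of the haversine formula for the distance function are assumptions made for this example. The input is assumed to have already had slow-moving points filtered out.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6371000.0  # mean Earth radius used by the haversine formula


def distance_m(a, b):
    """Great-circle distance in meters between two (time, lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[1], a[2], b[1], b[2]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def split_trips(points, dwell_s=120):
    """Start a new trip whenever the time gap between points is dwell_s or more."""
    trips, current = [], []
    for p in points:
        if current and (p[0] - current[-1][0]).total_seconds() >= dwell_s:
            trips.append(current)
            current = []
        current.append(p)
    if current:
        trips.append(current)
    return trips


def trip_attributes(trip):
    """Compute the basic trip attributes for one trip (a non-empty point list)."""
    return {
        "origin": trip[0],
        "destination": trip[-1],
        "duration_s": (trip[-1][0] - trip[0][0]).total_seconds(),
        "distance_m": sum(distance_m(a, b) for a, b in zip(trip, trip[1:])),
        "path": [(lat, lon) for _, lat, lon in trip],
    }


# Example input: three moving points, a 300 s pause, then two more points.
start = datetime(2012, 3, 1, 8, 0, 0)
sample = [(start + timedelta(seconds=s), lat, 0.0)
          for s, lat in [(0, 0.0), (1, 0.001), (2, 0.002), (302, 0.002), (303, 0.003)]]
sample_trips = split_trips(sample)
```

Summing distances between consecutive points, rather than measuring origin to destination directly, is what makes the result sensitive to positional noise, which is why smoothing is sometimes applied first.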

SOW points follows the same logic, but in reverse order, while each point after a time difference of at least 80 s is marked as a potential EOG point.

Travel Mode Identification

Different approaches have been used to associate a travel mode with a sequence of GPS points. Most mode identification algorithms rely on central tendency measures of instantaneous GPS point speeds, such as mean, median, and mode, as well as indicators of speed variability, such as acceleration and deceleration rates, standard deviation values, and maximum speed and acceleration values for different high percentiles (e.g., 85th and 95th), as well as measures of positional quality.

The main methods used to perform mode identification can be classified into three groups: rule-based, probabilistic, and artificial intelligence. Artificial intelligence methods include both fuzzy logic and neural network applications.

Stopher, Clifford, and Zhang proposed a hierarchical, rule-based process in which candidate modes are tested against the GPS data in the following predetermined sequence: walk, bicycle, off-street network transit, bus, and auto (Stopher, Clifford, and Zhang 2007). In all cases, the 85th percentile values for speed, acceleration, and deceleration are used to rule out a travel mode candidate. The authors point out that using the 85th percentile values for the decision variables has the benefit of dealing with outliers typically found in person-based GPS logs (i.e., the few points that stray from the captured trajectory). Oliveira et al. (2011) used pre-computed values for average, maximum, and standard deviation of mode speeds to select the most likely candidate. The first step in the process was to compute average and standard deviation values of the trip segments’ point speeds.
The algorithm selected the travel mode that most closely matched the values for average and standard deviation of point speeds while having its 95th percentile speed lower than or equal to the mode’s maximum speed.

Oliveira et al. (2006) demonstrated how a probabilistic modeling approach could be used to identify travel mode using data obtained by fusing GPS points with personal accelerometers. The developed multinomial logit model was shown to accurately identify modes for 75% of the validation cases. It was also found that using the accelerometer data improved the identification accuracy for nonmotorized travel modes. Moiseeva, Jessurun, and Timmermans (2010) also used a probabilistic approach, in this case a Bayesian belief network, to identify travel modes, achieving an accuracy rate of 92%.

Fuzzy logic has its roots in machine control and is applicable to imprecise situations, which cannot be defined with crisp true-and-false rules. For example, trips can be defined in terms of slow, medium, or fast using median travel speed. Most people typically do not reason that there is an exact cutoff

example, the points’ positions could be smoothed to remove outliers before computing distance or generating a final trip path (Li, Guensler, and Ogle 2005).

Travel Mode Detection and Processing

Once the data are segmented using basic trip ends, it is necessary to detect mode transitions that may have occurred within a GPS trip. The resulting sub-trips are often referred to as trip or mode segments or elemental trips. Once these are identified, travel modes are assigned.

Detecting Mode Transitions

De Jong and Mensonides proposed an approach whereby trips were segmented into single-mode stages based on the assumption that a short period of zero speed was necessary for each mode change (de Jong and Mensonides 2003).
The travel mode of the stages was then determined by leveraging the speed characteristics and the proximity to public transport stops and routes. In addition, the proposed logic tested whether the generated mode sequence was reasonable; for example, the logic would not allow direct transitions from bus to auto without an intermediate walk stage. The logic uses the fact that the walk mode has consistently low speeds and accelerations.

Tsui and Shalaby presented an integrated system to process person-based GPS data for travel surveys (Tsui and Shalaby 2006). This system (GPS-GIS) included two versions; version one included modules for data filtering, trip end identification, mode transition detection within trips, and mode identification, while version two used link matching of the GPS data to a GIS representation of the transportation network to support further GIS-based processing. The mode transition identification module in version one of the GPS-GIS system segmented each trip into single-mode stages by finding the points where the mode changed from walk to another mode or vice versa; the authors referred to these as mode transfer points (MTPs).

Schüssler and Axhausen (2008) implemented a mode transition detection system based on the one proposed by Tsui and Shalaby (2006), with the original implementation featuring three types of MTP: end-of-walk (EOW), start-of-walk (SOW), and end-of-gap (EOG) points. The EOG point was used to indicate the end of a period with GPS signal loss. For each transition from a speed below 2.78 m/s to above 2.78 m/s, the algorithm searches backward until the next point with a speed above 2.78 m/s or until at least three consecutive GPS points with a maximum acceleration of 0.1 m/s² are found. In this case, the last of the trailing points with small acceleration values was marked as being a potential EOW point; otherwise, no EOW point was detected. The procedure for the potential

quite possible. Processing enhancements such as including trip chain logic, evaluating related geospatial data, and expanding to traveler-specific behaviors across multiple days may improve trip mode identification; however, research results in these areas were not formally published at the time the literature review was performed for this NCHRP study.

Route Identification

One of the post-processing steps implemented with GPS data sets is to relate their point locations with spatial data sets representing the transportation network in a process known as map matching (MM). When applied to GPS data sets, it allows the identification of the routes taken on the network.

MM processes can be performed on the fly (i.e., in a navigation device) or as a post-processing step to previously collected GPS trajectory data sets. The first application has been covered extensively in the literature and is mostly concerned with accurate predictions of where a GPS receiver is along a link and does not necessarily need to run faster than real time. This discussion focuses on the latter application, which is frequently applied in the context of household travel surveys (Doherty et al. 2000), performance measurement studies (Marchal, Hackney, and Axhausen 2005), and the analysis of utility vehicle behavior (Blazquez and Vonderohe 2005). Some additional applications of the results of a post-processing MM exercise of GPS point trajectories to a transportation network are to associate speed information to network links, identify locations where congestion occurs, generate data sets that help to understand route choice, and associate network data elements with the GPS data.

Initial map-matching algorithms used with GPS data would identify match candidates either by finding the closest line feature to the point or by locating the line feature with the shape point closest to the GPS point.
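The first of these initial strategies, matching each GPS point to the closest line feature, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than any published algorithm: coordinates are assumed to be already projected to a planar system in meters, each link is reduced to a single straight segment, and the names `point_segment_distance` and `nearest_link` are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Fraction along the segment of the perpendicular projection, clamped to the ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_link(point, links):
    """Return the id of the link closest to the point.

    links maps a link id to a (start_xy, end_xy) pair of planar coordinates.
    """
    return min(links, key=lambda link_id: point_segment_distance(point, *links[link_id]))


# Example: two parallel east-west links 50 m apart.
example_links = {"A": ((0.0, 0.0), (100.0, 0.0)), "B": ((0.0, 50.0), (100.0, 50.0))}
matched = nearest_link((10.0, 10.0), example_links)
```

Because each point is matched in isolation, this approach can jump between parallel or intersecting links, which is the weakness that the heading, topology, and multiple-hypothesis methods discussed in this section try to address.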
Greenfeld proposed an algorithm that created lines between subsequent GPS points and used similarity comparisons to decide which network link best matched each GPS point pair (Greenfeld 2002). A weighting scheme was used to balance the degree of parallelism between the GPS line segment and the link, the shortest distance to the link, and the size of the intersecting angle between the GPS-derived line and the street arc. White, Bernstein, and Kornhauser (2000) suggested a similar algorithm that used differences in heading to rule out matches.

Blazquez and Vonderohe (2005) developed a rule-based MM algorithm that made use of shortest-path computations and turn restriction information to verify matches as the route was built. Although a high success rate was observed while applying this approach, the authors felt that it could be further improved, particularly when poor-quality data were used, at the intersection of divided highways, and where false negatives failed to snap (Blazquez, Ponce, and Miranda 2010).

between someone traveling slowly and fast. Fuzzy logic allows a speed value to be described in terms of how slow and how fast it is at the same time using continuous values from 0 to 1 (Lawson, Chen, and Gong 2010). Tsui and Shalaby (2006) and Schüssler and Axhausen (2008) employed a fuzzy logic–based approach to define whether a mode candidate was suitable to a set of GPS points; in both cases an ordered set of most-likely candidates was associated with each trip segment to be classified. The fuzzy variables used in the membership functions were average speed of GPS points, 95th percentile maximum speed of GPS records, positive median acceleration of GPS records, and data quality of GPS records (based on HDOP). Tsui and Shalaby performed additional discriminatory analysis on the initial set from the fuzzy logic functions utilizing the results from a map-matching procedure, further refining the resultant choice set (Tsui and Shalaby 2006).
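The idea that a single speed can be partly slow and partly fast at the same time can be made concrete with a small Python sketch of fuzzy membership functions. The linear ramp shape and the breakpoints (2, 5, 10, and 15 m/s) are hypothetical values chosen only for illustration; they are not taken from any of the cited studies.

```python
def ramp_up(x, low, high):
    """Linear membership that rises from 0 at low to 1 at high."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def speed_memberships(speed_mps):
    """Degrees of membership, each between 0 and 1, in the slow and fast sets.

    The breakpoints are illustrative assumptions, not published thresholds.
    """
    return {
        "slow": 1.0 - ramp_up(speed_mps, 2.0, 10.0),  # fully slow at or below 2 m/s
        "fast": ramp_up(speed_mps, 5.0, 15.0),        # fully fast at or above 15 m/s
    }


# Example: 7 m/s is somewhat slow and somewhat fast at the same time.
memberships = speed_memberships(7.0)
```

A mode classifier built on this idea would combine memberships for several variables (for example speed, acceleration, and data quality) through a rule base to rank candidate modes, which is beyond the scope of this sketch.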
Neural networks are able to generalize conclusions for data without the need to define relationships or rules in an a priori manner. A neural network can learn the subtle differences between car, bus, and walking trips and, therefore, automatically detect the mode of transportation for a new, previously unseen trip (Gonzalez et al. 2008). Byon, Abdulhai, and Shalaby (2007) demonstrated how neural networks could be used to automatically detect the mode of transportation. Their research used data collected using a laptop connected to GPS receivers; attributes used in their neural network included instantaneous acceleration, speed, and HDOP. The impact of different logging frequencies was analyzed, with the authors finding that mode detection performance was close to 80% with sampling intervals as long as 3 min (Byon, Abdulhai, and Shalaby 2007). Shorter reporting intervals produced better results.

Gonzalez et al. (2008) took the application of neural networks to mode identification further by examining how well it could work on data collected using mobile phones equipped with assisted GPS technology. The authors also showed how the method could be applied to a reduced version of the input data, which were called “critical points.” These were defined as the minimum set of points necessary to reconstruct a participant’s path (Gonzalez et al. 2008).

Lawson, Chen, and Gong (2010) conducted a controlled experiment where the performance of three mode selection methods was compared. The first applied a rule-based algorithm similar to the one proposed in Stopher, Clifford, and Zhang (2007), the second implemented the neural network method previously described by Gonzalez et al. (2008), and the third was the approach documented in Schüssler and Axhausen (2008). The overall result was that the neural network approach produced the best results, with a success rate of 84% (Lawson, Chen, and Gong 2010).
Research on improving instantaneous and post-processed travel mode identification techniques is ongoing, and improvements in the reported success rates stated in this section are

added. Once each decision to extend the route with a link is made and accepted, this link will remain on the final selected route. MM procedures based on the multiple hypothesis technique (MHT) address this limitation by maintaining several alternative paths and eventually selecting the best one at the end of the process. Pyo, Shin, and Sung (2001) demonstrated how MHT can be used in a navigation setting, which included integration with a DR device. However, due to its origins in navigation, the proposed logic focused on the accuracy of the point projections on the network instead of the final vehicle path.

Marchal, Hackney, and Axhausen (2005) applied MHT to the problem of post-processing MM, with a focus on the operational performance. This meant that only the two-dimensional GPS point coordinates along with the line features representing the network were used as inputs. Furthermore, the scoring of point snaps and routes was mostly based on the distance between the GPS points and the network links. One of the findings of this application of MHT was that reasonable results could be obtained despite these simplifications. The authors also looked at the impact of keeping different numbers of candidate routes during the course of a route derivation; it was found that keeping 30 candidates produced quality results at reasonable computational speeds. This algorithm was later extended by Schüssler and Axhausen (2009a) to be used within the context of trips with the added ability to fill in gaps in the identified routes.

Trip Purpose Identification

Identifying trip purpose or activity at destination from GPS-derived trips remains a difficult problem to solve. This is because it inherently requires a multi-factorial approach where extensive data sources are combined with the basic trip attributes derived directly from the underlying trace data.
Wolf, Guensler, and Bachman (2001) demonstrated how trip end locations could be matched to GIS data to derive basic trip purpose classifications. At the time of the research it was noted that the trip purpose determination step worked well for approximately 78% of the test cases.

Schönfelder and Samaga (2003) used a multistage hierarchical matching procedure to infer trip purposes. This process involved the calculation of clusters of trip ends, the identification of trips whose purposes could be trivially deduced, and determining relationships between trip purposes and the socio-demographics of the respondents, as well as time of day when the activity occurred. Since the authors did not have information on the true trip purposes, the distribution of the inferred purposes was compared to that from a regional household travel survey. The results indicated differences in a number of purposes, including private business, work and work-related, shopping, and leisure activities.

The improved algorithm added dynamic resizing of the match distance tolerance, filtering of snaps based on the vehicle azimuth (or heading), and a revised set of matching rules.

Quddus et al. (2003) proposed an algorithm for use in a real-time navigation application that fused GPS points with data from a dead-reckoning (DR) device using a Kalman filter. It made use of weights to compute a score for each match. The score took into account the distance between the GPS points and the network arc, as well as the deviation between the point’s heading (computed by the DR device) and the bearing of the link, and the position of the point relative to the link. Byon, Abdulhai, and Shalaby (2007) also proposed an MM routine to be used in real-time monitoring applications using data from GPS-enabled mobile phones.
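To illustrate the general idea of a weighted match score that combines proximity and heading agreement, a minimal Python sketch follows. It is not the scoring scheme of Quddus et al. (2003) or of any other cited study: the functional forms, the default weights, the 50 m distance cap, and the function names are all hypothetical, and the term for the position of the point relative to the link is omitted.

```python
import math


def heading_difference_deg(a, b):
    """Smallest absolute angle in degrees between two compass headings."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def match_score(distance_m, point_heading_deg, link_bearing_deg,
                w_distance=1.0, w_heading=1.0, max_distance_m=50.0):
    """Weighted score for matching a GPS point to a candidate link.

    Higher is better. All weights and the distance cap are illustrative.
    """
    # Proximity term: 1 when the point lies on the link, 0 at or beyond the cap.
    proximity = max(0.0, 1.0 - distance_m / max_distance_m)
    # Heading term: 1 when aligned, 0 when perpendicular, -1 when opposite.
    angle = heading_difference_deg(point_heading_deg, link_bearing_deg)
    alignment = math.cos(math.radians(angle))
    return w_distance * proximity + w_heading * alignment


# Example: the same point scored against an aligned and a perpendicular link.
aligned = match_score(5.0, 90.0, 92.0)
crossing = match_score(5.0, 90.0, 180.0)
```

Taking the heading difference modulo 360 degrees matters in practice, since headings of 350 and 10 degrees are only 20 degrees apart.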
Quddus, Noland, and Ochieng (2006) proposed an algorithm that used fuzzy logic to deal with the errors and uncertainties present when matching GPS points to a road network and that also used data from a DR device. It used multiple inputs to evaluate if a GPS point should be matched to a link, but placed emphasis on heading differences and the perpendicular distance to links. Fuzzy inference systems were used to combine the multiple input variables to generate likelihoods of links being appropriate matches in the different stages of the algorithm, namely the selection of the first route link, the determination of whether a point snapped to the current link, and the selection of links to be added to the route.

The MM process proposed in Velaga, Quddus, and Bristow (2011) built on the approach proposed by Quddus et al. (2003) by adding an additional step that optimizes the weights used in the scoring scheme using a genetic algorithm optimization technique. The authors also proposed the use of different weights based on the operating environment where the GPS data were collected (e.g., urban, suburban, and rural).

A common characteristic of these and most existing MM algorithms is the concept of match modes, with most of them containing an initialization mode, which identifies the first route link; a link snapping stage, which computes projections, or snaps, of the GPS points on the current link; and a new link search mode, which identifies the next link to be added to the route.

The algorithm proposed by Dalumpines and Scott (2011) did not follow this pattern; it instead relied on network topology and used the GPS trace to construct a series of gates that limited the road network connectivity. This way, a shortest-path solution connecting the start of the trace to its end was likely to match the actual route taken. This process assumed that a complete and up-to-date road network, including information on turning restrictions, was available.
It is not likely to be able to handle incomplete road networks, such as the ones typically used by travel demand models (i.e., missing local and collector type roads).

Most MM algorithms in the literature reconstruct the original route using a linear process with a single link at a time

(2008) with probabilistic multinomial logit models (MNL). The approach consists of two steps: (1) clustering trip ends into origins and destinations and (2) identifying trip purposes. The second step is performed through two sub-steps; the first one assigns trip purposes deterministically, while the second applies MNL models to calculate the probability of four trip purposes: work/school related, personal business, shopping, and social recreation. Two MNL models were developed, one for home-based trips and another for non–home-based trips. The factors taken in as independent variables in these models fell into three different types: time of day, history dependence, and land use characteristics. The authors reported 67% and 78% match rates for home-based trips and non–home-based trips, respectively (Chen et al. 2010).

Cell Phones, Personal Navigation Devices, Smartphones, and the Emerging Role of Consumer Technologies in Travel Behavior Research

Cell phones, PNDs, smartphones, and other consumer technologies are commonly used across most demographics and are now viable sources of travel data. Unlike documented research studies and applied public-sector data collection, data and methods for collecting and using GPS capabilities found in consumer devices and applications are closely guarded by the private companies that are developing commercial products. Further, consumer products and applications change rapidly, and systems evaluated one day may be irrelevant the next.

For this section of the report, attention will be focused on documented studies using consumer products and the most relevant but generalized capabilities of these products for extracting travel behavior details. First, active data collection strategies are presented, followed by the more prevalent passive data collection approach. This section concludes with a sample of current offerings from private data providers.
Research institutions and private companies have developed capabilities to use consumer devices to generate travel behavior data in both an active and a passive manner. Active data collection refers to the use of common personal consumer products to administer a survey or seek direct user feedback. Passive data collection refers to the use of the archived travel data generated by consumer technologies and consumer applications.

From a transportation planning or research perspective, the technology and data access methods are less important than the characteristics of the resulting data set. However, researchers and practitioners must be aware of the limitations of new technology solutions and new consumer data products, particularly before they are applied in a forecasting model. All solutions, regardless of type, require three key considerations

Bohte and Maat (2009) also used GIS data (land use and points of interest) to pre-assign trip purposes to participants in a GPS PR survey. The developed survey system applied corrections to the derived purposes based on responses from participants. This rather simple trip identification approach was able to correctly predict purpose for 43% of the collected trips (Bohte and Maat 2009). The ability to accumulate these corrections across multiple participants was later added and pilot tested by Moiseeva, Jessurun, and Timmermans (2010).

Stopher, Clifford, and Zhang refined this method with the addition of several improvements, including the use of frequently visited locations (e.g., home, work, and school) and activity duration (Stopher, Clifford, and Zhang 2007). More recently, improvements have been made to this core method by adding the concept of tours to the data processing and review (Shen and Stopher 2012).
The latest version of this method was applied to the GPS data collected in the 2009–2010 Greater Cincinnati regional travel survey, in which every member in the household over the age of 12 was asked to carry a passive GPS device for 3 days. A subset of the households also completed a PR interview where the GPS-derived trips were presented and additional attributes, including trip purpose, were added. The 4,133 GPS trips from the PR data set were used to evaluate the accuracy of the method. The authors found that the accuracy of the base method, without the tour-based corrections, was approximately 59%. The added validation and corrections were able to increase the accuracy to 67%. The authors also compared the derived trip purpose distribution with that generated using data collected as part of the 2009 NHTS and found that the two were not significantly different (Shen and Stopher 2012).

Griffin and Huang (2005) used decision trees, built using the C4.5 algorithm, to identify trip purposes for GPS activity locations. The authors reported a high accuracy in the determined purposes, but the focus of the paper was on the clustering approach used to determine trip end locations from the GPS data stream. McGowen and McNally (2006) demonstrated the use of classification trees and discriminant analysis to identify the most likely trip purpose. The developed models were based on personal, household, and trip attributes. A total of 22 different variables were employed, with the source of the data being the 2000–2001 CHTS. Only out-of-home activities were considered in the developed models, and these accounted for 40% of destinations. The reduced set of purposes contained 26 types, which were also aggregated into five major activity categories.
The developed classification trees and discriminant models performed very similarly, with average accuracies in the 73% to 74% range for the major activities and 62% to 63% for the 26 disaggregated activities (McGowen and McNally 2006).

Chen et al. (2010) combined the approaches proposed in Schönfelder and Samaga (2003) and Stopher, Alsnih, et al.

• Smartphone use is not biased by race, but is biased toward those with a high income and those who are well-educated, urban/suburban, and under age 50.

As transportation planners and researchers identify new methods to access travel data, statistics such as these can provide some insight into survey design and data usage limitations. The use of cell phones and smartphones across different races appears to be unbiased, which is encouraging given that travel data from minorities have been difficult to access using traditional or GPS-based methods (Bricka et al. 2009). It should also be noted that known bias in device usage can be accounted for in survey design. In fact, some of the bias in device usage may aid in capturing individuals that do not respond well to traditional methods.

Active Data Collection from Consumer Devices

Interest in the active use of smartphones for surveying has increased in the last several years as travel behavior researchers have tried to find ways to reduce respondent burden and the equipment costs (and deployment costs) for GPS-based travel surveys. Given that smartphones are typically equipped with GPS (plus accelerometry and motion sensing capabilities) and carried by users almost everywhere, the attraction to tap into these devices for location and human activity information as well is logical.

Before smartphones were widely available, active electronic data collection was possible using personal digital assistant (PDA) devices configured with GPS receivers and other external devices. One early example of such an instrumentation package was the Electronic Travel Diary solution developed for the physical activity sub-survey of the 2000 SMARTRAQ travel survey done in the Atlanta region.
The package included a Palm III device and a wearable GPS data logger; custom programming was done to encode the main portions of the travel survey into the Palm device, which was later fused with the GPS data in a post-processing step (Wolf et al. 2000).

The TRAC-IT application is one of the first examples of using smartphones to actively collect travel behavior data. TRAC-IT was developed in 2007 for use in a Florida DOT research project with the objective being to “better understand and pattern household travel behavior for the purpose of educating, promoting, and encouraging households to use alternatives to driving alone.” The application required users to actively start and stop trip capture, followed by the input of information such as place type, purpose, mode of transportation, and travel companion counts. The phone captured and transmitted GPS trace information while the trip was active. Using the data collected, the application provided feedback to participants in the form of personalized suggestions on

for evaluating a consumer data product’s ability to accurately identify travel behavior:

1. Identifying data bias: Data from consumer products can be biased because it is the user’s choice to purchase and operate the product. Some demographic groups have been slower to use consumer products and, therefore, may not be represented in the data samples. It should also be noted that just because a device may be more prevalent in a certain demographic does not mean that the data has the same bias. For example, travel speed data from a smartphone may not be biased because the user must follow traffic laws like everyone else, thereby negating some of the demographic bias.

2. Verifying data quality: Location and attribute accuracy is important for understanding potential error ranges in subsequent analyses.
Furthermore, the consumer product market is very competitive, and independent validation of quality is important until data standards are implemented that provide some level of quality assurance.

3. Ensuring privacy protection: Given that these data sets come from private consumers, special attention to privacy protection must be maintained. Planners and researchers must be assured that they are legally protected and that the original consumers have provided consent. Existing and proposed legislation at the state and federal level is focused on this issue.

Initial questions that arise with consumer products address market penetration. For example, how many people use the data-generating product, and is the user group demographically biased? The Pew Research Center provides market statistics about consumer product usage and technology penetration into American society (Pew Research Center 2012). These statistics are useful in determining any high-level bias that might be intrinsic in the use of consumer products. The following statistics were noted in April of 2012:

• One in five people do not use the Internet. The non-users are biased toward senior citizens, Spanish speakers, those with less than a high school education, and those with low income.
• Those that do use the Internet use it very frequently.
• Eighty-eight percent of American adults have a cell phone.
• More demographic groups are using smartphones as their main Internet access source (including minorities and low-income households that have been traditionally low-level Internet users).
• The use of cell phones is not biased by race.
• Forty-six percent of American adults have a smartphone, and the market share is growing. In less than 1 year, between May 2011 and February 2012, estimated smartphone ownership increased by 11%. The forecast is for this trend to continue.

crumbs created by everyday technologies. The transportation community has been slower than several other disciplines in finding applied roles for consumer data to support traditional transportation planning. Most of the consumer data products currently used in transportation are targeted at supporting traveler information systems and marketed directly to consumers. Within the past few years, consumer data marketed for public agency consumption has found footing and is now poised to play a role in improving understanding of travel behavior and supporting future transportation planning decisions.

These sources of passive travel data are the result of standard technologies that are used in everyday life (i.e., communication, navigation) but also generate archival traces of information. While the list of potential consumer products includes familiar devices such as mobile phones, smartphones, and PNDs, there are other, lesser-known devices that can also be used as sources of travel data. Given that products and product updates come and go with regularity, one way of categorizing each solution may be to base the categories on the fundamental location technology and the data collection method.

Most consumer data that are available to public agencies today are generated for a purpose other than supporting travel behavior analyses. Almost all original products were designed to support real-time traffic data offerings where sensor, GPS, and/or cell phone data were collected in real time and translated into link-level speeds. As this market took hold and as data archives grew, other uses for the data arose. These archival data sets can be extensive with respect to the amount of data collected, the duration of the data collection period, and the geographic coverage.
Generally, these data are converted into a data product (e.g., NAVTEQ traffic patterns data) or a data query service (e.g., that offered by TomTom) that is, in turn, offered for purchase. More recently, these products have been adapted to public-sector planning applications promising details of population movement or facility-based performance. Table 1-6 lists potential technologies, example providers, market focuses, and potential travel behavior values.

From the public-sector perspective, there are a number of attractive advantages and complicating disadvantages when considering these data products. Advantages of these data sets include that they contain:

• A massive amount of data,
• Data that are both comprehensive and continuous (i.e., many days of data),
• Data that represent real-world experiences/conditions, and
• Raw data that allow for a wide variety of delivered product solutions custom made for certain studies (e.g., speed, origin–destination, delay, and turning movement studies).

how to save time and money by streamlining travel behavior (Center for Urban Transportation Research, University of South Florida, 2012).

Another good example of active data collection using smartphones is the CycleTracks app, which was originally developed by the San Francisco County Transportation Authority to document bike routes with the purpose of using them to build the model network support for bicycles. The application design requires users to actively start and stop logging GPS data and to associate comment information with the collected trace. The original CycleTracks source code was made available publicly using an open-source license. In 2011, NuStats modified this code base to create a proof-of-concept app called PTV Pacelogger. This effort was done in close collaboration with Portland Metro for use in a pilot survey in Portland within the context of the Oregon Household Activity Survey (Bricka and Murakami 2012).
Smartphones have also been used to collect vehicle-based GPS data as part of the Minnesota DOT Mileage Based User Fee demonstration project. In this project, smartphones and supporting equipment were provided to 500 volunteer participants, primarily in Minnesota’s Wright County, with the intent of keeping the phone permanently installed in the vehicles for the duration of the study. Participants were asked to use the equipment for 6 months and agreed to pay a mileage-based fee accumulated during the testing period. The collected GPS data were used in the smartphones to estimate charges based on roadway type, geography, and other factors and were not transmitted to the central processing location (Battelle Memorial Institute 2012).

Passive Data Collection from Consumer Products

Transportation planners and decision makers are in a dynamic age of transportation data availability. The rapid market penetration of location-enabled consumer technologies is providing new, nontraditional sources of travel data regarding persons, vehicles, and transportation networks. Mobile phones and personal navigation devices can generate massive amounts of archival data regarding personal travel, and this information is increasingly used by transportation professionals in planning and research. Initial consideration is generally met with enthusiasm regarding the potential of the data to identify detailed travel behavior but is soon tempered with questions regarding bias, data quality, value, and integration into existing models and planning procedures. Further, planners and researchers are facing these issues in a rapidly evolving consumer marketplace where formal guidance is limited and new product offerings are common.

There is a wide range of active research initiatives into social behavior patterns, as evidenced by the digital bread-

logging logic, point resolution/frequency, post-processing algorithms, coverage, bias, quality, and privacy protection. One method of validation that can be implemented in these early travel behavior products is to compare results to other travel surveys like the NHTS. Comparing average trip length distributions for certain trip types (home-based work, home-based school, and non–home-based) may be a simple way to evaluate a product’s potential.

Early Proof-of-Concept Studies for Passive Data Capture

The earliest use of data from consumer products relied primarily on cell phones and occurred as part of academic studies for traveler information systems. A research paper by Qiu and Cheng provides a good summary of the early history of cell phone use for traveler information systems (Qiu and Cheng 2007). As early as 1996, researchers were exploring the concept of using cell phone signals to find the location of a phone and to estimate its travel speed. These studies were precursors to efforts in using the same technology to measure travel behavior.

Researchers at Kobe University developed methods for extracting travel details from mobile phone data (Asakura and Hato 2004). They developed techniques and algorithms for extracting movement information based on archived cell phone activity data. Their procedures were successfully tested on 100 participants. They also recognized that additional data besides time and location are needed for use in travel behavior applications.

Researchers in South Africa tracked cell phone data for 83 participants over a 2-day period where time and position were updated every 5 minutes (Krygsman and Schmitz 2005). Findings showed that it was possible to use cell phone tracking techniques to generate time and location information for activities.
They also noted that substantial effort was required

Some disadvantages of these data sets are that:

• They have privacy issues with releasing detailed travel data,
• They are unproven in applied travel behavior models,
• They have a potential demographic bias in source data (i.e., actual demographics are unknown or proprietary),
• They have uncontrolled equipment usage (users can activate or deactivate device),
• They are unable to automatically identify short trip ends with low-resolution data,
• Current products are heavily aggregated and are typically averaged at segment levels,
• Black-box processing (i.e., proprietary algorithms) makes data quality uncertain, and
• There is a dynamic market space: data providers and products might not be around for long, may change content, or may shift focus.

Passive location data are obtained from the user in different ways. Some data are obtained from remote sensors, other data may be directly uploaded from the consumer device, and some data may be volunteered from users. The primary methods of gaining access to passive location data from consumer products are:

• Cellular activity detection by cell towers,
• User-installed application with automatic data upload of location data,
• Embedded application within product not under user control,
• User-initiated data upload or sharing of location data, and
• Signal detection by sensors (Bluetooth, Wi-Fi, etc.).
From a traveler behavior standpoint, the method of data upload or access is less important than the various data characteristics that define its quality and usefulness, including

Technology | Example Provider | Primary Market Focus | Potential Public-Sector Interest
In-vehicle navigation device | TomTom | Navigation and real-time traffic information | Transportation system performance, repetitive travel patterns
In-vehicle service | OnStar | Location-based services | Origin–destination data, parking, transportation system performance
Mobile phone tower-to-tower handoffs | AirSage | Traffic data, population movement data | Origin–destination data, population movement, long-distance travel times
Smartphone application | INRIX Traffic | Real-time and predictive traffic information | Transportation system performance, origin–destination data, trip-making patterns

Table 1-6. Emerging consumer data sources.
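The validation idea noted in this chapter, comparing a data product's trip length distribution against a reference survey such as the NHTS, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a method taken from any of the cited studies: the function names, the bin edges, and the sample trip lengths are all hypothetical, and the coincidence ratio (sum of bin-wise minimum shares divided by sum of bin-wise maximum shares) is simply one overlap measure chosen here for illustration.

```python
from bisect import bisect_right


def trip_length_shares(lengths, bin_edges):
    """Return the share of trips in each length bin.

    Bin i covers [bin_edges[i], bin_edges[i + 1]); the last bin is open-ended.
    Lengths below bin_edges[0] are counted in the first bin.
    """
    if not lengths:
        raise ValueError("need at least one trip length")
    counts = [0] * len(bin_edges)
    for d in lengths:
        idx = max(0, bisect_right(bin_edges, d) - 1)
        counts[idx] += 1
    return [c / len(lengths) for c in counts]


def coincidence_ratio(shares_a, shares_b):
    """Overlap of two binned distributions: 1.0 means identical shares."""
    overlap = sum(min(a, b) for a, b in zip(shares_a, shares_b))
    envelope = sum(max(a, b) for a, b in zip(shares_a, shares_b))
    return overlap / envelope


# Hypothetical trip lengths (miles) for one trip type, e.g. home-based work.
BIN_EDGES = [0, 5, 10, 20]
product_trips = [1, 3, 6, 12, 25]   # assumed output of a consumer data product
survey_trips = [2, 4, 7, 8, 30]     # assumed reference survey sample

product_shares = trip_length_shares(product_trips, BIN_EDGES)
survey_shares = trip_length_shares(survey_trips, BIN_EDGES)
ratio = coincidence_ratio(product_shares, survey_shares)
print(product_shares, survey_shares, round(ratio, 3))
```

In practice the comparison would be run separately for each trip type and on weighted samples; the sketch ignores weighting entirely.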

2011). Data were collected over a 60-day period from more than 600,000 Sprint phones. Data were tabulated to generate origin–destination matrices for TAZs to support CAMPO modeling efforts. A similar project was conducted for the South Alabama Regional Planning Commission in 2011 (Mobile MPO 2011). Final results from these projects have not yet been published. CAMPO also used AirSage data as part of a speed study for their region and found the results comparable to those obtained using GPS probe data. Some concerns over validation were mentioned.

In 2011, MyGistics conducted an evaluation of time-varying travel demand for a major interchange in Roseville, CA using cell phone data from AirSage (Ma et al. 2012). Zonal trips for the study area were tabulated from cell phone data to estimate the travel demand for peak periods. The results showed that the time-varying demand estimation is possible using passive sources.

In 2012, the Virginia DOT studied the origin–destination patterns of travelers passing along the joint section of I-95 and I-64 (Business Wire 2012). TomTom origin–destination data were used to identify zone-to-zone travel patterns and were then used as inputs into a micro-simulation model of the corridor. The results of this study have not yet been published.

Table 1-7 shows additional project characteristics for known applications of consumer data for travel behavior analysis.

Sample of Current Offerings from Private Companies

Several private companies that sell transportation data generated from consumer devices (and other sources) were contacted regarding their data offerings and were sent custom industry-specific questionnaires. This section briefly summarizes the information provided by each company that responded to these questionnaires. It is worth noting that most responses were provided in the form of marketing materials. The following sections summarize the information gathered from each provider.

AirSage.
AirSage has agreements with two of the top three major mobile phone carriers that allow AirSage to gather information regarding the location of mobile phones in real time, which, in turn, enables them to generate real-time traffic data for traveler information systems and to generate archived cell phone movement information to support origin–destination studies. They have a refined and patent-protected process [known as Wireless Signal Extraction (WiSE)] for triangulated signals based on signal strength at cell towers and can classify data points as transient or stationary. Their coverage area includes the continental United

for establishing additional details needed to support the data needs of transport models.

Research papers available at the SENSEable City Laboratory at the Massachusetts Institute of Technology (MIT) include a wide range of concept papers on consumer product data analysis. The earliest papers focused on issues surrounding location-based services and wireless hotspots. Their research expanded with time and now includes a number of papers and findings that provide insight and direction into the future transportation planning roles of consumer data. In 2006, a paper by Ratti et al. identified a vision for using cell phone data for understanding people’s behavior (Ratti et al. 2006). The vision was demonstrated with data from Milan, Italy, and showed the possibility of true travel monitoring for both typical and atypical events. Ratti et al. (2007) followed up this research with a study of cell phone movement data in Graz, Austria. In that study, cell phone users volunteered to have their locations tracked for a 24-hour period to show how users’ travel could be monitored. These early efforts did not fully address the mass processing and imputation needs but rather focused on the idea that behavior could be identified based on the passive data.
Since 2007, the SENSEable City Lab at MIT has had available several additional papers focusing on the technological challenges and opportunities for using consumer data to identify actionable travel behavior information.

In all of these early studies, the focus was on proof of concept with limited sample sizes. All studies mentioned concerns over privacy as a significant hurdle to both the future application and the viability of publicly sponsored research efforts. Most defended against the concept of bias due to the market penetration of cell phones.

Public Agency Applications

The first documented large-scale public agency use of archived consumer product data for estimating travel behavior was in Israel in 2009 (Gur et al. 2009). In this study, data from ITIS Traffic Services Ltd. were acquired by the Israel Department of Transport and contained archived data from 10,000 mobile phones for 16 one-week periods. The project was conducted as part of an effort in building a countrywide travel demand model. The data were used to identify interzonal trips to and from home and other non–home-based trips. While the effort was successful in supporting large-scale movements of the population, it was suggested that the data should be used in conjunction with more traditional survey methods to support the specific modeling requirements of city and regional planning.

In 2011, AirSage was contracted by the Capital Area Metropolitan Planning Organization (CAMPO) in Raleigh, NC to conduct an origin–destination study (AirSage Inc.

data from fleet vehicles, personal vehicles, roadside sensor-based systems, cell phone data, smartphone applications, RFID, and other sources. They have products that have been generated specifically for public agencies that are based on archived travel condition data. The majority of their solutions are designed to support operations, system planning and measurement, and system optimizations. For travel behavior and model support, archived INRIX speed data have been used for developing baseline speeds by times of day.

Bias: No information provided other than as indicated by their listed sources of information. Generally, roadway speed and delay information is not susceptible to the same impact of demographic bias as trip information.

Data Quality: Historical data products provide statistical distributions to reflect variability. Quality is related to sample size that may vary based on traffic volumes and functional classification. Multiple data sources contribute to data validation.

Privacy Protection: INRIX does not reveal private information from sources; only aggregate data at a road segment or route level are provided.

Nokia/NAVTEQ. Nokia gathers real-time travel information from a wide range of original sources, including GPS/

States and Hawaii. Most of AirSage’s data products are based on all cell phones covered by the carrier agreements that provide these triangulated data points. Other data products are based on location data collected from opt-in devices that allow more detailed tracking (called FastCache). AirSage estimates home and other activity locations based on time-of-day, day-of-week, and location patterns over many weeks. Demographic distributions can then be applied to the estimated home locations.

Bias: AirSage reports that their coverage includes 70% of cell phones in the United States.

Data Quality: Origin–destination information is aggregated to the U.S. Census block level or higher.
An internal data validation process is conducted within WiSE.

Privacy Protection: No information about private consumers is extracted to any data products. Exact trip end information is not provided for their origin–destination data; locations are aggregated to the U.S. Census block level. FastCache users have license agreements in place that allow sharing of more detailed information.

INRIX, Inc. INRIX gathers real-time travel information from a wide range of original sources, including GPS/AVL

Study | Data Source | Retrieval Method | Bias Treatment | Noted Data Quality Issues | Stated Privacy Protection
Ministry of Transport and Road Safety, Israel, 2008 (Gur et al. 2009) | Cell phone | Passive cellular activity detection | Overnight locations compared to aggregated demographics; conducted separate CATI survey of population regarding cell phone use | Trip end location limitations, processed data not exactly matching model needs | Legal review by agency lawyers, removal of private information from raw data
Capital Area MPO, Raleigh NC, 2011 (AirSage Inc. 2011) | Cell phone | Passive cellular activity detection | Overnight locations compared to aggregated demographics | Very short trips excluded, some merging of TAZs needed in dense areas | Removal of private information, final products only include aggregated results by TAZ
South Alabama RPC, 2011 (Mobile MPO 2011) | Cell phone | Passive cellular activity detection | Unreported | Unreported | Unreported
Virginia DOT, 2011 (Business Wire 2012) | PND | User-initiated app with auto data upload | Unreported | Unreported | Covered under license agreement
MyGistics 2011 (Ma et al. 2012) | Cell phone | Passive cellular activity detection | Not explicitly discussed, but mentioned “data cleaning” | Unreported | Unreported

Table 1-7. Summary of efforts applying consumer data for travel behavior analysis.
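Several of the efforts summarized in Table 1-7 tabulated zone-to-zone trips into origin–destination matrices and released only aggregated results by TAZ. A minimal sketch of that kind of tabulation follows. It is an illustration only: the function name, the zone labels, and the sample trips are hypothetical, and the `min_cell_count` suppression threshold is an assumption added here to show one way small cells might be withheld, not a procedure documented by any provider or agency named in this chapter.

```python
from collections import Counter


def tabulate_od(trips, min_cell_count=1):
    """Count trips per (origin_zone, destination_zone) pair.

    trips: iterable of (origin_zone, destination_zone) tuples.
    min_cell_count: hypothetical suppression threshold; cells with fewer
    trips are dropped so that sparse zone pairs are not reported.
    """
    counts = Counter((origin, dest) for origin, dest in trips)
    return {pair: n for pair, n in counts.items() if n >= min_cell_count}


# Hypothetical zone-level trip records (no device identifiers retained).
trips = [
    ("TAZ1", "TAZ2"),
    ("TAZ1", "TAZ2"),
    ("TAZ2", "TAZ1"),
    ("TAZ1", "TAZ3"),
    ("TAZ1", "TAZ2"),
]

od_all = tabulate_od(trips)
od_suppressed = tabulate_od(trips, min_cell_count=2)
print(od_all)
print(od_suppressed)
```

The input here already carries zone labels rather than coordinates; assigning raw location points to zones is a separate step that this sketch does not attempt.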

Bias: There is potential bias in traffic data given the heavy reliance on fleet vehicles. Bluetooth sensor data are biased toward drivers that are more likely to have Bluetooth-enabled devices. In both of these situations, traffic conditions mitigate the impact of final product bias.

Data Quality: Quality metrics are used internally to identify potential sensor or data-quality issues. Quality is related to sample size, which may vary based on traffic volumes and functional classification.

Privacy Protection: No private information is gathered from any of the information sources.

Privacy Protection in Consumer Data Sets

As noted previously, a wide range of information can be gathered about travelers moving on transportation networks through various data collection methods. As the capabilities of sensor-based systems and consumer products advance, more personally identifiable information is accessible and, therefore, comes with an increased risk of exposure of private information. In the last couple of years, it was revealed that most phones, GPS devices, and smartphone applications have stated and unstated capability to archive and upload information about usage, including location data. The hardware and software companies controlling these elements protect their data access with license agreements that users must agree to prior to operation. These companies are also very sensitive to the concerns of their customers and generally avoid risking their primary markets. Therefore, while the idea of selling consumer data has some attraction, there is some hesitation to providing any sort of private data to a government agency.

At the time of this writing, there were three proposed legislative acts intended to control government access to private data collected from consumer products: the Geolocation Privacy and Surveillance Act of 2011 in the U.S.
Congress, the Wireless Surveillance Act of 2012 in the House of Representatives, and the California Location Privacy Act (which passed with bipartisan support in August of 2012).

The Geolocation Privacy and Surveillance Act is a bipartisan bill that provides “a legal framework designed to give government agencies, commercial entities, and private citizens clear guidelines for when and how geolocation information can be accessed and used.” The primary focus of this act is to prevent law enforcement from tracking individuals without a search warrant except in cases of emergency. More importantly, the bill also addresses private companies’ use of consumer data and limits their use unless there is explicit consent by the individual. Passage of this bill may have some impact on the availability of some consumer data sets, particularly those with blanket or third-party data access agreements.

AVL data from fleet vehicles, personal navigation devices, cell phones, roadside sensors, and other sources. For public agencies, Nokia provides Traffic Patterns and Traffic Analytics that aggregate traffic speeds as needed to support performance evaluation or planning efforts.

Bias: Approximately half of the data sources are private, consumer-based sources, and the other half are fleet vehicle-based sources. Bias is limited because they only provide road segment speeds.

Data Quality: Data quality procedures are applied to all data before they are released into a real-time or archived data product. Multiple data sources contribute to data validation. Quality is related to sample size, which may vary based on traffic volumes and functional classification.

Privacy Protection: No private information is released in any product. Private information is not collected, and unique IDs are periodically assigned in the data processing to ensure anonymity.
Nokia private consumer data usage is authorized with direct user acceptance of a data sharing agreement at the activation of a Nokia device.

TomTom. TomTom gathers GPS data from personal navigation devices, and GPS/AVL from fleet vehicles, smartphones, cell phones, and third-party data. TomTom data products serve real-time information in support of traveler information systems and archived historical data for transportation planning and operations.

Bias: The prevalence of consumer devices has indicated a potential demographic bias typical of cell phone market penetration rates. Bias is limited for road segment and route speed information.

Data Quality: Smartphone GPS data are controlled for mode bias by limiting data usage to when the smartphone is docked in a car holder. Multiple data sources contribute to data validation. Quality is related to sample size, which may vary based on traffic volumes and functional classification.

Privacy Protection: Data from consumer products are provided according to data usage agreements with the consumer. Private information is given the highest regard, and no private information is released in any data products. Random IDs are generated at regular intervals in the internal processing of data.

TrafficCast. TrafficCast gathers GPS/AVL data from fleet vehicles and from mobile and fixed-sensor–based sources to generate estimates of real-time traffic information (provided as Dynaflow). TrafficCast also provides Bluetooth sensor systems and data delivery for interested agencies (provided as BlueTOAD). Travel-behavior–related data products include archival data for estimating baseline speeds and OD data at setup using Bluetooth sensors for specific study areas.

where the Federal Communications Commission required all cell phone manufacturers to have built-in location-detecting technology that would allow wireless network operators to provide latitude and longitude information on callers within 300 m to support emergency response. This requirement was part of the Wireless Communications and Public Safety Act of 1999 (Wolf 2000; Karim 2004).

As a result of the E-911 mandate, cell phones are now equipped with a number of location-sensing technologies. The location can be determined by GPS, Wi-Fi, Bluetooth, and their interaction with the wireless network. Data collected using the first three detection types can be controlled by the owner by altering the settings on the phone. Data collection via the wireless network interaction, however, is done so that anytime a phone is turned on and interacts with the wireless network in some way, whether a tower-to-tower handoff or the initiation of a phone call or text, the location can be determined through triangulation. This location detection is done passively and has been processed by private companies for real-time traffic and population movement applications.

Data from smartphone applications is an area of location detection that is advancing swiftly. People agree to certain terms to use the services that require location detection and/or sharing. Many location-based services such as navigational apps (e.g., Google Maps) and social networking applications (e.g., foursquare) are very popular, and the location data generated by these apps can be archived and resold according to licensing agreement terms.

RFID, Bluetooth, and Wi-Fi. RFID systems can detect the location of travelers when their assigned RFID tag is within range of a receiver. Generally, this allows the provider of the RFID tag to look up private account information for the person subscribed to the unique RFID. This capability can be employed for applying and enforcing road use-based fees.
Bluetooth readers are much less obtrusive and function by identifying the media access control (MAC) address as

The Wireless Surveillance Act of 2012 is designed to limit access to electronic communication. This act, less important to travel behavior research, addresses email and phone communications. The recently passed California Location Privacy Act of 2012 makes it mandatory for law enforcement agencies to obtain a search warrant before gathering any GPS or location tracking data from a personal cell phone or other device. While most of the attention on privacy legislation is in law enforcement, there are definite concerns in all three (and in older efforts) regarding data access by any public agency.

One of the first transportation planning studies to address the issue of data privacy was the Connected Vehicle Road User Fee Test (Pierce et al. 2011). The researchers were confronted with public concern about the tracking capability of the technology to be used. The issue and the project’s focus on privacy protection identified that the concerns were valid and that other transportation planning studies should have thorough procedures for checking legality and providing the utmost security for private information.

The types of private data that can be accessed vary by device. Table 1-8 details the type of information that can be collected by detector type. There are several categories of agreements between users, service providers, and data collectors when data are actively or passively collected. In some cases there is no agreement at all, as in video surveillance typically purposed for traffic and incident management.

GPS and Cell Phones. Once selective availability, which refers to the degradation of GPS signals, was eliminated in 2000, the accuracy of GPS devices improved from a range of 30 m to 100 m in 2000 to a range of 5 m to 10 m by 2003 (Zmud and Wolf 2003).
The information that can be collected from GPS devices has such precision that much information can be derived from the time-stamped position data. Another move by the government that affected location capabilities by cell phone rather than GPS technology alone was the development of E-911,

Location detector: Individual information
Video surveillance: Vehicle – location, vehicle type, speed, time, occupancy; Pedestrian/transit – location, time, activity, company; License tag ID – can be linked to vehicle registration data
Bluetooth: Location, time
GPS device (includes PND): Location, speed, route, frequent locations, time, acceleration
RFID: Location, speed, time
Cell phone: Location, speed, time
Smartphone application: Location, user information, context
Transit smart cards: Origin, destination, frequented stations/stops, and times
Table 1-8. Individual information accessed from consumer products and detectors.

location, application and feature usage, network traffic data, service options you choose, mobile and device number, and other similar information may be used for billing purposes, to deliver and maintain products and services, or to help you with service-related issues or questions.”

“We may collect and process information about your actual location, like GPS signals sent by a mobile device. We may also use various technologies to determine location, such as sensor data from your device that may, for example, provide information on nearby Wi-Fi access points and cell towers.”

“We may share aggregated, non-personally identifiable information publicly and with our partners – like publishers, advertisers or connected sites.”

“This type of information may be aggregated or anonymized for business and marketing uses by us or by third parties.”

“We may also draw upon this Personal Information in order to adapt the Services of our community to your needs, to research the effectiveness of our network and Services, and to develop new tools for the community.”

“We receive and store any information you enter on our Service or provide to us in any other way. . . . We automatically receive your location when you use the Service.”

“We may automatically collect location information from your mobile device, but such information will not be directly linked to a specific person. Your location data will only be provided to us in accordance with Terms governing your app, and will then be aggregated with other data.”

“These companies, often called ad servers or ad networks, may place and access cookies on your device to collect information about your visit on our websites.”

“We may put together your current city with GPS and other location information we have about you to, for example, tell you and your friends about people or events nearby, or offer deals to you that you might be interested in.
We may also put together data about you to serve you ads that might be more relevant to you.”

These excerpts exemplify the variety of ways companies inform users of how their personal data may be accessed, used, and shared with others. However, consumers are often unaware of these terms, do not know where to find these terms, or do not understand the implications of possibly ambiguous terms such as “to develop new tools for the community” or “to deliver and maintain products and services.” The ambiguity in terms combined with the variability of policies across license agreements adds to the inconsistency and future unreliability of data sources from consumer products.

Fixed-Location Sensors

Application of Fixed-Location Sensors to Transportation Data Collection

Fixed-location sensors are devices that are positioned along a transportation system and have a short-range detec-

Bluetooth-equipped devices pass by a sensor. By pairing the MAC address with observations downstream, the travel time and speed information can be generated. A similar approach is possible for devices emitting Wi-Fi signals.

When transit fares are collected from passengers via electronic fare media, records of passengers’ travel times (i.e., when their transit trips started and ended), their transit trip durations (i.e., how much time it took to travel within the transit system), and routes used are automatically captured. Furthermore, at the transit vehicle level, boarding counts (as well as alighting counts for some transit systems) are also stored. With this information, transit agencies know when, where, and how their passengers use the system; in turn, the transit agencies are able to monitor and improve their services. Electronic fare media can also have additional information associated with the user, such as school, employment, other organization, age, credit card details, and home address.
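As a rough illustration of the MAC address pairing described above for Bluetooth sensors, the following sketch matches detections of the same device at an upstream and a downstream sensor and derives a travel time and speed. It is a minimal example under assumed inputs: the record layout (MAC address, sensor ID, timestamp in seconds), the function name `match_travel_times`, the sensor names, and the sample values are all hypothetical and are not drawn from any of the studies cited in this chapter.

```python
from collections import defaultdict


def match_travel_times(detections, upstream, downstream, distance_m):
    """Pair detections of the same device at two fixed sensors.

    detections: iterable of (mac, sensor_id, timestamp_s) tuples (assumed layout).
    Returns a list of (mac, travel_time_s, speed_m_per_s) tuples.
    """
    first_seen = defaultdict(dict)
    for mac, sensor, ts in detections:
        # Keep the earliest observation of each device at each sensor.
        if sensor not in first_seen[mac] or ts < first_seen[mac][sensor]:
            first_seen[mac][sensor] = ts

    matches = []
    for mac, seen in first_seen.items():
        if upstream in seen and downstream in seen:
            travel_time = seen[downstream] - seen[upstream]
            if travel_time > 0:  # skip devices traveling in the opposite direction
                matches.append((mac, travel_time, distance_m / travel_time))
    return matches


# Hypothetical detections at two sensors, S1 (upstream) and S2 (downstream).
sample = [
    ("aa:01", "S1", 0), ("aa:01", "S2", 120),
    ("bb:02", "S1", 30),                        # never seen downstream
    ("cc:03", "S2", 10), ("cc:03", "S1", 200),  # opposite direction
]
result = match_travel_times(sample, "S1", "S2", distance_m=3000)
print(result)
```

Only the first device is matched here; the other two illustrate why observed travel times cover only a fraction of the devices detected at any one sensor.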
Data Processing Techniques to Reduce Privacy Exposure

There are ways to process the data to reduce the ability to trace it back to a specific household or person. Almost all processing of data includes a step that renders the data anonymous. However, even though the data are anonymous, it is still possible to use accurate location information within the data to identify a person’s home location, work location, and other frequently visited places. A recent research project conducted by Elango and Guensler explored two traceability-reducing post-processing techniques for GPS-collected data prior to distribution (Elango and Guensler 2011). Both techniques involve creating polygons around home locations and trimming the location data from within that polygon. By keeping the authentication and filtering process in the communication servers separate from the analyzing processes of the traffic servers, the privacy protection of users can be ensured while still maintaining data integrity (Hoh et al. 2006).

License Agreements for Secondary Use

For many users and service providers, it is important that information collected cannot be traced back to the individual user unless explicitly authorized by the user. To protect the information that can be obtained from cell phone usage, Congress passed the Telecommunications Act of 1996, which contained Section 222 that requires telecommunication customer approval before customer proprietary network information is distributed to third parties (Karim 2004).

The following user agreement excerpts were found on different manufacturer or service provider websites regarding the use or reuse of consumer data on April 27, 2012:

“We collect information about your use of our products and services. Information such as call records, websites visited, wireless

day, and a trip travel time. Not all individuals or vehicles have active Bluetooth transmitters. Estimates of detection percentage range from 3% to 8% in the multiple studies reviewed over the last few years (Lee, Agnello, and Chen 2011; Voigt 2011; Bullock, Haseman, and Wasson 2010). Since the sensors operate independently, they can record data for long periods to capture enough data to support analysis needs.

There have been applications of Bluetooth devices around the country to support transportation initiatives. Most of these applications focus on performance evaluation and traveler information systems. A few, however, have used Bluetooth technology to understand travel behavior and to feed modeling efforts. A recent Virginia DOT study compared OD data from traditional video data collection and Bluetooth (Lee, Agnello, and Chen 2011). Both approaches were used to identify travel patterns of vehicles entering and leaving the Richmond, VA area (external–external, external–internal, and internal–external counts). The Bluetooth capture rates varied between 3.73% and 5.82%, while the video captured between 52% and 88% of vehicles. A comparison of trip tables showed significant differences and led the researchers to believe that the video capture was more reliable since the Bluetooth sensors have smaller capture rates and much more signal noise in the data. Another application of Bluetooth sensors for identifying external travel was conducted by the Texas Transportation Institute in Houston (Voigt 2011). The study explored a broad deployment of sensors along different road classes to support a traveler information application. It found capture rates for Bluetooth sensors as high as 20% and determined that capture rates were increasing over time. A study by Bullock, Haseman, and Wasson was conducted over a 12-week period in 2009 along the I-65 corridor in Indiana and collected 1.4 million travel times (Bullock, Haseman, and Wasson 2010).
Portable Bluetooth devices were placed along the main line of the Interstate with semi-permanent ones placed along diversion routes. The possible uses for the data included determination of travel delay times, driver diversion rates, and work zone mobility performance. Bluetooth was also used in a Florida DOT study to identify traffic pattern changes related to a new interchange in Jacksonville (Carpenter, Fowler, and Adler 2012). Fourteen sensors were deployed for 1 week to identify OD matrices for the primary access points in the study area. The resulting OD travel time matrix was also used to validate the final model.

One of the most interesting future travel behavior applications of Bluetooth is its potential role in the Connected Vehicle Initiative research program. The Connected Vehicle Initiative, originally envisioned to improve safety and reduce congestion, is a large research initiative involving intelligent transportation systems (ITSs) that allow vehicle-to-vehicle and vehicle-to-infrastructure communications. Bluetooth is one of the communication technologies being considered. If

tion capability. Historically, license plate surveys and video capture have been used to support OD and travel time studies when information was needed for the modeling of specific transportation facilities or areas. The introduction of RFID and Bluetooth sensor technology allowed the same types of studies to be conducted with reduced labor cost, increased accuracy, and potentially larger sample sizes (due to increased study durations). With enough sensors along a transportation system, travel patterns can be identified and developed into OD matrices that can be used to support travel demand models. Since Bluetooth sensors only capture observed travel times between fixed locations without socioeconomic or demographic information regarding drivers, the data provided are limited to short-term modeling and generally are only applied in specific situations.
RFID sensors, on the other hand, are typically used for toll tags or transit smartcards that individuals carry for their travel needs and hence have a connection to a specific user. This connection allows the potential to use other data regarding that person that may be part of a customer database. Further, the customer information also provides an ability to contact that person for follow-up surveys, granted that previous consent was provided.

In addition to these current technologies, there is a significant amount of research regarding connected vehicles. The U.S. DOT is sponsoring research, known as the Connected Vehicle Initiative and the Smart Roadside Initiative, that will allow data transmissions between vehicles [vehicle to vehicle (V2V)] and from vehicle to infrastructure (VTI), where infrastructure includes roadside control systems and sensors. Conceivably, these data could also be used for travel behavior analysis when they become available. Data streams from a fully implemented system would capture vehicle trips. Like most of the vehicle sensing technologies, travel from nonmotorized modes would not be captured. Regardless, these initiatives have the potential to be powerful data collection sources for anonymous vehicle trajectory data.

Bluetooth Technology and Data Collection

The Bluetooth protocol is widely used for exchanging data over short distances from fixed and mobile devices. Bluetooth technology has been used in transportation data collection since 2007. Bluetooth sensors can be fixed along roadways, nonmotorized transportation facilities, onboard transit vehicles, or a number of other pathways. The sensors or stationary receivers can detect the presence of vehicles or individuals with Bluetooth-enabled devices (when the device is in discoverable mode) such as in-vehicle navigational devices, mobile phones, and wireless headsets.
The receiver does not collect any information other than the unique MAC address of the device and the time of the observation. As a person or vehicle travels along a network link with multiple sensors, its signal is detected by each sensor generating a trip path, trip time of

a transit vehicle to debit stored value from the card. Some systems require that the passenger tap out to exit a transit station. In most of these systems, the card is the exclusive means of fare payment and is thus used by nearly 100% of riders. The location of the fare boxes can be geocoded, allowing for the boarding and alighting data of passengers to be accurately collected or recorded. In cases where the AFC is integrated with an AVL, bus and light rail transactions may also be geocoded with a reasonable degree of accuracy. Registration of cards is an option but is not compulsory. Such registration does not require the passenger’s home, work, or school address to be provided. However, the lack of these attributes does not necessarily mean that data about origins and destinations cannot be deduced. With the addition of land use data, there is a reasonable means for determining whether a trip is originating in an area likely to be the home, work, or school location.

Managing Large Data Sets

The advent of high-frequency GPS logging applications for transportation in the early 2000s brought new data management challenges to transportation researchers and practitioners. This type of data has the potential of being quite large. For example, a person traveling for an average of 90 min a day for a week will generate 37,800 GPS points at a 1-s logging resolution if points are only collected during travel. If one then tries to log the same amount of travel from 1,000 persons, there would be 37.8 million points. Translated into disk space requirements, this number of points would require 2 to 3 gigabytes (GB), depending on the number of attributes stored per point, the resolution, and the level of indexing. However, if no filtering of non-travel points is applied, these numbers would be increased 16-fold.
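The point counts and the 16-fold factor quoted above follow from simple arithmetic, which the short sketch below reproduces. The helper name `gps_points` is hypothetical, and the bytes-per-point figures are only what the quoted 2-GB to 3-GB range would imply if 1 GB is taken as 10^9 bytes; that conversion is an assumption made here, not a figure from the source.

```python
SECONDS_PER_MINUTE = 60
MINUTES_PER_DAY = 24 * 60


def gps_points(minutes_per_day, days, persons=1, interval_s=1):
    """Number of GPS points logged during travel at a fixed logging interval."""
    return minutes_per_day * SECONDS_PER_MINUTE * days * persons // interval_s


per_person = gps_points(90, 7)                   # one person, one week, 1-s logging
sample_total = gps_points(90, 7, persons=1000)   # 1,000 persons
unfiltered_factor = MINUTES_PER_DAY / 90         # logging all day instead of 90 min

# Storage per point implied by the 2-GB to 3-GB range (assumes 1 GB = 10**9 bytes).
bytes_per_point_low = 2e9 / sample_total
bytes_per_point_high = 3e9 / sample_total

print(per_person, sample_total, unfiltered_factor)
print(round(bytes_per_point_low), round(bytes_per_point_high))
```

Changing `interval_s` to 10 shows the effect of logging one point every 10 s: the same week of travel for one person yields 3,780 points instead of 37,800.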
When compared to today’s large disk drives, with capacities measured in hundreds of gigabytes, storage requirements in the 2-GB to 3-GB range (or even 32 GB to 48 GB if all points are logged) may not seem like much, but one needs to realize that the real challenge is to be able to effectively retrieve, clean, process, visualize, and attach attributes to groups of records. The current state-of-the-practice approach to solve this problem is to use server-based relational databases (Wolf, Schönfelder, et al. 2004; Oliveira et al. 2011) to store the GPS data in context. The availability of open-source server-based relational databases such as MySQL and PostgreSQL has made this an affordable and popular solution.

One approach to make these large data sets more manageable is to apply compression and filtering schemes at import time. For example, when the objective is to measure travel, one can simply filter out speeds below a minimum threshold. Other strategies for reducing the number of records stored (and therefore memory/storage requirements) include

implemented on a wide scale, sensors could detect vehicles and gather travel path information.

Another interesting study explored the use of multiple Bluetooth sensors to identify in-home activities. Schenk et al. (2011) explored the combination of collecting data from both Bluetooth devices and smartphones to build a complete spatial and temporal “lifespace” pattern. Each of the participants carried a smartphone, which collected GPS data and transmitted a signal to a Bluetooth receiver installed in their homes. For two of the participants, 30 days of data were recorded, and for the remaining participants only 21 days of data were recorded due to battery limitations. While the study was successful in collecting detailed activity data, the need for installing multiple fixed sensors prohibits wide-scale deployment for large-scale behavioral studies.
If, however, existing consumer devices can capture and record the same signals, then this approach could become more feasible.

In essence, Bluetooth will continue to play a role in providing OD data for modeling. However, it is likely that this technology will be limited to small-scale or corridor implementations where models are needed to evaluate changes in geometric or operation conditions. Broader policy and behavioral changes will likely not depend on this technology in the near future.

RFID Technology and Data Collection

RFID is a technology that is applicable to a variety of fields for the purpose of tracking vehicles, people, and goods wirelessly. For transportation planning, RFID allows organizations and agencies to passively collect data about the tag-equipped users of a roadway or transit facility. Similar to Bluetooth, the user must pass within a certain distance of an RFID sensor for detection. The primary difference between RFID and Bluetooth is that the RFID technology is deployed in transportation infrastructure as a means to identify users of a particular service such as toll collection, parking, and transit fare capture. This implies that additional information such as home address can be linked to any RFID-derived travel data. This added capability provides for a deeper understanding of the socio-demographics of the travelers that pass a sensor and affords the opportunity for follow-up contact. Transit and toll agencies have used this capability to track system performance, to measure demand, and for user satisfaction surveys.

Another implementation of fixed-location sensors is the automated fare collection (AFC) system. AFCs are being used in a variety of transit systems such as MARTA, the NY Metro, and the Chicago Transit Authority. Research into using these systems as a potential replacement for traditional OD surveys is currently under way (Munizaga, Devillaine, and Amaya 2012; Chakirov and Erath 2012).
Generally, AFC cards require that a passenger tap a fare box at a train station or on

the resulting system to scale out to meet demand. Unfortunately, when data are segmented in this way, additional steps are required to conduct analysis over the entire data set. This added management overhead increases as the size of the data set in question grows and more segmentation is needed.

To deal with these issues, Google developed distributed storage and processing systems using commodity (i.e., inexpensive and numerous) computer clusters running Linux and capable of handling hardware failures through the use of redundancy when partitioning the data on the cluster (Chang et al. 2006). Adding computers to the cluster would automatically increase processing capacity and reduce total run time (also referred to as scaling out).

These new technologies make it possible to run very large data crunching jobs in a reliable and efficient manner, allowing software engineers to focus on the algorithms and processing logic instead of the management overhead associated with these types of tasks. The research papers Google published on this approach inspired other companies and individuals to develop software implementing the same approach. Some of these efforts, in turn, became open-source projects, and a community grew around them.

As a whole, these new technologies have been referred to as “big data.” The storage solutions that drive these new data management solutions are collectively called “NoSQL” and are commonly combined with a programming approach called MapReduce. These new data storage technologies focused on simpler data structures (typically consisting of key-value pairs) as opposed to the more complex representations used in relational data modeling. This is explained by the fact that most of the initial applications consisted of processing document-based data such as web pages and web server logs. Table 1-9 shows examples of popular open-source big-data technologies.
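The key-value and MapReduce ideas described above can be illustrated with a small single-machine sketch that counts GPS points per device per day. This is not the Hadoop API or any other framework’s interface: the function names (`map_point`, `shuffle`, `reduce_count`), the record layout (device ID, timestamp in seconds, latitude, longitude), and the sample records are assumptions chosen for illustration. In a real cluster the map and reduce steps would run in parallel on different machines, with the framework handling the grouping step.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86_400


def map_point(record):
    """Map step: emit a ((device, day), 1) key-value pair for one GPS point."""
    device_id, timestamp_s, _lat, _lon = record
    yield (device_id, timestamp_s // SECONDS_PER_DAY), 1


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_count(key, values):
    """Reduce step: total the values collected for one key."""
    return key, sum(values)


# Hypothetical GPS records: (device_id, timestamp_s, latitude, longitude).
records = [
    ("veh1", 10, 1.30, 103.80),
    ("veh1", 20, 1.31, 103.81),
    ("veh1", 90_000, 1.32, 103.82),  # falls on the second day
    ("veh2", 15, 1.29, 103.79),
]

mapped = (pair for record in records for pair in map_point(record))
counts = dict(reduce_count(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each key is reduced independently, the work can be spread across machines by key, which is the property that lets such systems scale out.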
Finally, the recent emergence of cloud computing has made it possible to rent large computer clusters on demand at reasonable costs. The combination of this new availability and big-data technologies has made it possible for small organizations to tackle sizable data management and processing jobs. The flexibility inherent in cloud-based solutions also allows computing resources to be added and removed as needed. For example, StreetLightData is a start-up company that is developing technology for processing massive amounts of GPS data collected using smartphones. The processed data will then be used to develop site selection and planning data products. According to a recent interview given by its chief technology officer and founder, Paul Friedman, StreetLightData is using a Hadoop-based big-data cloud solution from Cloudera to batch process the results of initial processing done on several servers running PostgreSQL (i.e., the initial database is segmented). StreetLightData’s website is http://www.streetlightdata.com/.

sampling the data at lower frequencies (e.g., record only a point every 10 s instead of every second) and applying data simplification or compression procedures such as SQUISH (Muckell et al. 2011). Unfortunately, these latter approaches result in loss of information, which may not be a problem for immediate uses of the collected data but may limit future secondary applications.

Best practices should be followed when using a relational database to store large data sets. These include using appropriate field data types to keep record sizes small, ensuring that tables have primary keys defined, and applying normalization to the schema. Expected data storage and processing needs should be used to guide the selection and configuration of a relational database platform, including the hardware on which it will run.
From a hardware standpoint, it is important to use servers that have as much random access memory (RAM) as the budget can afford, to use servers with a redundant array of inexpensive disks (RAID) for both performance and failure protection, and to have good backup and restore procedures in place.

Despite being capable of storing, managing, and processing very large data sets, relational databases have architectural limitations that make them unsuitable for keeping data sets whose size is measured in terabytes. This magnitude of data is becoming increasingly common with the advent of continuously monitored data sources such as those data sets generated from permanently instrumented vehicles. For example, the American Transportation Research Institute records approximately 4 billion position data points from commercial trucks annually (Bernardin et al. 2012). A research project in Singapore reported on a comparable data set with over 4 billion GPS observations coming from approximately 15,000 Singapore taxicabs (Koh, Nguyen, and Woodard 2010). This specific data set occupied over 300 GB of disk space and was loaded in a PostgreSQL database.

The current trend in household travel surveys toward data collection methods that are primarily based on GPS methods or sources is likely to increase the amount of data that needs to be managed (Giaimo et al. 2010; Oliveira et al. 2011). Smartphone applications that allow participants to collect GPS and other sensor-based data (such as accelerometer data) within existing travel surveys are also likely to reduce the data acquisition costs associated with the deployment of specialized devices while contributing to the creation of significantly larger data sets (Bricka and Murakami 2012).

A common practice used when managing and processing larger data sets is segmenting (also known as partitioning or sharding) the accumulated data into smaller units (Nemala 2009).
Partitioning the data in this manner allows traditional database software to find and process records quickly by loading much of the data into RAM. Data partitioned in this manner can also be placed on different servers, which allows

Name: Description
Hadoop: Apache Hadoop is an open-source software framework for data-intensive distributed applications and was originally created by Doug Cutting to support his work on Nutch, an open-source web search engine. It is currently one of the most popular frameworks for distributed processing.
Cassandra: Apache Cassandra is an open-source distributed database management system developed by Facebook to power its Inbox Search feature. Facebook abandoned Cassandra in favor of HBase in 2010, but Cassandra is still used by a number of companies, including Netflix, which uses Cassandra as the back-end database for its streaming services.
HBase: Apache HBase is an open-source, non-relational columnar distributed database designed to run on top of Hadoop. It provides fault-tolerant storage and quick access to large quantities of sparse data. HBase is one of a multitude of NoSQL data stores that have become available in the past several years.
MongoDB: Created by the founders of DoubleClick, MongoDB is another popular open-source NoSQL data store.
CouchDB: Apache CouchDB is still another open-source NoSQL database. It uses JSON to store data, JavaScript as its query language, and MapReduce and HTTP for an API.
Source: http://www.networkworld.com/slideshow/51090
Table 1-9. Popular open-source big-data technologies.

Industry: Organization or Firm
Travel Survey Practitioners: Abt SRBI; Battelle Memorial Institute; ETC Institute; GeoStats; NuStats; Resource Systems Group (RSG); University of Sydney; Westat
Travel Behavior Researchers: Argonne National Laboratory/UIC; Delft University of Technology; ETH Zürich; FHWA / USDOT; IFSTTAR (French Institute for Science & Technology of Transport, Development & Networks); Texas Transportation Institute; Tokyo Institute of Technology; University at Albany; University of Tokyo
Table 1-10. List of experts by industry.

Survey of Industry Experts

To assess data needs, current capabilities, and future directions of both transportation data providers and users, customized questionnaires were sent to industry experts who worked for companies or organizations that (1) collect or analyze GPS data for travel behavior research, (2) use GPS data for travel behavior and activity modeling, or (3) sell consumer travel or traffic data. Each person selected to complete a questionnaire is considered an expert in his or her respective discipline/category based on the research team’s knowledge, available publications, and conference presentations. Representatives from different firms and research organizations were selected within each industry category to give adequate coverage and to minimize bias. Table 1-10 provides a complete list of all questionnaire respondents, with their company affiliation, organization or firm affiliation, and industry group. The questionnaires

sent to these industry experts were designed to collect both current and future plans for GPS data use or provision, and were customized for each industry (see Appendix B for each questionnaire). As responses were received, follow-up contact was made as needed for clarifications or to collect reference material mentioned in the response. These references were then reviewed and included if relevant in the literature review synthesis.

Summary tables have been created that contain the main themes and responses received for each question, by industry, along with a few key, representative quotes. The responses appear in Table 1-11, Table 1-12, and Table 1-13, respectively. It should be noted that the traffic data providers did not respond to the questionnaires directly; instead, many of them simply copied marketing information into the questionnaire itself. Consequently, there is no summary table of responses for this last industry category. Instead the marketing information offered by the traffic data providers has been integrated within the relevant literature review sections of this report and their complete individual responses are provided in Appendix C.

Industry: Organization or Firm
Transportation Planners and Modelers: Atlanta Regional Commission; Cambridge Systematics, Inc.; Chicago Metropolitan Agency for Planning; Chicago Transit Authority; Jerusalem Transport Masterplan; Mark Bradley Research and Consulting; Metropolitan Council (Minneapolis & St. Paul); Ohio DOT; Parsons Brinckerhoff, Inc.; San Francisco County Transportation Authority; Texas Transportation Institute
Traffic Data Providers: AirSage; INRIX, Inc.; Nokia (NAVTEQ); TomTom, Inc.; TrafficCast International
Table 1-10. (Continued).

Question 1. Current use of technologies
Summary:
All use passive vehicle or personal/wearable GPS loggers.
Surveying purposes include:
– Correction of trip diaries (still the predominant use)
– As supplemental data to trip diaries (for more accurate timing/location info)
– As sole source of travel information with either passive diary creation or used to generate prompted-recall questionnaires for nonspatial-temporal information
Other uses include:
– Driving behavior analysis in response to policy intervention at either the aggregate (i.e., road pricing – Minnesota) or individual level (TravelSmart – ITLS)
– Health and physical activity surveys, either to match location to physical activity (2012 Nashville Transportation and Health Study) or to correct self-reports
– Vehicle emissions and fuel consumption studies [GPS used in tandem with engine sensors (i.e., California Energy Commission interest in 2012 CHTS)]
– Real-time vehicle information studies (i.e., bus tracking)
– Survey administration – selection of intercept sites, track surveyors, etc.
Relevant quotes:
“We have used GPS in a range of travel behavior surveys over the last decade including pilot tests for conventional household travel surveys, evaluation of travel behavior change interventions, and in-vehicle driving behavior studies.”
Table 1-11. Summary table of travel survey consultant responses. (continued on next page)

Table 1-11. (Continued).

Question: 2. GPS plans for near/long term
Summary:
The near-term plans for GPS data collection mostly involve three main thrusts:
– Enhanced processing and imputation algorithms utilized in GPS data processing
– Sensor fusion or use in tandem efforts [i.e., adding accelerometers or OBD sensors (ITLS, GeoStats, Westat)]
– Transitioning to smartphone data collection, either as a survey application for respondents who have smartphones or possibly as replacements for GPS loggers
Relevant Quotes:
“We are ... investigating [the] potential of smartphones, which I think are the future in this space.”
“We foresee more GPS-only designs that leverage data processing and imputation algorithms to derive trip details.”
“We are actively testing/fielding a new smartphone application.”

Question: 3. Impact of GPS use on participation rates and sample representativeness
Summary:
In general, not a lot of information provided on impacts of GPS data collection on participation or response rates.
Representativeness is not generally found to be an issue, with GPS either having no effect or actually increasing sample representativeness (at least in the ITLS/Sydney case and in the 2010–2011 NYC regional travel survey).
Establishing representative samples for household travel surveys is often a requirement and handled by proprietary methods (Abt SRBI).
Anecdotal evidence is mostly given, with wide variance depending on the type of study. Short-term data collection replacing or supplementing diary surveys has generally found either no change (ETC Institute, GeoStats) or an increase in participation or at least compliance (ITLS, Westat). On the other hand, longer-term surveys, which require more from participants, seem to be more challenging to recruit for, as was found in the Battelle road pricing study and was observed by GeoStats.
Relevant Quotes:
“Recruit and retrieval rates seem to be most impacted by the overall level of burden associated with survey participation. Offering more options for participation . . . increase[s] both participation rates and sample representativeness by bringing in different population groups.”
“While we can’t isolate the impact of the GPS per se on recruitment, anecdotally it had both positive and negative impacts.”

Question: 4. How do you process GPS data to generate deliverables?
Summary:
Mostly proprietary algorithms
Most GPS-enhanced travel surveys reported following some variation of clustering points about stops and segmenting traces into separate trips or trip segments.
Either fully automated or automated with manual review and correction
– ITLS algorithms available for review (Stopher et al. 2012)
– Other survey purposes require less processing, for example speed limit studies, road pricing impacts (which just get VMT, etc.), and some driving behavior studies.
Simplified methods for vehicle-based data collection (not multimodal, which tends to produce messier data due to continuous power supply and more line-of-view obstructions with GPS satellites)
Relevant Quotes:
“Most travel surveyors/firms have some algorithms to process the retrieved household travel survey GPS data from participants. These processes include determination of origins and destinations, travel paths, travel speeds, and travel modes.”
“One consistent prevailing theme in this research is the concern expressed by the general public regarding privacy, or more specifically how studies like this could result in a reduction in personal privacy. Frequently, these fears are manifested through concerns of Big Brother type statements or how these studies would enable the government to track the movement of individual citizens.”

Question: 5. GPS-based travel behavior details included in deliverables
Summary:
Travel surveys generally provide, at a minimum, the trip segments (by mode if using person-based devices), and calculated origins and destinations for trips (corrected by diary data if a combined survey).
Some provide link-matching details from GIS, which can correct inaccuracies in GPS data and link the trip records to detailed network information. Others match to TAZ or census tract.
Most also provide speed, distance, and travel time from the GPS data.
For combined sensor surveys (i.e., health studies), GPS data can be combined and enhanced with other sensor data (i.e., accelerometers).
Many now include automated data imputation for modes (most common) and imputed purpose (if the survey did not include diary collection).
Driving behavior data are extracted from GPS (i.e., VMT, start/end times).
Relevant Quotes:
“Our standard GPS data deliverable contains tables with complete details on GPS households, GPS persons or GPS vehicles, GPS trips, GPS trip segments, GPS points, and network links matched to GPS points. For dual-method (diary comparison) studies, we also provide tables with GPS trips matched to reported trips and missed trip analysis results.”

Question: 6. Methods for privacy protection
Summary:
Privacy and security enforcement is generally maintained through the following mechanisms:
– Institutional controls [such as institutional review boards (IRBs), ethics guides, confidentiality agreements, human subjects training]
– Physical security – secured storage, protected databases
– De-identification – removing identifiers from travel information, including names and addresses
– Data separation
This list is similar to solutions in all types of surveys. GPS raises new issues not addressed by the above, such as:
– Fuzzification of trip start/end (i.e., add random error, round to zone centroid, etc.)
– What to do about pattern information that can be uniquely derived
Concerns about privacy can be mitigated as in thick-client paradigm (i.e., equipment collects and aggregates the data before transmission, actual raw data never sent).
Relevant Quotes:
“We are concerned about the release of raw data and would welcome some easily processed method of hiding the specific locations in the data for the purpose of releasing the data.”
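The two trip-end fuzzification options named under privacy protection (adding random error and rounding to a zone centroid) can be illustrated with a short sketch. This is not a method described by any respondent; the function names, the 200 m offset, and the zone identifiers and coordinates are all hypothetical, and a real release policy would need a formal disclosure-risk review.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def jitter_point(lat, lon, max_offset_m, rng):
    """Option 1: displace a trip end by a random offset of at most max_offset_m."""
    # sqrt() makes the displaced points uniform over the disc, not clustered at its centre.
    distance = max_offset_m * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    # Local flat-earth conversion from metres to degrees; adequate for small offsets.
    dlat = distance * math.cos(bearing) / EARTH_RADIUS_M
    dlon = distance * math.sin(bearing) / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def snap_to_centroid(lat, lon, centroids):
    """Option 2: replace a trip end with the nearest zone centroid.

    centroids maps a zone id to a (lat, lon) tuple.
    """
    zone_id = min(centroids, key=lambda z: haversine_m(lat, lon, *centroids[z]))
    return zone_id, centroids[zone_id]


# Hypothetical trip end and zone centroids, for illustration only.
rng = random.Random(42)
trip_end = (36.1627, -86.7816)
zones = {"TAZ_101": (36.1650, -86.7800), "TAZ_102": (36.1200, -86.8500)}
print(jitter_point(*trip_end, max_offset_m=200.0, rng=rng))
print(snap_to_centroid(*trip_end, zones))
```

Random displacement keeps approximate distances but can still be averaged out over many repeated trips to the same place, whereas centroid snapping removes the exact location entirely at the cost of spatial resolution.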

Table 1-11. (Continued).

Question: 7. GPS coverage and accuracy compared to other methods
Summary:
Clear improvements in collecting spatial–temporal data [i.e., location within 5 m at every second (provided the device is charged, taken, and used correctly)]
– Charging and carrying along are primarily issues with personal loggers.
– Accuracy and coverage issues remain due to cold start delays/satellite acquisition times, sky blockage, and urban canyon issues.
– Some issues may be addressed with improved technology (i.e., differential GPS, secondary sensors)
Studies consistently show that GPS-collected travel surveys are more accurate than traditional diary surveys in terms of trip reporting.
Technology continues to improve. Positional accuracy is now to a level that link identification, even between parallel roads, is possible with a high degree of accuracy.
Relevant Quotes:
“Accuracy is clearly far greater than in diaries. People are notoriously bad at estimating the times at which they travel, how long they travel, and certainly how far they travel.”
“Each generation of GPS units is better than the previous generation. For example, the delays in getting signals have been almost eliminated.”

Question: 8. Advantages of GPS
Summary:
Advantages largely relate to the previously identified improvements in
– Accuracy of trips,
– Detail of trips, and
– Reduction in respondent burden.
Additionally, the greater level of trip detail available allows other survey purposes beyond travel surveys, as mentioned (i.e., road pricing, travel behavior modification interventions, etc.).
GPS data support next-generation travel models by providing more detailed data (e.g., exact routes, fine-grained location data)
It is possible to reduce sample sizes through longer data collection periods, enabled with burden reduction
Relevant Quotes:
“There is less respondent burden for capturing travel details while collecting more information and more accurate information.”
“[GPS] is ubiquitous and available using off-the-shelf technology that is widely popular in the U.S. (i.e., a cell phone). No special equipment is needed. The phone almost always accompanies the individual and thus permits capturing both vehicle and non–vehicle-based travel.”

Question: 9. Limitations/concerns about GPS
Summary:
Major concern exists regarding bias introduced through using GPS (i.e., respondents with privacy concerns regarding the collection method may select themselves out of the survey)
Accuracy of processing algorithms, and time required to manually correct or the extra burden introduced for respondent correction
Battery life
Costs, compounded by potentially high loss rates of equipment
Data loss due to compliance, device failure, or environmental limitations (i.e., weather)
Cost
Relevant Quotes:
“The final GPS data is still only as accurate and reliable as the individuals who have been recruited to carry the GPS devices.”
“Perhaps the most salient limitation of GPS is cost. Like most technology, the cost of units is decreasing; however, the cost of acquiring, maintaining, deploying, and retrieving enough units to conduct a large-scale data collection effort is still high.”

Question: 10. Pricing information
Summary:
Data collection firms are reluctant to share cost information.
GPS survey costs can range from slightly higher to much higher than traditional surveys, depending on scope of survey.
GPS device prices have decreased dramatically over the past decade, and use of smartphone apps may further reduce costs.
Low marginal cost of extending data collection period once devices are in place
Can make costs comparable to traditional data collection with longer deployment periods and smaller sample sizes.
Not relevant for surveys that can only be accomplished by GPS (behavior modification, road pricing, etc.)
Relevant Quotes:
“Costs are rapidly becoming comparable to those for equivalent conventional survey procedures.”

Table 1-12. Summary table of travel behavior researcher responses.

Question: 1. Current use of technology
Summary:
Many have used GPS data collection devices that include in-vehicle, wearable GPS loggers, smartphones, PDAs, or in-tablet PCs. Others have conducted GPS-based speed studies and employed GPS technology in bicycle studies, parking studies, traffic operations, and transit scheduling/planning to evaluate vehicle drive cycles and related emissions and to obtain route choice behavior.
The ubiquitous presence of GPS and other embedded technologies in smart phones and PDAs has turned those devices into great travel logger tools.
GPS logs have been used to obtain detailed trajectory of travel, travel mode, departure and arrival times, origin and destination, and (even) trip purpose. The GPS log data have been further enhanced by the use of other technologies, such as three-dimensional acceleration data (using accelerometers), and air pressure data for estimating detailed travel behavior, such as the microscopic movement for horizontal direction and the movement in a room.
Several recent studies have employed a combination of a GPS data logger and an accelerometer followed by a prompted-recall instrument on the web. Others have used GPS on an experimental subsample (e.g., 10%) of a larger group participating in traditional travel diary surveys.
GPS devices have provided the opportunity to collect passive data for longer durations and multiple days, thereby allowing for the collection of additional data (e.g., processes data) that were otherwise not presented to avoid survey burden.
Availability of GPS data loggers with flash memory for data storage, as opposed to other devices that require proprietary software, has eased the process of data extraction, avoiding the need to collect and process the GPS data in separate steps.
Relevant Quotes:
“The combination of GPS + web has made it possible to obtain whole travel behavior data that were not observed when only using the GPS.”
“We wrote customized software that was loaded into the flash memory, which handled the data cleanup, conversion to trace format, extraction, and uploading transparently from the user perspective. The primary purpose of this was to shorten the recall period to the same day as data collection.”

Question: 2. GPS plans for near/long term
Summary:
The research and practice trends are toward developing apps for smartphones and tablet PCs.
Use of other technologies to collect location data where GPS signal is not available is being considered.
The improved technology would allow focusing on data collection for longer periods and eventually longitudinal GPS data collection where travel behavior could be studied as a function of life-cycle changes.
Further extension of work on automatic processing of the GPS traces is expected, especially for trip purpose imputation, parking search, and mode changes.
Post-processing of GPS data is becoming more important since it makes it possible to analyze and understand massive sets of GPS data by extracting knowledge from raw data while also maintaining its accuracy for planning purposes.
Shifting to GPS-only household travel surveys
Relevant Quotes:
“FHWA has a project looking at alternatives for collecting long-distance travel information. These could potentially include using cell phone tracking (e.g., from a commercial source), Twitter feeds, or Facebook posts from smartphones (which includes location) and combining these data sources with other more traditional sources.”
“We are interested in collecting GPS data for use in an ongoing project involving travel demand modeling incorporating ITS strategies.”

Question: 3. Impact of GPS use on participation rates and sample representativeness
Summary:
A study by INRETS suggests that “GPS survey participation is positively correlated with higher education, higher income and, therefore, higher access to cars and greater mobility.” Therefore, using multiple survey modes and an appropriate method for weighting responses will be necessary.
Recruitment and sample representativeness might be challenging in GPS + web studies since those without Internet access are sometimes eliminated from the study, resulting in sampling bias. In particular, older populations and those with limited Internet access will not be adequately covered to capture all travel markets.
The cost of equipment and limited number of units in hand may affect the sample size and the duration of the study both in terms of total time needed to conduct the survey and duration of data collection from each participant.
Most participants agree that the experience was interesting since the data collection burden for a respondent seems to be much less than for the traditional travel and diary surveys, thus allowing for a longer duration of survey. This will result in somewhat higher cooperation rates.
One approach is to combine a GPS-enabled mobile phone with a web-based prompted-recall travel survey.
Relevant Quotes:
“The GPS technology generally tends to attract the higher income and higher educated respondents. We do see evidence of higher interest among the younger crowd, and avoidance from the elderly, and have had to adjust sampling plans to ensure an equitable distribution of participation.”

Table 1-12. (Continued).

Question: 4. How do you process GPS data?
Summary:
Typically there are multiple layers of data processing:
– Quality assurance of the data collection
– Log generation
– Identifying trips by using a stay-and-move identification algorithm
– Identifying travel attributes (e.g., route, mode, destination, purpose) using various heuristic, machine learning, data mining, or statistical models.
Mostly use an in-house program to parse the data into trips and an in-house algorithm to impute trip attributes
Map-matching algorithms are also developed to identify the route (path) in the network.

Question: 5. GPS-based travel behavior details included in deliverables
Summary:
Many aim to obtain data as comparable as possible to those produced by conventional surveys.
GPS-based travel surveys generally result in full traces of the logger movement at up to second-by-second resolution.
The processed trip and activity records include basic activity-travel information for each episode, including origins and destinations for trip segments, departure and arrival times, trip frequencies, chains, and the route in the network.
In some prompted-recall surveys, additional information is collected on what respondents were doing, who they were with, what the activity at the trip ends was, how the activity-travel episodes were planned, the time constraints at trip ends, the payment of fares, and so on.
Qualitative data, such as attitude and opinion, can be also obtained in relation to the actual travel behavior.
Most (if not all) trip and stage details include location coordinates (or geocodes). Mostly have geocoded the stage/trip details.

Question: 6. Methods for privacy protection
Summary:
Methods for privacy protection in GPS studies are similar to those employed in traditional surveys:
– Institutional controls (IRB, ethics guides, confidentiality agreements, human subjects training)
– Removing identifiers from travel information, including names and addresses
– Physical and digital security, including the use of firewalls, secured storage, and protected databases, as well as data separation where data identifiers are kept separate from travel data
– There have been recent efforts by the Department of Energy and the National Renewable Energy Laboratory to establish the Transportation Secure Data Center to improve access to GPS data while maintaining individual confidentiality.
The right to participate and start the survey remains with respondents, and they can turn on/off the equipment whenever they desire to do so.
Synthetic GPS traces and multi-trace compression can be used as a solution to deal with the privacy issue.
Relevant Quotes:
“We are experimenting with synthetic GPS traces that could be developed from actual trace characteristics but still be shared with the public as they won’t expose participant behaviors. Multi-trace compression is another approach we are using to ‘shelter’ the information in individual traces.”

Question: 7. GPS coverage and accuracy compared to other methods
Summary:
The data received from GPS units (both spatial and temporal) are fairly accurate (certainly more accurate than traditional diary surveys), although minor corrections might be necessary to the logs from inaccurate location tagging.
Researchers have found that GPS surveys result in higher trip frequencies.
There are several well-known issues with the accuracy of GPS studies, including:
– Signal losses through urban canyons, tunnels, and buildings;
– Cold start and loss of signal at the beginning of trip; and
– Need to apply an effective and suitable imputation method to fix these issues.
Relevant Quotes:
“We have seen the quality of the GPS traces improve as the GPS technology has improved.”
“The location data itself was also fairly accurate with very few corrections made to the logs from inaccurate location tagging. The accuracy as far as respondent-identified to algorithm-identified activity locations were above 95%.”
“Recent work has shown that by using Wi-Fi most of the time and GPS only when Wi-Fi is not available, the draw on the battery can be much less. However, this increases the error on route delineation and does not provide sufficient information about travel speed (if someone wants speed).”
(continued on next page)
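The "stay-and-move identification" step listed under GPS data processing can be sketched as follows. The respondents' in-house algorithms are not described in the table, so this is only one plausible heuristic: a stay is declared when consecutive points remain within a radius of an anchor point for a minimum dwell time, and trips are the runs of points between consecutive stays. The 50 m radius, 120 s dwell time, and function names are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def find_stays(points, radius_m=50.0, min_dwell_s=120.0):
    """Return (first_index, last_index) pairs for each detected stay.

    points is a time-ordered list of (timestamp_s, lat, lon) tuples.
    """
    stays = []
    i, n = 0, len(points)
    while i < n:
        # Extend j while points remain within radius_m of the anchor point i.
        j = i + 1
        while j < n and haversine_m(points[i][1], points[i][2],
                                    points[j][1], points[j][2]) <= radius_m:
            j += 1
        if points[j - 1][0] - points[i][0] >= min_dwell_s:
            stays.append((i, j - 1))
            i = j  # continue after the stay
        else:
            i += 1  # anchor did not dwell long enough; treat as movement
    return stays


def split_trips(points, stays):
    """Return the point runs between consecutive stays, boundary points included.

    Movement before the first stay or after the last stay is ignored here.
    """
    trips = []
    for (_, prev_end), (next_start, _) in zip(stays, stays[1:]):
        if next_start > prev_end:
            trips.append(points[prev_end:next_start + 1])
    return trips
```

A trace that dwells at one location, moves, and dwells again yields two stays and one trip whose first and last points give the departure and arrival times. Signal loss, cold starts, and multimodal travel would all need extra handling that this sketch omits.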

Table 1-12. (Continued).

Question: 8. Advantages of GPS
Summary:
There are several clear advantages to GPS studies as opposed to traditional surveys, including:
– Ability to collect all movements, precise times, locations, and routes;
– Respondent burden reduction on data collection, with the individuals not needing to remember exact times and locations; longer reporting and ability to collect multiple days of travel to examine variability of travel;
– Ability to capture route choice and speed;
– Improved data quality: not reliant on self-report; and
– Ability to look at activity time/space prisms and use in travel micro-simulation.
The use of GPS with smartphones also seems to allow for richer, more interactive data collection, where the location-based services provided by phone companies, Google maps, etc. can be used to enrich the data set.
– Using a respondent’s smartphone reduces the cost and time compared to sending and retrieving GPS equipment.
– Smartphones could be used for longitudinal studies, attitudinal surveys, daily travel, or long-distance travel diaries.
GPS data collecting supports activity-based models and next-generation travel models with more detailed data.
Relevant Quotes:
“[Use of smartphones] can be an excellent tradeoff – giving participants a significant cash [that is saved by not sending GPS units to them] incentive (toward their monthly cell phone bill) to participate in a travel behavior project.”

Question: 9. Limitations/concerns about GPS
Summary:
Sampling bias in GPS studies is inevitable. It seems better to develop the analytical methods with the biased samples.
Despite significant improvements, the life of batteries when GPS is constantly in use is still not satisfactory. Several solutions have been suggested to remedy battery life issues:
– Collect GPS data less frequently by setting a longer data collection interval (i.e., not every second)
– Use GPS along with other technologies (accelerometer) so the unit can go to sleep when the person is not in motion
– Use combinations of GPS and Wi-Fi to collect data
While some attributes (e.g., start and end time, speed, duration, and mode) could be detected, additional questionnaires might be needed to capture other travel attributes like trip purpose.
There are signal losses in urban canyons, tunnels, and buildings.
Privacy issues and securing fine-grained GPS records is a point of concern.

Question: 10. Pricing information
Summary:
Most researchers have performed small GPS studies, so their cost estimates significantly vary and do not reflect the true cost of a major GPS study.
The survey cost is decreased as the price of devices drops.
Assuming that a GPS mobile phone is given to a respondent for a month and he/she has to carry the mobile phone and update the web diary every day, the direct survey costs consist of shipping costs, communication costs, and monetary incentives for the respondent.
Relevant Quotes:
“The devices were approximately $60, and we gave out a survey incentive of $35 per household. Other costs included the student employees to deliver the devices and provide about 1 hour of training to the respondents on taking the survey and using the device.”
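Two of the battery-life remedies listed above (a longer logging interval and letting the unit sleep when an accelerometer shows no motion) can be combined in a small scheduling sketch. It is a simplified illustration rather than any device's firmware logic: the input format, the 0.2 g motion threshold, and the 5-second interval are hypothetical values.

```python
def gps_fix_times(samples, interval_s=5.0, motion_threshold_g=0.2):
    """Return the timestamps at which a GPS fix would be requested.

    samples is a time-ordered list of (timestamp_s, accel_deviation_g) pairs,
    where accel_deviation_g is a hypothetical measure of how far the measured
    acceleration departs from rest.
    """
    fixes = []
    last_fix = None
    for t, accel in samples:
        if accel < motion_threshold_g:
            continue  # device at rest: the receiver stays asleep
        if last_fix is None or t - last_fix >= interval_s:
            fixes.append(t)  # moving and the interval has elapsed: log a fix
            last_fix = t
    return fixes


# Ten seconds at rest followed by ten seconds of movement, sampled once per second.
samples = [(t, 0.0 if t < 10 else 0.5) for t in range(20)]
print(gps_fix_times(samples))
```

In this example only two fixes are requested in twenty seconds instead of twenty. The trade-off is coarser route and speed information, and a real device would also have to allow for the cold-start delay after each wake-up.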

Table 1-13. Summary table of transportation modeler responses.

Question: 1. Current roles of GPS
Summary:
Common use of GPS subsample data for comparison to traditional diary methods
Common use of GPS floating car data for travel time and delay studies, as well as for defining link-level speeds by time of day
Common use of GPS for performance evaluation of transportation networks
Use of smartphone application for collecting special use data (CycleTracks)
Growing use of GPS as primary source of travel data for model development
Use of AVL data for bus speed models and route planning
Some analysis of GPS travel data in route choice
Some use of GPS data for model calibration and validation
Relevant Quotes:
“We have participated in SFCTA’s CycleTracks application to allow cyclists to voluntarily provide us route information for analysis. We are considering expanding this program and developing our own applications.”

Question: 2. Specific roles of GPS data in model development
Summary:
Development of trip correction factors
Estimate trip rate variability from multiday data sets
Provide baseline network speeds
Calculating core travel time and distance statistics for trip purposes and demographics
Use as primary source in revealed preference household survey
Use as primary source for trip, tours, and activity patterns
Estimate bicycle route choice model
Validation of travel times in DTA
Relevant Quotes:
“In a 100% GPS-assisted HTS (like the Jerusalem HTS), the GPS traces of individual person travel are the basis for extracting trips, tours, and activity patterns. In this case, and especially if the prompted-recall method is applied, the GPS data constitute the core component from which all other data items are derived, and not just correction factors. My personal view is that is the best approach.”

Question: 3. Secondary uses of GPS data
Summary:
GPS data commonly being shared outside of agencies for research purposes:
– Air quality analysis
– Active transportation (bike/walk)
– Congestion analysis
– NCHRP Project 8-57, SHRP 2 C04, and SHRP 2 L04
GPS travel survey data used for congestion analysis and bottleneck ID

Question: 4. Plans for future use of GPS data for travel demand modeling
Summary:
Plans to use GPS as the primary source of travel behavior data
Find volunteered GPS data instead of conducting large recruiting efforts. Sampling bias is probably as bad as self-selection bias.
Implement 100% GPS-based prompted recall, with multiday surveys (for at least 1 week)
Use GPS for focused subsamples (visitors, taxis, trucks/commercial vehicles)
Use GPS for surveys of visitors, taxis, and commercial vehicles to assist with modeling
Explore data imputation methods for mode and purpose
Use GPS data to generate models of transit customer trip times, access times, and wait times.
Development of route choice models
Relevant Quotes:
“We hope to use GPS trace data (with subsequent follow-up questions) more as the main data source for surveys and models. Hope that there are ways that more and more ‘volunteer’ GPS data can be used to support modeling, as opposed to (or in addition to) launching expensive surveys that purport to have ‘probability-based samples,’ but, due to inevitable sampling biases these days, probably aren't much better than self-selected samples (or at least have compensating biases).”
“GPS will remain an integral part of household (HH) travel surveys, providing actual measured, revealed preference data on times, paths, durations, amounts, locations to supplement surveyed responses on motivation, purpose, costs, scheduling, etc.”
(continued on next page)
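The "trip correction factors" role listed above is not defined in the table. One common reading, assumed here, is the ratio of GPS-detected trips to diary-reported trips in a dual-method subsample, which is then applied to trip rates from diary-only households. The sketch below uses that assumption; the segment names and all counts are invented for illustration.

```python
def trip_correction_factor(gps_trips, diary_trips):
    """Ratio of GPS-detected trips to diary-reported trips for one segment."""
    if diary_trips <= 0:
        raise ValueError("diary_trips must be positive")
    return gps_trips / diary_trips


def corrected_trip_rate(diary_rate, factor):
    """Scale a diary-based trip rate by the correction factor."""
    return diary_rate * factor


# Hypothetical dual-method subsample: (GPS trips, diary trips) per segment.
subsample = {
    "workers": (1100, 1000),
    "non_workers": (630, 600),
}
factors = {seg: trip_correction_factor(g, d) for seg, (g, d) in subsample.items()}

# Hypothetical diary-only daily trip rates to be adjusted.
diary_rates = {"workers": 4.0, "non_workers": 3.0}
for seg, rate in diary_rates.items():
    print(seg, round(factors[seg], 3), round(corrected_trip_rate(rate, factors[seg]), 2))
```

A factor above 1 indicates trips that the diary missed. In practice such factors are usually estimated separately by trip purpose, trip length, or household type, and they inherit any bias in who agreed to carry a GPS device.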

Table 1-13. (Continued).

Question: 5. Other sources of origin–destination data
Summary:
Several modelers are exploring the use of cell phone and sensor-based systems for origin–destination travel times.
Several are currently using consumer data for improving baseline network data and congestion analysis.
RFID data can be used for identifying transit trip start and end locations and trip travel times.
Relevant Quotes:
“As it stands now this data can be used only for certain types of analysis (travel times and speeds as well as individual route choice trajectories). This data cannot replace HTS because it is non-behavioral in nature. The main difference between behavioral and non-behavioral data is that behavioral data includes characteristics of the individual (such as age, gender, and income) and characteristics of the associated daily activities and travel (trip purpose, car occupancy, other trips made on the same day, etc.).”

Question: 6. Recent purchases of travel behavior data
Summary:
Bluetooth readers for travel time and special OD studies
TomTom speed data for model validation and baseline speeds
INRIX for real-time speed monitoring is also being used for network speed validation.
TomTom speed data used to identify travel time reliability
Considering AirSage OD data

Question: 7. Plans for short-term modeling
Summary:
Multiple efforts for building/integrating DTA models to support activity-based models
Regional DTA tools to evaluate system improvements
DTA to evaluate toll roads and managed lane projects
Relevant Quotes:
“Observed activity patterns would also be useful in validating activity-based models in addition to using for short-term forecasts. Having the data at the activity-pattern level allows one to more easily test mode and time of day shift possibilities, among other things.”

Question: 8. Key data needs for long-range modeling
Summary:
Better baseline network and demographic data
Better LOS skims
Better data on hard-to-reach populations
Identification of long-term trends in changing travel habits
Better survey data
– Spatial and temporal accuracy
– Oversample some population groups
– Complete/accurate spatial traces
Better networks
Better parking location information
Revealed travel data
Traveler stated preferences
Tourist and visitor surveys for major cities
Taxi models
Delivery vehicle models
Ideally, real-time data would feed self-calibration model routines.
Reliability metrics
Data for understanding the inertia effects on shifting patterns over time
Passive data that can have key behavioral information imputed
Relevant Quotes:
“Long-term forecasts require both the revealed travel data that can be obtained from counts, GPS and cell phone/AVL, and also surveyed data, which, in the future, needs to focus more on preferences, scheduling, flexibility, purposes, etc. rather than respondents trying to tell you what they did (which will be measured instead).”

Question: 9. Key travel behavior data issues for activity-based models
Summary:
Completeness of household interactions and schedule coordination
Complete household travel data (no missing persons or trips)
Geocoded data that can be tied to parcel location
Completeness of individual daily patterns
Intra-person time/space consistency
Inter-person, intra-household consistency
Survey information for an entire week
Observed behavior
Long-term and travel-related decisions
Relevant Quotes:
“Quality is important for all aspects; less good data is better than more bad, focus needs to be more on quality than sample sizes.”
“Data fidelity and resolution, finer level grain of detail.”

Table 1-13. (Continued).

Question: 10. Benefits of GPS for understanding travel behavior
Summary:
Increased accuracy of spatial and temporal travel details
Completeness and minimization of underreporting
Spatial and temporal resolution
Reduced burden on the respondent to report addresses and timing for all trips/locations, thereby significantly speeding up the survey
Attractive high-tech image of the survey, especially if the prompted-recall method applied is integrated with GIS (such as in the case of the Jerusalem HTS, where the recruitment rate was 70%–80%)
Possibility of collecting data non-invasively for multiple days with subsequent automated imputations of travel modes, purposes, and other data
Relevant Quotes:
“A fusion of the traveler’s own experience (GPS) with adjacent system conditions (Bluetooth/INRIX) and an effective means for gathering and recording traveler perceptions of conditions and reactions.”

Question: 11. Disadvantages of GPS for understanding travel behavior
Summary:
Sampling issues due to reliance on traditional recruitment methods
Specific errors associated with GPS not as well understood relative to traditional survey methods
Need for auxiliary data in addition to GPS trace (can’t just use passive data)
Signal issues in some locations
Respondent burden of special GPS device
Cost
Limited set of tools for processing GPS into trips
Relevant Quotes:
“GPS has a high cost, and charging requirements coupled with device costs create a high respondent burden. The emergence of smartphone technology provides a great opportunity to bring down the costs of data collection and analysis.”
“GPS has its own potential data error/quality issues that are not as well understood yet as the types of errors/biases that are inherent in travel diary surveys.”

TRB’s National Cooperative Highway Research Program (NCHRP) Report 775: Applying GPS Data to Understand Travel Behavior, Volume I: Background, Methods, and Tests describes the research process that was used to develop guidelines on the use of multiple sources of Global Positioning System (GPS) data to understand travel behavior and activity. The guidelines, which are included in NCHRP Report 775, Volume II are intended to provide a jump-start for processing GPS data for travel behavior purposes and provide key information elements that practitioners should consider when using GPS data.
