National Academies Press: OpenBook

Cell Phone Location Data for Travel Behavior Analysis (2018)

Chapter: Chapter 6 - Measuring Individual Activities: Home, Work, Other

Suggested Citation:"Chapter 6 - Measuring Individual Activities: Home, Work, Other ." National Academies of Sciences, Engineering, and Medicine. 2018. Cell Phone Location Data for Travel Behavior Analysis. Washington, DC: The National Academies Press. doi: 10.17226/25189.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

6.1 Roadmap to the Chapter

This report has so far discussed raw cell phone data, ways to remove noise from call detail record (CDR) data, and methods for extracting meaningful stay points that reflect locations where individual activities are anchored. To derive reliable origin–destination (O-D) trips by purpose, it is important to identify activity types that correspond to “home,” “work,” and “other” stay locations. Practitioners traditionally use household survey data to establish respondents’ locations of home, work, and school and to study educational, recreational, shopping, and personal business purposes in detail. This chapter first reviews the analysis steps needed to identify activity types at the inferred home, work, and “other” stay locations detected by the CDR data.

Sample expansion is key for practitioners who develop detailed weights by comparing survey and population totals for selected household characteristics and market segments. This chapter presents the filtering of the CDR sample to remove observations with infrequent cell phone use. The expansion of active cell phone users to the metropolitan population is also discussed. Taken together, the inference and sample expansion methods for CDR data discussed in this chapter provide the building blocks for developing estimates of total trip making and O-D person-trip tables at a regional level.

Finally, the chapter compares the expanded home and work trips produced and attracted on the basis of CDR data versus the Census Transportation Planning Products (CTPP) journey-to-work data. The Boston, Massachusetts, region is used as a case study to make these comparisons.

6.2 Activity Inference

6.2.1 Goal and Approach

It is well documented in the transportation literature that trips are induced by the need or desire to engage in activities (Manheim 1979, Pinjari and Bhat 2011).
Therefore, an understanding of patterns and types of activities is crucial in deriving estimates of travel flows and travel demand. Recent studies from various cities (Alexander et al. 2015, Colak et al. 2015, Toole et al. 2015) have used cell phone data at city scale and at low cost to enhance knowledge about human mobility and methods of estimating O-D trip tables. It has been demonstrated that human mobility patterns are characterized by regularity, with frequent returns to previously visited locations (González et al. 2008; Song et al. 2010a, 2010b; Schneider et al. 2013; Jiang et al. 2013; Hasan et al. 2013). Because of this predictability, stay activities for users’ most-visited locations can reasonably be inferred from observations of cell phone records made over multiple days.

For each user, the stay extraction process detailed in Chapter 5 results in a time stamp and duration for each observed visit to a stay location. Once the trajectories of stay points have been separated from the noise of the raw data, the next step is to infer contextual information about each location.

There are two approaches to inferring activity types. The first approach depends only on cell phone data and estimates activity types on the basis of circadian rhythms and regularities exhibited in human mobility. Alexander et al. (2015) and Colak et al. (2015) improved on methods introduced by Wang et al. (2012) and Iqbal et al. (2014) by using visitation frequency and temporal data to infer contextual information such as a location’s function or trip purpose.

The second approach combines the individual trips extracted from cell phone data with land use information and data on points of interest to infer activity types in detailed categories such as home, work, recreation/leisure, shopping, and other. For example, Jiang et al. (2013) proposed to infer activity types on the basis of dependencies among daily mobility motifs, temporal information about trips, and data on land use and points of interest. Widhalm et al. (2015) demonstrated the estimation using a relational Markov network with Vienna, Austria, and Boston as examples. They found that the inferred activity clusters were stable across days. Widhalm et al. (2015) also pointed out limitations of the approach for inferring detailed activities in areas with mixed land use.

This section focuses on the first approach for three reasons:
• Its simplicity of application and limited requirements for external data make the analysis approach functional and standard.
• Detailed land use and point-of-interest data may not be available consistently throughout the country. Therefore, it is unclear whether commercial vendors can use the second approach consistently.
• The Boston region is highly urbanized and has several zones with mixed land use. Breaking down the actual activity at the trip end for areas with mixed land use is tricky and may introduce errors.

This section describes the assumptions and methods used in the research team’s approach to assigning an activity type of home, work, or “other” to each user’s stay locations and validates the number of trips produced and attracted. In Chapter 7, the distribution of trips by purpose and by time of day is discussed and the results from CDR models are compared with those from traditional survey summaries and regional model outputs. In Chapter 8, travel flows are examined and O-D trips are compared by trip purpose and time of day with results from the Boston Metropolitan Planning Organization’s travel demand model.

6.2.2 Algorithms

6.2.2.1 Inferring Home Location

Each user’s home location was identified as the stay location that had the most visits on weekends and on weekday nights as defined by a time parameter specific to the local context. This parameter represents the time window(s) during which users are expected to spend a substantial amount of time at home. For the study area, the period between 7 p.m. of a given day and 8 a.m. of the next weekday was defined as a weekday night.

In addition to inferring trip purpose, the home stay location of each user was also used to filter out users with too few data points. Another important function of the home location is that it provided control totals for data expansion from the sample of cell phone users to the population in the study area.
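The home-location rule in Section 6.2.2.1 can be illustrated with a minimal Python sketch. It is not the research team's implementation. The function names (`is_home_window`, `infer_home`), the input layout (a list of stay identifier and visit start time pairs), and the choice to classify each visit by its start time only are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

NIGHT_START_HOUR = 19  # 7 p.m., the study-area value quoted in the text
NIGHT_END_HOUR = 8     # 8 a.m.


def is_home_window(ts: datetime) -> bool:
    """Return True for weekend visits and for weekday-night visits.

    Assumption: a visit is classified by its start time only.
    """
    if ts.weekday() >= 5:  # Saturday (5) or Sunday (6)
        return True
    return ts.hour >= NIGHT_START_HOUR or ts.hour < NIGHT_END_HOUR


def infer_home(visits):
    """Pick the stay with the most visits inside the home window.

    visits: iterable of (stay_id, start_time) pairs for one user.
    Returns the stay_id, or None if no visit falls in the window.
    """
    counts = Counter(stay for stay, ts in visits if is_home_window(ts))
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

The two hour constants play the role of the locally defined time parameter, so a different study area would only need different values.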

6.2.2.2 Inferring Work Location

Inferring a user’s work location involves considerably greater uncertainty than inferring his or her home location. Two methods of inferring a user’s work location, each of which is based on different assumptions, are discussed below.

Conservative Model. This model rests on the rationale and historical evidence (Levinson and Kumar 1994, Schafer 2000) that, for a given frequency of visits, longer-distance trips are more likely to be work trips than are shorter-distance trips, which are more likely to be for nonwork purposes such as going to a nearby grocery store.

A work location is identified as the stay not previously labeled as home to which the user travels the maximum total distance from home, max(d × n), where n is the total number of visits to a given stay on weekdays between, for example, 8 a.m. and 7 p.m. and d is the straight-line distance between the home stay location and the given stay as calculated by plane approximation. If the user visits the identified work stay less than once per week on average (i.e., fewer than eight times in total during the observation period of 8 weeks), or if the distance is less than 0.5 km (d < 0.5), then the activity of the stay region is identified as “other” rather than as work. In effect, not all users are assigned a work stay, in recognition that not all users commute to a job. These classification assumptions serve to avoid falsely identifying a location as work that either is not visited frequently enough to be a work location or is close enough to a user’s home that it could reflect signal noise rather than a distinct work location.

Relaxed Model. The second approach relaxes the condition for labeling a stay as work. A user’s work location is defined as the stay point other than home that a user visits most often during the daytime on a weekday between the hours of 8 a.m. and 7 p.m.
Because many individuals do not work, the work location is left blank if the candidate location is not visited more than once per week or if the location is less than 500 meters from the home location.

Neither approach to inferring work location distinguishes between work and school or university locations. The inferred work location should therefore be read as the location of the user’s most common mandatory activity. Of the two methods, the more conservative method may be better suited to modeling work activity because it minimizes the error of falsely assigning the label “work” to nonwork locations.

6.2.2.3 Inferring “Other” Locations

All remaining stay locations that were not identified as either home or work locations were designated as “other.” Future research can expand the designation “other” to reflect activity types such as school, shopping, recreation, and social. To further distinguish activities at this level of detail, additional contextual information such as detailed land use data will most likely be required.

The research team used “other” to represent all nonhome and nonwork activities. The team acknowledges that under these simple assumptions, a user’s true home and work locations might be misidentified, along with their corresponding trip purposes. For example, a school activity might be misidentified as a work activity if it satisfies the conditions assumed by the work location identification model. However, the comparisons with Census data presented in the model validation in Section 6.3 suggest that the procedure offers good estimates of the distribution of home and work locations and home–work flows in the study region.

These assumptions are also related to the length of the observation period and the spatial resolution of this CDR data set. It may therefore be necessary to adjust the criteria used for applications of this method to other data sets.
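The two work-location rules from Section 6.2.2.2 can be sketched as follows. This is an illustrative Python sketch, not the research team's code. The names `planar_km` and `infer_work`, the input layout, and the use of an equirectangular formula for the "plane approximation" are assumptions. The sketch also applies the same visit threshold (at least eight weekday daytime visits) to both models, although the wording for the relaxed model ("not visited more than once per week") could be read as a strict inequality.

```python
import math
from collections import Counter

MIN_VISITS = 8         # once per week on average over an 8-week period
MIN_DISTANCE_KM = 0.5  # closer stays may reflect signal noise near home
EARTH_RADIUS_KM = 6371.0


def planar_km(a, b):
    """Straight-line distance between (lat, lon) points in degrees.

    Assumption: equirectangular projection as the plane approximation.
    """
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return EARTH_RADIUS_KM * math.hypot(x, y)


def infer_work(home, stays, daytime_visits, conservative=True):
    """Return the inferred work stay id, or None (stay remains "other").

    home: id of the inferred home stay.
    stays: {stay_id: (lat, lon)} for this user, including home.
    daytime_visits: one stay_id per weekday visit between 8 a.m. and 7 p.m.
    """
    counts = Counter(s for s in daytime_visits if s != home)
    if not counts:
        return None
    dist = {s: planar_km(stays[home], stays[s]) for s in counts}
    if conservative:
        # Conservative model: maximize total distance traveled, d * n.
        candidate = max(counts, key=lambda s: dist[s] * counts[s])
    else:
        # Relaxed model: maximize the number of daytime visits, n.
        candidate = max(counts, key=lambda s: counts[s])
    if counts[candidate] < MIN_VISITS or dist[candidate] < MIN_DISTANCE_KM:
        return None
    return candidate
```

Every stay that is neither the inferred home nor the value returned here would be labeled "other". Keeping the thresholds as module-level constants reflects the note above that the criteria may need adjusting for other observation periods or spatial resolutions.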
The following sections illustrate the results in space and time by implementing the activity inference algorithms that have been discussed here.

6.2.3 An Individual Example

Figure 6-1 shows the results of analysis of 3 days’ worth of data from the student who voluntarily donated his self-selected cell phone data, collected over a period of 18 months, to the Massachusetts Institute of Technology HuMNet Lab for research purposes. The spatial distribution of the raw data is shown in Figure 4-11. Figure 6-1 shows the sequence that translates the raw data into the inferred activity types for this individual. The extracted stays were developed in three steps:
• The raw cell phone data over the course of 18 months are shown as blue points in Figure 6-1a;
• The raw data of each selected day are the purple dots in Figure 6-1b; and
• The extracted stays for the day are shown as red circles in Figure 6-1c.

Figure 6-1. Inference of student’s activities on basis of 3 days of data. Source: Jiang et al. 2016.

The inference of activity types at the extracted stays is also shown:
• Home is represented by the yellow-faced circles in Figure 6-1, d–f;
• Work is represented by the blue-faced circles in Figure 6-1, d–f; and
• “Other” is represented by the red-faced circles in Figure 6-1, d–f, and the green-faced circles in Figure 6-1, d and f.

The number next to each stay location represents the visitation sequence within each day. Trips from one stay location to another are color-coded with the same color as the activity type at the destination. For example, in Figure 6-1d, the trip from home to “other 1” is in red, the trip from “other 1” to work is in blue, the trip from work to “other 2” is in green, and the trip from “other 2” to home is in yellow.

Above Figure 6-1, d, e, and f, there is a time bar for a 24-hour period. The color represents the inferred activity type while the length of the bar shows the inferred duration of the activity. The methods used to infer trip departure time are discussed in detail in Chapter 7.

6.2.4 Sample Filtering and Expansion

6.2.4.1 Sample Filtering

By definition, cell phone CDR data will not provide a full picture of some users’ travel patterns. In particular, users who do not use their cell phone often for calls, texts, or Internet data access will yield too few events to allow for meaningful extraction of their stay locations and travel patterns.

To address this issue, the research team filtered out users with fewer than eight visits to their home stay over the 2-month period. This filter corresponds to less than one visit per week on average to the designated home stay location. The filter served the additional purpose of ensuring, within a reasonable degree of certainty, that the designated stay was the user’s home, a key assumption in the team’s method of expanding users to population.
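The filtering step just described amounts to counting each user's visits to the inferred home stay and keeping users with at least eight. A minimal Python sketch, with hypothetical names (`home_visit_counts`, `active_users`) and an assumed input layout of user and stay identifier pairs:

```python
from collections import Counter

MIN_HOME_VISITS = 8  # about one home visit per week over roughly 8 weeks


def home_visit_counts(visits, home_of):
    """Count visits to each user's inferred home stay.

    visits: iterable of (user_id, stay_id) pairs over the whole period.
    home_of: {user_id: inferred home stay_id}.
    """
    counts = Counter()
    for user, stay in visits:
        if home_of.get(user) == stay:
            counts[user] += 1
    return counts


def active_users(visits, home_of, min_visits=MIN_HOME_VISITS):
    """Return the users retained for expansion.

    Users without an inferred home (such as visitors) are dropped because
    they never appear as keys in home_of.
    """
    counts = home_visit_counts(visits, home_of)
    return {u for u in home_of if counts[u] >= min_visits}
```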
This filtering process by definition excludes visitors for whom a home location is not observed in the data. Future research could look at extracting visitor trips from CDR data by using an assumption other than home location to expand these trips.

Following the application of this filter, 335,795 users remained in the Boston CDR data set. This sample size is an order of magnitude larger than that of most household travel surveys and could also increase, given a longer period of observation.

6.2.4.2 Sample Expansion

To expand the filtered sample of cell phone users to the total population of the study region, the number of home stays was aggregated to Census tracts in the Boston metropolitan area. An expansion factor was then calculated for each tract as the ratio of the 2010 Census population to the number of residents identified in the CDR data. There were a few Census tracts with fewer than 10 CDR residents. For those tracts, the expansion factor was set to zero to ensure that users who might not be representative of a given Census tract were not overweighted.

Figure 6-2 shows the distribution of the expansion factor values. The values of the first, second, and third quartiles of the expansion factors were 9.4, 14.2, and 25.1, respectively. Figure 6-2 also illustrates the spatial distribution of the expansion factors. The research team’s analysis suggests that the tracts in the suburban western portion of the study area tend to be more heavily weighted than the core central area, which is better represented in the sample.

The availability and analysis of cell phone CDR data for a period greater than 2 months would most likely require lower expansion factors and result in better spatial distribution of users. The analogy with traditional surveys is the increase in sample size and the focus on geographic and socioeconomic market segments to improve the representativeness of the sample.

6.2.4.3 Analogies with Household Surveys

There are some interesting analogies that are worth noting when the cell phone expansion factors are compared with the sampling weights in a traditional household survey.

First, the motivation behind the sampling weights in traditional surveys is the difference in making contact and the willingness to participate in a survey. Both of these steps in traditional surveys require a correction through the development of sample weights that expand the sample to be more representative of the regional population.
• These survey sampling weights reflect the under- and over-representation of certain geographic and socioeconomic market segments in the survey. Implicit in sample weighting is the need to adjust the representation by members of these market segments to avoid a model that under- or overpredicts travel in the region.
• In the case of the cell phone data, the need for expansion factors similar to the sampling weights is directly linked to the market penetration and use of cell phones during a typical day. Younger, more educated, and more technology-savvy users of cell phones are likely to provide more traces of their daily routines through their increased use of calls, text messages, and Internet data access when visiting websites or receiving passive signals from apps.

Second, the cell phone expansion weights are smaller in magnitude compared with the sampling weights of a traditional household survey. This is an expected result, given that a sampling rate of 1% in a traditional survey would correspond to an average expansion factor of about 100, with higher values for difficult-to-reach geographic and socioeconomic market segments.
Third, one expects to find differences due to geography and socioeconomics for both traditional surveys and cell phone use.
• In traditional surveys, large households and households that include respondents who are younger, rely less on phone land lines, have lower incomes, and live in urban areas are likely to have a lower response rate compared with older, suburban respondents who are more likely to be contacted and to respond to a survey.
• However, market penetration is higher and the usage of cell phones more extensive among younger cohorts of the population. Data and analyses based on a cell phone sample are likely to reflect the travel behavior and habits of some of the hard-to-reach segments of the household survey.

Figure 6-2. Expansion factors for Census tracts: (a) probability distribution of expansion factors for Census tracts and (b) spatial distribution of expansion factors at Census tract level. Source: Alexander et al. 2015.

6.3 Validation

6.3.1 Productions and Attractions

Accurate extraction of users’ stays and proper expansion to the regional population are critical to trip generation and estimates of total travel in a region. The regularity of human behavior (González et al. 2008; Song et al. 2010a, 2010b; Jiang et al. 2013) enabled the research team to infer users’ home stays and, where applicable, their work stay locations from the CDR data. The spatial distribution of home and work locations, when aggregated to the 164 cities and towns in the study area (MassGIS, 2014), looks reasonable. Chapters 7 and 8 discuss in more detail the effect of geographic aggregation on how the results compare with traditional data and models.

Figure 6-3a compares home locations by town on the basis of 2010 Census data and raw and expanded CDR data. As expected, given that the Census tract population was used to expand the data, the number of residents in each town is almost identical to the estimates from the expanded CDR data. The raw CDR home location data are shown as hollow red circles; the expanded CDR home location data are shown as solid red bullets. The slope of a best-fit line through the raw CDR data is also close to 1, which strongly suggests that the overall distribution of CDR users in the raw data is fairly representative and that a simple factoring method is appropriate for expanding the cell phone users to the total population and its distribution across the region.

Source: Alexander et al. 2015.
Figure 6-3. Comparison of scaled CDR residents with Census data by town: (a) residents estimated with CDR data versus 2010 Census population by town before and after population expansion and (b) workers estimated with CDR data versus 2013 CTPP workers by town before and after population expansion. [Panel (a) plots CDR residents against Census population; panel (b) plots CDR workplaces against CTPP workplaces. Both panels show raw CDR and scaled CDR series on logarithmic axes from 10^0 to 10^7.]
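The tract-level expansion described in Section 6.2.4.2 and the town-level totals compared in Figure 6-3a can be sketched in Python as follows. The function names and dictionary layouts are hypothetical, and the sketch assumes every tract with CDR residents has a Census population entry.

```python
from collections import Counter

MIN_RESIDENTS = 10  # tracts with fewer CDR residents get a zero factor


def expansion_factors(user_home_tract, tract_population,
                      min_residents=MIN_RESIDENTS):
    """Compute one expansion factor per Census tract.

    user_home_tract: {user_id: tract_id of the inferred home stay}.
    tract_population: {tract_id: Census population}.
    """
    residents = Counter(user_home_tract.values())
    factors = {}
    for tract, n in residents.items():
        if n < min_residents:
            factors[tract] = 0.0  # avoid overweighting a few users
        else:
            factors[tract] = tract_population[tract] / n
    return factors


def expanded_residents_by_town(user_home_tract, factors, tract_to_town):
    """Sum each user's expansion factor over the town of the home tract."""
    totals = Counter()
    for tract in user_home_tract.values():
        totals[tract_to_town[tract]] += factors[tract]
    return dict(totals)
```

Because each retained user carries the factor of his or her home tract, the expanded resident total of a tract equals its Census population by construction, which is consistent with the close agreement noted for the expanded home locations. Tracts with a zero factor contribute nothing to the town total.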

Figure 6-3b also compares work locations aggregated at the town level. As with the raw CDR data on the home end, the distribution of raw workplaces is fairly consistent with the 2006–2010 CTPP. The raw CDR work location data are shown as hollow blue circles; the expanded CDR work location data are shown as solid blue bullets. The data slope is again approximately 1, and the sample expansion method adjusts well for the differences in magnitude across towns. This strong correlation is noteworthy, considering that each user’s home and work locations were expanded on the basis of their home location only.

6.3.2 Commuting Flows in Space

Comparisons of the CDR and the CTPP data sets were also made by using town pair flows. Figure 6-4 shows the CDR and CTPP home-to-work flows for all of the intratown and intertown pairs, with correlations of 0.99 and 0.95, respectively. Figure 6-4 strongly suggests that the validation of town pairs that have many trips is better than that of pairs with few trips, especially those with fewer than about 500 daily trips. This trend is most likely due to the scarcity of data for the smaller markets.

Figure 6-4 also shows the trip-length distribution of daily home-to-work flows derived from the CDR data and the 2006–2010 CTPP data. The results show that the estimated home-to-work flows from the CDR data are close to those reported in the CTPP data at a town level.

Figure 6-4, c and d, illustrate spatially the distribution of home-to-work flows for key markets (intertract pairs with greater than 1,000 daily trips) for the CDR and Census data, respectively. These patterns suggest that the CDR data capture patterns similar to those of the CTPP commuting data, with the majority of flows directed in and out of Boston as well as a few shorter-distance markets in suburban towns.
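A comparison of town-pair flows of the kind summarized in Section 6.3.2 can be sketched as below. The source does not state how the reported correlations were computed, so this Python sketch assumes a plain Pearson coefficient on untransformed daily flows and treats a town pair missing from one data set as a zero flow. The function names and the dictionary layout are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def flow_correlations(cdr_flows, ctpp_flows):
    """Correlate home-to-work flows separately for intratown and
    intertown pairs.

    Each argument: {(home_town, work_town): daily trips}.
    Assumption: pairs absent from one source count as zero trips.
    Returns (intratown_r, intertown_r).
    """
    pairs = sorted(set(cdr_flows) | set(ctpp_flows))
    intra = [p for p in pairs if p[0] == p[1]]
    inter = [p for p in pairs if p[0] != p[1]]

    def corr(selected):
        xs = [cdr_flows.get(p, 0.0) for p in selected]
        ys = [ctpp_flows.get(p, 0.0) for p in selected]
        return pearson(xs, ys)

    return corr(intra), corr(inter)
```

Restricting the pair list to flows above a chosen size before calling `corr` would be one way to examine the weaker agreement the text reports for small markets.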
6.4 Summary

The definitions of activities, their location, and the duration spent at each stop are key elements of any trip-based or activity-based model. This chapter describes the research methods used to extract three locations (home, work, and “other”) from cell phone CDR data. The steps needed to detect and identify these stay locations are reviewed and the different algorithms used for this purpose are discussed. The chapter presents how the sample was filtered by removing observations for which cell phone use during the period of observation was infrequent. The population-based expansion of the data from active cell phone users to regional totals is discussed, and the expanded home and work trips produced and attracted from CDR data are compared with those from the CTPP journey-to-work data for the Boston region.

The home, work, and “other” locations are key designations for practitioners who traditionally use household survey data to establish respondents’ locations of home, work, and school. Surveys are also analyzed to describe in great detail nonwork travel related to educational, recreational, shopping, and personal business.

The sample expansion presents a key consideration for practitioners. In traditional approaches, the distribution of sample observations is compared with population totals for market segments reflecting household characteristics. The expansion weights typically account for differences in household size, number of workers, and number of autos per household.

The CDR data, however, are anonymized and do not include socioeconomic characteristics. The sample expansion used a simpler approach that compares the total population in each Census tract with the number of cell phone users who live in each tract.

Taken together, the inference and sample expansion methods discussed in this chapter provide the building blocks for developing estimates of total trip making and O-D person-trip tables at a regional level. These estimates are refined further in Chapter 7 by focusing on trip tables by purpose and time of day.

Figure 6-4. Work trips: (a) travel flows, (b) trip lengths, and O-D trip patterns for (c) CDR data and (d) Census data. Source: Alexander et al. 2015.


TRB's National Cooperative Highway Research Program (NCHRP) Research Report 868: Cell Phone Location Data for Travel Behavior Analysis presents guidelines for transportation planners and travel modelers on how to evaluate the extent to which cell phone location data and associated products accurately depict travel. The report identifies whether and how these extensive data resources can be used to improve understanding of travel characteristics and the ability to model travel patterns and behavior more effectively. It also supports the evaluation of the strengths and weaknesses of anonymized call detail record locations from cell phone data. The report includes guidelines for transportation practitioners and agency staff with a vested interest in developing and applying new methods of capturing travel data from cell phones to enhance travel models.
