National Academies Press: OpenBook

Sample Size Implications of Multi-Day GPS-Enabled Household Travel Surveys (2016)

Chapter: Research Results Digest 400 Sample Size Implications of Multi-Day GPS-Enabled Household Travel Surveys

Suggested Citation: "Research Results Digest 400 Sample Size Implications of Multi-Day GPS-Enabled Household Travel Surveys." National Academies of Sciences, Engineering, and Medicine. 2016. Sample Size Implications of Multi-Day GPS-Enabled Household Travel Surveys. Washington, DC: The National Academies Press. doi: 10.17226/24614.


NATIONAL COOPERATIVE HIGHWAY RESEARCH PROGRAM
October 2016
Responsible Senior Program Officer: Lawrence D. Goldstein

CONTENTS
Summary
Chapter 1 Introduction
Chapter 2 Data and Method
Chapter 3 Bias Checks
Chapter 4 Results for Tabular Data Summaries
Chapter 5 Results for Model Estimation
Chapter 6 Summary of Results
Chapter 7 Conclusions and Recommendations
References
Appendixes

Research Results Digest 400

SAMPLE SIZE IMPLICATIONS OF MULTI-DAY GPS-ENABLED HOUSEHOLD TRAVEL SURVEYS

This digest summarizes key findings of research conducted in NCHRP Project 08-36/Task 123, “Survey Sample Size and Weighting.” This digest is based on the project final report by Louis Rizzo of Westat and Gregory D. Erhardt of RAND Corporation. The appendixes to the project’s final report are available to download from the TRB website.

SUMMARY

There has long been interest in conducting travel surveys in which the same respondent provides information for multiple travel days, but such surveys have been limited by the challenge of ameliorating biases resulting from survey fatigue. More recently, new technology (in the form of GPS-enabled travel surveys) has made multi-day data collection more practical by offering the promise of a lower respondent burden.

Despite this progress, questions remain about the value of additional days of survey data versus the value of additional respondents. NCHRP Project 08-36/Task 123 seeks to address these questions by (1) evaluating whether or not GPS-enabled multi-day surveys overcome the survey fatigue challenges faced with multi-day diary surveys, (2) investigating the effects of using multi-day data for developing travel demand models, and (3) providing empirical evidence on the sample size implications of multi-day versus single-day surveys.
These questions are explored using data from the 2012 Northeast Ohio Regional Travel Survey, a GPS-enabled household travel survey (HTS) in the Cleveland region. The survey featured a design in which (1) separate GPS-only versus GPS-with-prompted-recall samples were collected (allowing for comparison of these approaches) and (2) travel data for the GPS-only sample was collected for 3 to 4 consecutive days (allowing for a multi-day analysis).

The data was used to (1) estimate models of key travel choices (e.g., travel demand models) and (2) generate tabular data summaries of those same travel choices. The key travel choices were selected to illustrate the effect of sample size and number of days on different behavioral components and include models of automobile ownership, tour generation, destination choice, and mode choice. Separate estimations and tabulations were generated using separate 1-day, 2-day, and full-sample data files to demonstrate how the results varied among the three.

An important issue in working with multi-day travel survey data is known as the repeated measurement problem. When multiple observations are taken from the same individual or household, those observations are likely to be correlated across days, and thus contain less new information than an independent observation taken from a new household. For example, most people can be expected to go to work at the same place every day, although they may participate in non-work activities at different locations on different days. Models estimated from such data will have unbiased parameter estimates, but the variance estimates will be flawed. Therefore, the key to understanding multi-day data is to measure the relative amounts of intra-person day-to-day variability versus inter-person variability.

This research used a statistically rigorous method for measuring the relative amounts of intra-person day-to-day variability versus inter-person variability. This method estimates the true variance to calculate a set of design effects and correlation coefficients. The design effect is the ratio of the true variance to the variance assuming a simple random sample, indicating the relative value of adding survey days versus adding households to the sample. The correlation coefficient, or within-person a value, is a measure of how consistent person-days are within persons as measured by the variance estimates. A 100% a value would indicate a very high correlation across travel days, indicating that the extra days provided no new information, such as would be expected with automobile ownership. A 0% a value would indicate no correlation across travel days, such that sampling additional days would add as much value as sampling additional households.

To use multi-day survey data to develop an operational travel model, the data must be consistent across collection days. Unfortunately, the GPS-only sample in the 2012 Northeast Ohio Regional Travel Survey did not meet this prerequisite. The data shows significant differences across collection days within the GPS-only sample, including fewer trips and less non-automobile travel beyond Day 1. In addition, the GPS-only sample and the GPS-with-prompted-recall sample differ significantly. While the GPS-with-prompted-recall method appears to be a reliable way of collecting detailed data about travel behavior, the GPS-only approach needs further development before it can be relied on as a stand-alone paradigm, both to ensure consistency across days and consistency with the GPS-with-prompted-recall sample. Research improving these methods could be beneficial.

Assuming consistent multi-day data can be collected, it is acceptable to use such data in travel model development, both for estimating models and for generating tabular summaries. The parameter estimates when doing so will remain unbiased, but the variance estimates will be flawed due to the repeated measurement problem. In practical terms, this means that the analyst should expect the reported t-statistics to be inflated and that the statistical inferences may be incorrect. This research demonstrated how the jackknife can be used to estimate the true variance and overcome these issues.

The estimated design effects and a values are specific to each parameter within each model component, and the research found a wide range of values. The median a value for automobile ownership was 100%, indicating perfect correlation across days (which is to be expected, and serves to verify the calculations), while the median a value for a table of total tours by person type and tour purpose was 4.4%, indicating a high degree of day-to-day variation for each person. The median value for tabular data was 26% when averaged across model components, and the 75th percentile value was 32%. The median value for estimated model parameters was 52% when averaged across model components, and the 75th percentile value was 64%.

How these a values are combined depends on the analyst’s preference for which model parameters are viewed as most critical and is ultimately a matter of judgment. Assuming one proceeds with the importance weights assigned, the analysis shows that the 3-day GPS-only sample of 2,780 households in the Northeast Ohio Survey would be equivalent to a single-day sample of between 3,631 and 5,487 households (32% to 97% higher), depending on how conservative one wants to be.

Based on this analysis, it appears reasonable to conduct surveys with 2, 3, or 4 days of data collection per household, depending on how one combines the different terms and depending on the relative cost of collecting additional households versus additional travel days. This evidence would argue against very long surveys with a week or more of data collection, unless the cost factor is very low or the intended purpose is very specific. Furthermore, regardless of the number of days, enough households must be sampled to meet the needs of models with little or no day-to-day variation.

This research was based on a detailed analysis of a single household travel survey. Future research could repeat the analysis on additional data sets, especially those with no bias across travel days.

CHAPTER 1 INTRODUCTION

NCHRP Project 08-36/Task 123 was a study of the design of household travel surveys. Many household travel surveys have collected 1 day of information only, due to the fatigue of respondents from maintaining diaries or recalling travel. Multi-day travel surveys are now more feasible, given Global Positioning System (GPS) technology. The assumption with these surveys is that using a GPS device is less burdensome than completing a travel diary, so there should be less drop-off in response rate for the subsequent days. Tracking households over time for multiple days will have differing properties, depending on the type of travel being studied and characteristics of the household and region. The tradeoff between the number of households and the number of days is critical and needs to be rigorously evaluated. This project was designed to develop a way to evaluate these issues and provide travel survey designers with tools for evaluating tradeoffs rigorously.

1.1 Background

1.1.1 The Multiple Uses of Travel Surveys

In answering questions about sample size, it is critical to consider how the data will be used. Often, sample sizes are determined based on the maximum allowable error around a certain measure or set of measures. Although such an approach may be appropriate for a public opinion survey (where the goal is to understand, for example, the percent of people who would vote for a particular candidate), such an approach oversimplifies how travel survey data is used in practice.

In addition to providing basic observations of travel behavior, a key use of household travel survey (HTS) data is to develop travel demand models used to forecast the behavioral response to transportation infrastructure and service changes. In model development, such data plays a role both in model estimation and in model calibration. For model estimation, the data is used to statistically estimate model coefficients for key travel choices (e.g., the sensitivity of mode choice to travel time and cost). Aggregate tabulations of the weighted and expanded data are also used to calibrate the outputs of the implemented travel model.
Sample size will affect both the coefficient estimates and the weighted data tabulations. Therefore, it is important to analyze the data in the way it will ultimately be used. The practical challenge in doing this is both in providing sufficient details as they relate to the different uses and in determining how to value those detailed analyses so as to determine “the bottom line.” The Southeast Florida Transportation Council (2014) provides discussion of these issues and how they may affect sample size decisions.

1.1.2 The Motivation for Multi-Day Surveys

Before GPS, most travel studies tended to be single-day studies. There has long been an interest in conducting multi-day household travel surveys, motivated in large part by three factors: (1) the potential to reduce survey costs by sampling fewer households; (2) the potential to provide more reliable model parameter estimates under certain conditions by better distinguishing between-person from within-person variance; and (3) the ability to develop more sophisticated travel models that account for day-to-day dynamics.

Multi-day travel surveys could reduce survey costs by requiring sampling of fewer households. This potential exists because much of the survey cost reflects recruiting respondents, so the cost of adding a second day of data collection is expected to be less than the cost of recruiting a second household.

Regarding model estimation issues, the challenge is that it is difficult to distinguish between-person versus within-person variance in single-day surveys. This limitation existed from very early on. Pas (1987) presented a framework for understanding these issues using a basic analysis of variance breakdown for any continuous travel measure per person per day.

A similar situation exists in the context of stated preference (SP) surveys. In a stated preference survey, respondents are asked what they would do in a set of hypothetical scenarios.
Typically, each respondent will be asked to answer for multiple experiments (i.e., hypothetical scenarios), in which the descriptive vari- ables, such as time and cost, are varied for each. By collecting multiple responses per person, the number of people surveyed and the resulting cost of the sur- vey are reduced. Cherchi and Ortúzar (2008) and Rose et al. (2009) addressed the tradeoff of respondents versus responses per person by using simulated data and systematically varying both dimensions. These researchers found that, even beyond any cost effects, it is better to have multiple responses per individual than more single responses (e.g., 500 people giving two responses rather than 1,000 people giving a single response). In addition to the aforementioned estimation issues, multi-day surveys could be used to develop

4were less likely to report trips as they became more fatigued. In addition, the Chicago survey collected part of the sample using a 1-day travel diary, allow- ing for a comparison of the 1-day and 2-day diaries. This comparison showed that both days in the 2-day sample had lower trip rates than the 1-day sample, indicating that respondents were less likely to report trips, even on the first day. 1.1.4 GPS as an Enabling Technology As GPS technology began to emerge into com- mon use in the 2000s, it began to be used as a tool in conducting travel surveys. NCHRP Report 775 (Wolf et al. 2014) addresses the use of GPS to un- derstand travel behavior, with GPS use in household travel surveys an important component of that report. Several mechanisms for using GPS in surveys were identified. The most common, thus far, has been to collect a GPS-enhanced sub-sample that can be used to correct for under-reporting of trips in the main sample. A second means is to conduct a GPS-based prompted-recall survey. In this approach, the GPS is first used to passively measure the respondent’s travel pattern, and the respondent is then contacted and asked for details about that travel pattern. A third approach is an all-GPS survey, in which the GPS provides passive data collection, with trip pur- poses and travel modes inferred from the GPS traces themselves. NCHRP Report 775 provides a detailed review of the history and types of GPS-enhanced travel surveys, and readers are referred there for further information. A key benefit of GPS technology is that it makes data collection more passive and reduces the re- spondent burden. In turn, this enables multi-day surveys, such as the 2012 Northeast Ohio Regional Travel Study used in this research, which combines a GPS-with-prompted-recall sample and a GPS- only sample. 1.1.5 The Repeated Measurements Problem and How to Correct for It The standard method for estimating travel mod- els is to treat each record as an independent observa- tion. 
When the observations are not independent, as is true when collecting multiple days of travel in- formation from the same individual, the “repeated measurements problem” occurs. The basic problem is that there is less information in the repeated mea- more sophisticated travel models that capture day- to-day variations in travel. The applications them- selves were not the focus of this research, but a few examples are provided to suggest the types of models that could be developed. As noted in the discussion of Cherchi and Ortúzar (2008), data with multiple responses per individual may provide an enhanced ability to estimate models that account for taste het- erogeneity. Such models may reflect, for example, that travelers may have significant variations in their values of time, beyond what can be captured by an income effect alone. This can be important when modeling toll roads or congestion pricing, as was done by Erhardt et al. (2008) and in subsequent activity-based models. Kang and Scott (2009) pro- posed a model of the day-to-day dynamics of activ- ity participation, accounting for intra-household interactions. They found, for example, that tele- workers are more likely to participate in joint out- of-home activities with other family members on weekdays, but not on weekends. Bhat, Srinivasan, and Axhausen (2005) used hazard models to ana- lyze the time between activities using multi-week data in Germany. They found a weekly rhythm to participation in certain types of activities. Xu and Guensler (2015) used multi-day survey data to define the concept of a personal modality style, while Dill and Broach (2015) used multi-day vehicle-based GPS data to examine travel time reliability. For fur- ther reading on models developed from multi-day travel survey data, Kang and Scott (2009) provide a good literature review. 
1.1.3 The Challenges of Multi-Day Surveys Multi-day travel studies before GPS technol- ogy were difficult because they were dependent on travel diaries that needed to be collected from coop erative household respondents. Among the literature that documents problems with bias from “diary fatigue” are Meurs et al. (1989), Golob and Meurs (1986), and Murakami and Watterson (1992). Pendyala and Pas (2000) provide an overview of experience with travel diaries before the turn of the century. Parsons Brinckerhoff et al. (2014) reported that the 2000 Bay Area Travel Survey, the 2001 Atlanta Regional Travel Survey, and the 2006 Chi- cago Regional Travel Survey collected travel diary data for 2 travel days, and, in all three cases, the trip rates for the second travel day were significantly lower than for the first, indicating that respondents

surements than in the same number of independent measurements.

Cirillo et al. (1998) examined the repeated measurements problem in the context of SP data. They pointed out that variance estimation assuming the multiple preferences collected from each sample individual are independent led to an underestimation of the true variance. They found that models estimated using such data had unbiased parameter estimates, but that the underestimated variance meant that the t-statistics on those parameters would be too high.

Cirillo et al. (1998) reported that both the jackknife and bootstrap methods would correct the repeated measurements problem and produce unbiased estimates of the true variance (and therefore the true t-statistics). Both methods use resampling: the estimate is recomputed on modified versions of the sample (the jackknife deletes subsets of records, whereas the bootstrap redraws records with replacement). This application is not the same as the current application of having multiple days for each sample individual, but the principle is the same. Multiple preferences or multiple days are like a second stage of selection, with households or individuals being the first stage of selection. Resampling techniques that focus on the first stage of selection will correctly estimate the variance, because the first-stage units can be viewed as independent, whereas the second-stage units are correlated within the first-stage units (whether they are multiple stated preferences or multiple travel days).

Ortúzar and Willumsen (2001) noted that this repeated measurements problem is commonly ignored in transportation, based on the assumption that the parameter estimates remain unbiased. However, it is not unusual for a jackknife or bootstrap correction to be applied, particularly in SP applications. For example, it is common practice in RAND Europe's work.

Stopher et al. (2008) pointed out that the repeated measurements problem also applies to 1-day travel surveys, because multiple trips made by the same person are not independent, nor is travel by different persons within the same household. This means that sampling errors are routinely underestimated. This may not be a large problem if the interest is in the parameter estimates themselves, but it can lead to faulty conclusions about the significance of variables in the models, and it is problematic when the interest is in the variance itself. To understand the value of multi-day surveys, one must confront this problem head on, and the research team used the jackknife to provide an unbiased estimate of the true variance.

1.1.6 Proposed Formula for Sample Size Equivalency of Multi-Day Surveys

Parsons Brinckerhoff et al. (2014) proposed a formula for the sample size of a multi-day survey relative to its single-day equivalent:

$$S_N = S_0 \, \frac{R + D}{(1 + R)\, D} \qquad \text{(Eq 1-1)}$$

where

• $S_N$ is the new (reduced) sample size.
• $S_0$ is the sample size for a 1-day survey.
• $R$ is the ratio of day-to-day (intra-person) variability $\sigma_{\varepsilon}^2$ to inter-person variability $\sigma_{ju}^2$.
• $D$ is the sample length in days.

This method is referred to as the Vovsha method after one of the report's authors. The design effect $\frac{R + D}{(1 + R)\,D}$ is equivalent to the Pas design effect $\frac{1 + a(T - 1)}{T}$ in Chapter 4. The correlation coefficient $a$ is $\frac{\sigma_{ju}^2}{\sigma_{ju}^2 + \sigma_{\varepsilon}^2}$, and $R = \frac{\sigma_{\varepsilon}^2}{\sigma_{ju}^2}$, so that $a = \frac{1}{1 + R}$. After replacing $T$ in the Pas equation with $D$, the research team obtained

$$1 + a(D - 1) = 1 + \frac{D - 1}{1 + R} = \frac{R + D}{1 + R}, \quad \text{so that} \quad \frac{1 + a(D - 1)}{D} = \frac{R + D}{(1 + R)\, D}.$$

This equivalency is important because it is $a$ that is reported throughout the results section of this report.

In this formula, as $R$ approaches zero, $S_N$ will equal $S_0$. Such a situation would apply if there were no variability in the data over time, such as with automobile ownership, where the number of vehicles owned is the same on all travel days. With no intra-person variability, this formula indicates no value to collecting additional travel days. Conversely, as $R$ approaches infinity, the new sample size is the 1-day sample size divided by the number of days ($S_N = S_0/D$). In other words, this situation would imply that adding days is equivalent to randomly sampling additional households.
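The sample size equivalency formula in Section 1.1.6 can be sketched numerically as follows. This is a minimal illustration, not code from the report: the function names and the example values of R are hypothetical, and R would in practice have to be assumed or estimated from data.

```python
def multi_day_factor(r, days):
    """Return (R + D) / ((1 + R) * D), the factor applied to S_0 in Section 1.1.6.

    r    -- ratio of intra-person to inter-person variance (R >= 0)
    days -- number of survey days (D >= 1)
    """
    if r < 0 or days < 1:
        raise ValueError("r must be non-negative and days must be at least 1")
    return (r + days) / ((1.0 + r) * days)


def equivalent_sample_size(s0, r, days):
    """S_N: multi-day sample size equivalent to a 1-day sample of size s0."""
    return s0 * multi_day_factor(r, days)


def a_from_r(r):
    """Correlation a = 1 / (1 + R)."""
    return 1.0 / (1.0 + r)


def pas_factor(a, days):
    """(1 + a (D - 1)) / D, the Pas form of the same factor."""
    return (1.0 + a * (days - 1)) / days


# Hypothetical example: a 1-day survey of 1,000 households, 3 survey days.
for r in (0.0, 1.0, 10.0):
    s_n = equivalent_sample_size(1000, r, 3)
    print(f"R = {r:5.1f}  a = {a_from_r(r):.3f}  S_N = {s_n:.1f}")
```

With R = 0 the factor is 1 (extra days add nothing), and as R grows the factor approaches 1/D, matching the two limiting cases described in the text. The two forms of the factor agree for any R, which is the equivalency derived above.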

There are two important points to make about this derivation. First, it is specific to one particular component of travel, so it will be different for a model like car ownership versus destination choice. Vovsha addresses this by considering the relative importance of several model components to achieve a weighted average of the equivalent sample size across all model components. Second, the value of R is central to the calculation and, in the analysis provided, is taken as an assumption.

The R values in the Vovsha method are directly related to the design effects and a values at the heart of this study (and discussed further in the method section of this digest). A key contribution of this research will be to derive the design effects empirically, given a real-world data set and a realistic set of travel model components.

1.1.7 Cost-Benefit Analysis for Multi-Day Surveys

To understand whether or not the potential cost savings from multi-day surveys are realized, one must consider the value of the additional data versus its cost. Stopher et al. (2008) provide a framework for evaluating variance for multi-day surveys in the GPS era, and Pas (1986) develops an explicit cost model for comparing single-day and multi-day studies (see Appendix A). The result is that the optimal number of survey days is determined by two factors: cost ratio and design effect.

The cost ratio is the ratio of the marginal cost of surveying a household for T days to the cost of surveying a household for 1 day. The cost ratio of a 1-day survey is, by definition, 1, and the cost ratio for multi-day surveys will be higher than 1. [Example values, from Stopher et al. (2008), are presented in Appendix A.]

The design effect is a measure of the additional information gained by surveying additional days versus surveying additional households. It is measured as the ratio of the true variance to the variance assuming a simple random sample, with each day's worth of information assumed to be fully independent. A design effect of 1 would indicate that the variances are the same and that adding days or adding households to the sample would be equivalent. A design effect larger than 1 indicates positive correlation between days within households, reducing the information contributed from the extra monitored days beyond the first day.

A multi-day survey is ideal in a situation where the cost ratio is low and the design effect is close to 1. This research measures the design effects using a real-world survey, leaving the cost calculations to others.

1.2 Project Objectives

This research aims to help practitioners make more informed decisions about the design and size of HTSs, particularly those that involve data collection for multiple travel days. Starting from the gaps in existing research, this digest is structured to meet three specific objectives:

1. To evaluate whether or not GPS-enabled multi-day surveys overcome the survey fatigue challenges faced by multi-day diary surveys. The research began with the assumption that GPS-enabled surveys could overcome the fatigue issues observed in past multi-day surveys. However, during the research, it became apparent that checking this assumption was an important objective in its own right, as was checking the validity of the GPS-only data collection approach. Therefore, this objective is included here.

2. To investigate the effects of using multi-day household travel surveys for developing travel demand models, and demonstrate a method of correcting for the repeated measurement problem. This objective focused the research specifically on using HTSs to develop travel demand models. A key limitation in using multi-day data is the repeated measurement problem. The objectives here were to (1) illustrate the magnitude of the problem and (2) demonstrate a method (the jackknife) for correcting for it during model development.

3. To provide empirical evidence on (1) the sample size equivalency of single-day and multi-day surveys and (2) the relationship between sample size and travel model estimation results and data summaries. Section 1.1.6 describes the Vovsha method for calculating the equivalent sample size of a multi-day household travel survey; Section 1.1.7 describes a framework for trading off the cost of additional survey days versus the additional information they provide. Central to both approaches is the notion of a design effect, which is a measure of the

additional information gained by surveying additional days versus surveying additional households. This research sought to measure those design effects empirically. Because travel surveys are used for multiple purposes in model development, including for estimating models and creating data summaries, the design is specific to the use. In conducting this analysis, evidence was also obtained showing the relationship between sample size and model estimation results and between sample size and data tabulations.

A detailed literature review (Appendix A of the Contractor's Final Report) provides background for the research. Chapter 2 describes the basic attributes of the data set, the reason for its selection in this study, and the processing conducted to make it usable for this analysis, as well as a basic framework for computing variances and design effects. The research team did a series of bias checks. These checks documented biases which were artifacts of the method and which are reviewed in Chapter 3. These biases have implications for the method used in future travel surveys. Chapter 4 presents the basic results for variance estimates and design effects for tabulations; Chapter 5 does the same for the models. Chapter 6 summarizes the results. Chapter 7 presents conclusions and recommendations for future research.

CHAPTER 2 DATA AND METHOD

The overall strategy for meeting the project objectives was to analyze a specific multi-day household travel survey (i.e., the 2012 Northeast Ohio Regional Travel Study), which includes a GPS-only sample for 3 travel days. The survey was used both to generate data tabulations suitable for travel model calibration and to estimate choice models of key travel choices. These data tabulations and model estimations were repeated for three separate files derived from the full survey: a 1-day file that included only the first travel (week)day's worth of data for each household, a 2-day file that included the first 2 travel days' worth of data for each household, and a full-sample file that included all travel days. The goal was not to replicate any specific travel model, but to illustrate how the different subsets of data might affect the data summaries and model parameter estimates for what might be a typical set of key travel choices.

The evaluation of the GPS-enabled survey was completed by testing for biases among these three data files. In addition, the research tested for biases between the GPS-only sample and a GPS-with-prompted-recall sample.

This research acknowledges that the repeated measurement problem may bias the variance estimates when multi-day survey data is used for model development. The consequences of this for model estimation are that the t-statistics associated with the parameter estimates may be flawed. To demonstrate how this problem could be corrected when using multi-day survey data, this research uses the jackknife as an unbiased estimator of the true variance.

The research went on to calculate a set of design effects for each of the key travel choices. The design effects were calculated by measuring the day-to-day variation versus the person-to-person variation found in the data. This research provides a means for understanding the additional information provided by sampling additional survey days versus sampling additional households. The design effects could be used to calculate the equivalent sample size of a multi-day survey or the cost-benefit of a multi-day survey.

The remainder of this chapter describes this method in further detail. Section 2.1 documents the key travel choices used for this research. Section 2.2 describes the data set used (the processing of that data is described in detail in Appendix I). Section 2.4 describes the bias checks conducted. Section 2.5 describes the stratification structure and the estimation of the mean values of the travel characteristics of interest. The stratification structure is of particular importance in computing variance estimates. Section 2.6 describes the method for estimating the variance. Understanding the variance is a critical aspect of this research because the variance is closely related to the effect of the sample size. In the simplest sense, if the variance is high, then a larger sample is needed to estimate the mean within a certain margin of error. The calculation of variance estimates is non-trivial (further detail is provided in Appendix B). Section 2.7 introduces the concept of design effects, which provide a statistical measure of the relative value of adding additional households versus additional days to the survey.

2.1 Key Travel Choices for Empirical Analysis

The results of the analyses presented in this digest are specific to the models or data tabulations of interest. To provide information to practitioners making decisions about survey design and sample size, the researchers selected a set of travel choices of importance in developing travel models and

focused the analysis on how HTS data is expected to be used. The researchers did not seek to analyze every possible use of the data, but focused on uses illustrating the issues at hand.

When HTS data is used to develop travel models or analyze travel behavior, it is rarely a single travel choice being considered, but rather a series of linked choices. The choices will depend on the overall model design, particularly whether it is a trip-based model or an activity-based model. The key factor in the analysis is the relative day-to-day versus person-to-person variance. A long-term choice, such as automobile ownership, will rarely change between travel days for the same person or household. Therefore, when it comes to analyzing automobile ownership, having 3 days of data is no better than having 1 day of data. In contrast, tour generation for non-workers can be expected to vary between travel days for the same individual, so there could be value in having multiple days of data. In addition, different models typically use different levels and dimensions of segmentation, which can also affect the results.

The research focused on four key travel choices:

• Automobile ownership – There is low or no day-to-day variation, but the number and spatial location of zero-automobile households is a major driver of downstream travel choices.
• Tour generation – Separate models are developed for workers and non-workers, with the expectation that non-workers will have more schedule flexibility and a higher level of day-to-day variation. Demographic and socioeconomic variables are of particular importance, providing the potential to highlight sample size issues for some segments of the population.
• Destination choice – The inclusion of destination provides a way to understand the variation both in distance and in location. Separate models are developed for work tours and for social/recreation tours to highlight the differences across purposes.
• Mode choice – The choice of modes is central to the modeling process in many locations, allowing issues related to this key travel choice to be highlighted. Separate models are developed for work tours and for social/recreation tours.

These travel choices were selected to illustrate the effect of multi-day data on model components expected to have different characteristics and are not necessarily an argument for or against any particular model design. For each travel choice, the basic unit of analysis was the unit that would be modeled in a travel demand model. Automobile ownership was modeled at a household level, tour generation at a person level, and destination choice and mode choice were modeled at a tour level.

First, HTS data was considered for calibrating the overall model system such that it matched aggregate control totals. This is typically done by adjusting the alternative-specific constants in the implemented models to match control totals. Calibration also serves as a check that the overall system is functioning properly and can be used to identify and correct possible problems. The basis for model calibration comes from calibration targets, which are most commonly cross-tabulations of the weighted and expanded household survey data. Sometimes these cross-tabulations are supplemented by secondary data sources (e.g., on-board transit surveys). The Census Transportation Planning Package (CTPP) and the American Community Survey (ACS) also are valuable resources for generating model calibration targets, particularly when the interest is in journey-to-work flows. For each key travel choice, a set of tabular data was generated from the survey in a form that could be used for model calibration. (Appendix E provides more detailed information.)

A second important use of most travel surveys is to estimate the model coefficients for the key travel choices of interest. Often this is done using logit models and maximum likelihood estimation. For each key travel choice, a logit model is estimated from the survey data to provide the basis for understanding the effects of the sample size and number of survey days on the model coefficients. (Appendix G provides more detailed information.)

2.2 Data Set and Data Processing

Given the objective of evaluating single-day and multi-day travel survey data on model estimation, the research team considered the range of multi-day datasets collected in the United States over the past decade that would be available for analysis. Although many HTSs have been conducted with 1 day of travel diary data collected and 3 to 4 days

of GPS person-based travel (Westat staff have conducted at least ten such surveys over the past decade), very few surveys have been conducted that have trip purpose and other key travel attributes for days after the first travel day. This requirement limited the candidates to three surveys: a multi-day diary-based travel survey conducted in Chicago in 2007–2008, a 2009 all-GPS survey in Cincinnati, and a 2012 GPS-enabled household travel survey in Cleveland. The 2012 Cleveland survey was selected because (1) travel data was collected for 3 to 4 consecutive days, allowing for multi-day analysis beyond Day 2; (2) the survey featured a unique design in which separate GPS-only versus GPS-with-prompted-recall samples were collected, allowing for a comparison of these approaches; and (3) the survey included a larger sample of households than the Cincinnati GPS travel survey.

The 2012 Northeast Ohio data set had a total sample size H = 4,540 completed households. These were drawn from five counties in Northeast Ohio (i.e., Cuyahoga, Geauga, Lake, Lorain, and Medina), with an aggregate population of roughly 850,000 households all together (Wilhelm et al. 2013). The final sampling fraction (after nonresponse attrition) was roughly 1 in 200 households. The sample was broken into three distinct partitions based on the method of data retrieval, as shown in Table 2-1.

Table 2-1 Data retrieval partitions.

GPS flag                                                       Days of complete travel information   Households   Percent of households
Prompted-Recall Participants                                   1                                     1,312        28.9
GPS-Only Participants - No Retrieval Interview                 3–4                                   2,775        61.1
Travel Logs Only - Not GPS Eligible - No Retrieval Interview   1                                     453          10.0
Total                                                                                                4,540        100.0

For households in the prompted-recall sample, participants were asked to use a GPS device to track their travel for 3 or 4 days, depending on the day of the week. Participants who began the survey on a Friday used the devices for 4 days to capture at least 2 weekdays of travel. A prompted-recall survey was then used where households were asked to confirm trips made on Day 1 and to collect information such as trip purpose and mode.

The GPS-with-prompted-recall data was only fully processed for Day 1 as part of the original survey work. This was not an inherent limitation of the method, but the focus of the 2012 Northeast Ohio Regional Travel Survey was on getting a complete Day 1, and the later days were not a priority in terms of resources. The result is that the GPS data beyond Day 1 are not available to this study.

For households in the GPS-only sample, the same GPS method was used, but no prompted-recall survey was completed. Simas Oliveira and Gupta (2013) described the process by which the following information was imputed for the GPS-only observations: address, shared travel, travel mode, tolling, parking activity, and trip purpose. A full set of imputed travel information is available for all survey days.

The travel log-only partition consisted of households where all members were older than age 75. Data for this group was collected via a travel log, and they were not eligible to use GPS devices.

As described in Chapters 2 and 3, a component of this research was to check for biases between the prompted-recall and GPS-only samples and between different travel days within the GPS-only sample. The research began from the final survey data set, as described by Wolf et al. (2013). (The data was further processed as described in Appendix I to put information in the form necessary for model development.)

The estimated models were based on tours, rather than trips. It is reasonable to expect that the findings of this digest would apply to models developed using a trip-based structure as well, although the specification of the models would be somewhat different. Tours are preferred as the basic unit of analysis in these models because they are thought to be more robust to GPS imputation errors. Whereas defining whether or not a geographic pause represents a trip end or traffic congestion can be a possible source of

error, identifying a loop that starts and ends at home from a GPS trace should be straightforward.

2.3 1-Day, 2-Day, and Full-Sample Files

The analysis of single-day versus multi-day variances and design effects focused specifically on the GPS-only sample, because that is the sample that collected multiple days of travel information. Either 3 or 4 days were collected for each completed household, depending on the randomly selected first day. If the first day was Monday through Thursday, then 3 days were taken. If the first day was Friday, then 4 days were taken.

For this study, the research team was interested in weekday travel, so only weekday travel days were retained. The total number of weekdays observed was equal to 2 or 3, depending on the day of week. If the first day was Monday through Wednesday, then 3 weekdays were retained. If the first day was Thursday or Friday, only 2 weekdays were retained.

The research team created three files from the full file. Each of these files had the full set of households. The first file (the "single-day file") consisted of the first monitored weekday for each household.¹ The second file (the "2-day file") consisted of the first 2 monitored weekdays for each household. The third file (the "full file") consisted of all monitored weekdays (up to 3). With these three experimental data sets, the research team proceeded to compute percentage estimates, mean estimates, regression parameter estimates, and jackknife variance estimates of all of these.

This segmentation into three separate files did two things: it allowed the researchers to calculate the design effects, which measure the information provided by the additional travel days, and to report both the data summaries and model estimation results for each of the three files. This provided a way to see how much the three sample sizes affected those results.

2.4 Bias Checks

The data was checked for bias across two dimensions. First was whether there was bias between the GPS-only and GPS-with-prompted-recall samples. If both samples provided an equal measure of the underlying travel behavior, the means of the two samples (for the selected travel characteristics) should be the same. Second was a check comparing the 2-day and 3-day file mean values with the 1-day file mean values. This second test determined whether or not survey fatigue could be identified, in which case participants were less likely to report their travel or carry their GPS devices on subsequent survey days. The null hypothesis for both was that there was no bias.

2.5 Stratification Structure and Estimates of Mean Values

This section documents how the mean values were calculated for the survey tabulations. A weighted mean was used to address the fact that the survey was based on a stratified sample, rather than a single uniform sample of the population. For model estimation, a uniform weight of 1 was used, because unweighted choice model estimation is preferred.

The stratification structure is integral to the modeling and the variance. This stratification structure for the 2012 Northeast Ohio data set was based on the following dimensions (Wilhelm et al. 2013, Section 3.2):

• County
  - Cuyahoga
  - Geauga
  - Lake
  - Lorain
  - Medina
• ABS/Landline Strata
  - Matched Address Based Sample (ABS)
  - Unmatched ABS
  - Listed Random Digit Dial (RDD)
    · General non-oversampled
    · Target large household (4+ persons)
    · Target 1-person household with low income (<$25,000 per year)
    · Other 1-person household
    · Target high probability zero-vehicle
    · Transit area oversample

The landline strata were based on Census tracts with larger percentages of these target populations.

The strata are subscripted as $s = 1, \ldots, S$. The sample size of completed households within each stratum is $n_s$, $s = 1, \ldots, S$. The research team subscripted sampled households within each stratum as
¹ The research team checked both the first and second day for completeness. If the second day was more complete, the research team designated that to generate the single-day file.
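The weekday-retention rules of Section 2.3 and the construction of the three analysis files can be sketched as follows. This is an illustrative sketch only: the function names and day labels are hypothetical, start days are assumed to be weekdays, and the completeness check described in footnote 1 (which may substitute the second day in the single-day file) is not modeled.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
WEEKEND = {"Sat", "Sun"}


def monitored_days(start_day):
    """Consecutive days collected: 4 for a Friday start, otherwise 3."""
    if start_day not in DAYS[:5]:
        raise ValueError("start_day is assumed to be a weekday (Mon-Fri)")
    n_days = 4 if start_day == "Fri" else 3
    first = DAYS.index(start_day)
    return [DAYS[(first + k) % 7] for k in range(n_days)]


def retained_weekdays(start_day):
    """Only weekday travel days are retained for analysis."""
    return [d for d in monitored_days(start_day) if d not in WEEKEND]


def build_files(start_day):
    """Days contributed by one household to each of the three files."""
    days = retained_weekdays(start_day)
    return {"single_day": days[:1], "two_day": days[:2], "full": days}


for start in DAYS[:5]:
    print(start, build_files(start))

# Unweighted average number of retained weekdays if the five start days
# were equally likely (an assumption made here for illustration).
mean_days = sum(len(retained_weekdays(d)) for d in DAYS[:5]) / 5
print(mean_days)
```

Monday through Wednesday starts keep 3 weekdays, while Thursday and Friday starts keep 2, so every household contributes to all three files but the full file differs from the 2-day file only for the first group.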

$sh$, $s = 1, \ldots, S$, $h = 1, \ldots, n_s$. The sample weight for each household is $w_{sh}$. This includes the base weight (the reciprocal of the probability of selection of the households) and all adjustments for nonresponse as done for the 2012 Northeast Ohio study. $y_{shd}$ is the y-characteristic value for stratum $s$, household $h$, day $d$. The estimate of the y-characteristic mean from the single-day file is

$$\bar{y}^{(1)} = \frac{\sum_{s=1}^{S} \sum_{h=1}^{n_s} w_{sh}\, y_{sh1}}{\sum_{s=1}^{S} \sum_{h=1}^{n_s} w_{sh}}$$

The estimate from the 2-day file is

$$\bar{y}^{(2)} = \frac{\sum_{s=1}^{S} \sum_{h=1}^{n_s} w_{sh}\,(y_{sh1} + y_{sh2})}{2 \sum_{s=1}^{S} \sum_{h=1}^{n_s} w_{sh}}$$

The estimate from the full file is

$$\bar{y}^{(3)} = \frac{\sum_{s=1}^{S} \sum_{h=1}^{n_s} w_{sh} \sum_{d=1}^{D_{sh}} y_{shd}}{\sum_{s=1}^{S} \sum_{h=1}^{n_s} w_{sh}\, D_{sh}}$$

The superscript 3 refers to the full file, which has households with 2 or 3 days. $D_{sh}$ is the total number of days monitored and retained for household $sh$ (either 2 or 3).

2.6 Jackknife Estimates of Variance

Any time there is more than one observation from the same person or household, such as multiple travel days, it is reasonable to expect some intra-person correlation. In such a situation, the mean values and parameter estimates should be unbiased, but variance estimates that treat the observations as independent are not. This can lead to flawed t-statistics or tests of statistical significance. The jackknife was used in this research so as to obtain unbiased estimates of variance and avoid these problems.

The jackknife works by deleting groups of sampled households. Each initial replicate weight corresponds to dropping a set of households. For this replicate, the weight of these households is zero, and the remaining sampled households in the variance stratum have their weights for that initial replicate weight increased by the factor $m_s/(m_s - 1)$, where $m_s$ is the number of replicate groups in sampling stratum $s$. Wolter (2007), Chapter 4, and Valliant et al. (2013) are basic references for the jackknife variance estimator. The jackknife was used for variance estimation in the 2009 National Household Travel Survey [see 2009 NHTS User's Guide (2011)].

The jackknife variance estimators of these mean values are written as $v_J(\bar{y}^{(d)})$. The jackknife variance estimator provides consistent estimators of the sampling variance (the variability of the estimators over the full sampling distribution comprising all possible samples).² The great advantage of the jackknife variance estimator is that the variances are only dependent on the sample design and are not dependent on the assumptions from a particular model and thus are robust against model misspecification. In particular, the jackknife variance estimator will correctly pick up effects of within-household across-day correlations, as well as effects from heteroscedasticity, without explicit specification of particular variance models. The former quality is of particular importance in this analysis, as within-household across-day correlations are at the heart of designing multi-day studies. Another advantage is that the jackknife variance estimator will correctly estimate approximate variances for nonlinear functions of the basic sample means, variances, and covariances, such as regression coefficients and more complex model parameters. This is also important given the complexity of some of the travel models being estimated. Among the many references describing the properties of the jackknife are Rust and Rao (1996), Valliant et al. (2013) Section 15.4.1, and Shao and Tu (1995).

The basic approach is that the model estimation (or the calculation of mean values) is repeated multiple times, each time using a different set of weights that excludes a subset of records. The jackknife variance estimate is then calculated as a weighted combination of each individual estimation.

² The sample design for the 2012 Northeast Ohio Regional Travel Survey is based on the stratification structure as described above and includes a pseudo-sampling phase to represent nonresponse attrition. The final set of respondents is assumed to be a simple random sample from the sampled households, and the final set of respondents then is posited to be a simple random sample of the respondents from the population set of households in the stratum with a fixed probability of inclusion. It implicitly assumes that probabilities of response for sample respondents do not differ within the sampling strata. These are standard assumptions in survey sampling in the presence of nonresponse. This framework also implicitly ignores the quota sampling aspect of the travel survey (households were drawn and added to meet the desired sample numbers for each stratum).
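The weighted full-file mean of Section 2.5 and the delete-a-group procedure of Section 2.6 can be sketched as follows. This is a minimal sketch under stated assumptions, not the estimator documented in Appendix B: the record layout (`stratum`, `group`, `weight`, `y`) is hypothetical, and the combination step uses the standard stratified jackknife form, the sum over strata and groups of $(m_s - 1)/m_s$ times the squared deviation of each replicate estimate from the full-sample estimate.

```python
from collections import defaultdict


def full_file_mean(households, weights=None):
    """Weighted mean over all retained days: sum(w * sum_d y) / sum(w * D)."""
    num = 0.0
    den = 0.0
    for i, hh in enumerate(households):
        w = hh["weight"] if weights is None else weights[i]
        num += w * sum(hh["y"])
        den += w * len(hh["y"])
    return num / den


def jackknife_variance(households, estimator=full_file_mean):
    """Stratified delete-a-group jackknife variance of `estimator`."""
    theta_full = estimator(households)

    # Collect the replicate groups present in each stratum.
    groups = defaultdict(set)
    for hh in households:
        groups[hh["stratum"]].add(hh["group"])

    variance = 0.0
    for stratum, stratum_groups in groups.items():
        m = len(stratum_groups)
        if m < 2:
            raise ValueError("each stratum needs at least 2 replicate groups")
        for dropped in sorted(stratum_groups):
            # Replicate weights: zero for the dropped group, m/(m-1) for the
            # rest of the same stratum, unchanged for other strata.
            rep_weights = []
            for hh in households:
                if hh["stratum"] != stratum:
                    rep_weights.append(hh["weight"])
                elif hh["group"] == dropped:
                    rep_weights.append(0.0)
                else:
                    rep_weights.append(hh["weight"] * m / (m - 1))
            theta_rep = estimator(households, rep_weights)
            variance += (m - 1) / m * (theta_rep - theta_full) ** 2
    return variance


# Hypothetical data: one stratum, four households, each its own group,
# with a y value (for example, tours per day) recorded on 2 or 3 days.
sample = [
    {"stratum": "A", "group": 0, "weight": 1.0, "y": [2, 2, 3]},
    {"stratum": "A", "group": 1, "weight": 1.5, "y": [0, 1]},
    {"stratum": "A", "group": 2, "weight": 1.0, "y": [4, 4, 4]},
    {"stratum": "A", "group": 3, "weight": 2.0, "y": [1, 0, 1]},
]
print(full_file_mean(sample), jackknife_variance(sample))
```

Because whole households are dropped together, all of a household's days leave the replicate at once, which is how the within-household, across-day correlation ends up reflected in the variance estimate. Passing a different `estimator` (for example, a model-fitting function that accepts replicate weights) would follow the same pattern.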

(Appendix B describes the proposed jackknife variance estimator for the 2012 Northeast Ohio Regional Travel Survey. Appendix F provides further technical details on how the jackknife is calculated for model estimation, as opposed to when calculating mean values.)

2.7 Design Effects

The design effect is used to calculate the value of increasing the number of survey days versus the number of households surveyed. The design effect is the ratio of the true variance (of which the jackknife is a consistent estimator) to the variance assuming a simple random sample, with sample size equal to the total number of days monitored across the sampled households (i.e., each day from each household is assumed to be independent of the other days and to contribute a full-sample unit's worth of information). A design effect of 1 would indicate that the variances are the same and that adding days or adding households to the sample would be equivalent (in other words, doubling the number of days per household would be the same as doubling the number of households). A design effect larger than 1 indicates positive correlation between days within households, reducing the information contributed from the extra monitored days beyond the first day.

The best variance estimator for the proposed baseline simple random sample design is $v_J(\bar{y}^{(1)})$ itself. This jackknife variance estimator over the data set including only 1 day taken for each household is the best baseline, as it includes all of the design effects for household sampling, differential weights, and stratification automatically. There is no need for disaggregation of the variance components. This is a powerful advantage of the jackknife in this situation. The only portion of the design effect that is excluded is that for multiple days, as only 1 day is taken for each household. It is an ideal baseline variance for computing design effects for multiple days. The multiple-day variance ratios for 2-day and full files, as compared to the 1-day file, are computed as

$$vr\left(\bar{y}^{(2)}\right) = \frac{v_J\left(\bar{y}^{(2)}\right)}{v_J\left(\bar{y}^{(1)}\right)}, \qquad vr\left(\bar{y}^{(3)}\right) = \frac{v_J\left(\bar{y}^{(3)}\right)}{v_J\left(\bar{y}^{(1)}\right)}$$

In terms of design effects, $\bar{y}^{(2)}$ has twice as many days, so one would expect the variance to be half as much if each extra day per household is providing as much new information as the first day. Thus the design effects are defined to compare the variance of $\bar{y}^{(2)}$ to half that of $\bar{y}^{(1)}$, and similarly for $\bar{y}^{(3)}$:

$$deff\left(\bar{y}^{(2)}\right) = \frac{v_J\left(\bar{y}^{(2)}\right)}{v_J\left(\bar{y}^{(1)}\right) / 2}, \qquad deff\left(\bar{y}^{(3)}\right) = \frac{v_J\left(\bar{y}^{(3)}\right)}{v_J\left(\bar{y}^{(1)}\right) / \bar{T}_{3+}}$$

$\bar{T}_{3+}$ is the mean number of monitored weekdays per sampled household for the full file, equal to

$$\bar{T}_{3+} = \frac{\sum_{s=1}^{S} \sum_{h=1}^{n_s} w_{sh}\, D_{sh}}{\sum_{s=1}^{S} \sum_{h=1}^{n_s} w_{sh}}.$$

As there are 3 randomly selected travel days with $T = 3$ (Monday, Tuesday, and Wednesday as the start days) and 2 randomly selected travel days with $T = 2$ (Thursday and Friday as the start days), one usually sets $\bar{T}_{3+} = 2.6$.

Following Pas (1986), one can decompose the variance ratios and design effects as follows, with $T$ equal to 2 for the 2-day file and equal to $\bar{T}_{3+} = 2.6$ for the full file:

$$vr\left(\bar{y}^{(T)}\right) = \frac{1 + a(T - 1)}{T}, \qquad deff\left(\bar{y}^{(T)}\right) = 1 + a(T - 1)$$

Suppose the design was simple random sampling under a simple multivariate normal (MVN) population model as follows:

$$y_{shd} = A_s + \mu_{sh} + \varepsilon_{shd}, \quad A_s \text{ fixed constant}, \quad \begin{pmatrix} \mu_{sh} \\ \varepsilon_{shd} \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_u^2 & 0 \\ 0 & \sigma_{\varepsilon}^2 \end{pmatrix} \right)$$

In the case above, $a$ in the deff formula would be the correlation coefficient $\rho = \sigma_u^2 / (\sigma_u^2 + \sigma_{\varepsilon}^2)$. In terms of the research, the design and estimator are more complicated: $\bar{y}^{(1)}$ itself has design effects from household-level clustering and weighting effects

(effects on variance of the differential weights $w_{sh}$), which $\bar{y}_{(2)}$ and $\bar{y}_{(3)}$ also share. In terms of the research then, a is interpreted as something analogous to, but not completely equivalent to, a bivariate normal correlation coefficient.³

This estimated within-person a value is reported on a percentage basis in the results section of this digest. It is a measure of how consistent person-days are within persons as measured by the jackknife variance estimates. The interpretation of positive values is that if a behavior happens on 1 day, it is more likely to happen on the next (e.g., as is true of full-time workers going to work). The interpretation of negative values is that if a behavior happens on 1 day, it is less likely to happen on the next (e.g., shopping tours, because many people may be able to fulfil their weekly shopping needs with a trip on a single day). A +100% a value would indicate a very high positive correlation across travel days, indicating that the extra days provide no new information (e.g., as would be expected from automobile ownership). A 0% a value would indicate no correlation across travel days, such that sampling additional days adds as much value as sampling additional households.

³ Kish (1965) in Section 5.4 defines a quantity “a” which stands for “rate of homogeneity.” This represents the effects of within-cluster homogeneity on the design effect in single-stage cluster sampling as per the formula deff = 1 + a(B - 1), where B is a within-cluster sample size, deff is the computed design effect, and a becomes a synthetic quantity which represents the effect of cluster homogeneity on the sampling variance.

CHAPTER 3 BIAS CHECKS

3.1 GPS-With-Prompted-Recall vs. GPS-Only Data

One very important partition is for GPS-only observations vs. observations that supplement the GPS outcomes with recall interviews (GPS-with-prompted-recall observations). The sample includes 2,775 GPS-only households and 1,312 GPS-with-prompted-recall households for a total of 4,087 households with GPS tracking. There were also 453 Log-only households where a travel log was taken with no GPS tracking. These Log-only households were households with all persons aged 75 and older.

Whereas the GPS-with-prompted-recall observations were able to collect important travel information (e.g., trip purpose and mode) by asking the respondents, the equivalent information was imputed for the GPS-only observations. Although including the full sample allows for a much bigger sample size, it is not clear whether the imputation process affects the model estimates. To test this, the research team checked for significant differences in the estimates between the GPS-only and GPS-with-prompted-recall households.

The research team’s comparisons of the GPS-with-prompted-recall data and the GPS-only data were done only for Day 1 data for both data sets, because only Day 1 data was available in both groups. The research team compared the total number of trips and the number of trips by trip purpose. Trip purpose was one of the key values that differed between the GPS-with-prompted-recall and GPS-only data, as in the former case it was obtained by interview and in the latter case it was obtained by imputation. (Appendix C has the tables of full results of these comparisons.) These results are summarized as follows:

• There was a limited and marginally significant difference in trips per person between the two GPS strata. There were 4.03 mean trips per person per day for the GPS-only data, and 4.23 mean trips per person per day for the GPS-with-prompted-recall data;
• There were significant differences in trips per person between the two GPS strata for most of the trip-purpose domains. In some cases (home-based shopping, home-based work, home-based school), the GPS-with-prompted-recall stratum had the greater mean. In some cases (non-home-based, home-based social/recreation), the GPS-only stratum had the greater mean.

Similar results were obtained when the data was analyzed as the average number of tours per person, segmented by person type. The total number of tours per person was lower, to varying degrees, in the GPS-only sample stratum, and the number of tours by purpose was different.

Households not composed of persons aged 75+ were allocated to the GPS-with-prompted-recall and the GPS-only subgroups by a random selection process. With this careful random selection, there is no reason why there should be any difference in reality between the GPS-with-prompted-recall and GPS-only households, or the household groups assigned a

first collected travel day. The randomized selection guarantees the household sets are equivalent. Thus any significant difference between the subgroups in either case has to be an artifact of aspects of data collection and/or of data processing.

3.2 Comparison Across Collection Days

An important advantage to using GPS loggers as compared to the old diary studies is that data collection across multiple days should be more feasible. The later collection days (beyond the first day) in many of the old diary studies showed a very strong drop-off in reporting (see Appendix A-2). A major goal of using GPS loggers is to eliminate this bias between collection days. However, GPS loggers are also not immune from biases, given that they depend on proper participant protocol adherence (i.e., they need to be carried along and recharged daily).

To study significant differences across collection days (Days 1, 2, and 3), the research team studied two data sets: the full data set and a Monday-Tuesday-Wednesday (MTW) data set. The MTW data set contains only records where the data collection starts on a Monday, Tuesday, or Wednesday (although it does include records where the second or third travel day is on a Thursday or Friday). In both cases, the sole data set was the GPS-only data set (the GPS-with-prompted-recall data set only had fully processed data for Day 1). For the full data set, the research team compared Day 1 and Day 2 (the first day of data collection and the second weekday of data collection). The third day was left out because there was no third weekday of data collection for those households with Thursday and Friday as the first days of data collection. For the MTW data set, the research team compared Day 1, Day 2, and Day 3 (the first, second, and third weekdays of data collection). The two data sets gave similar results. The full data set had a larger household sample size and allowed for greater power, but only Day 1 and Day 2 were compared. The MTW data set had fewer households, but 3 days could be compared (Day 1, Day 2, and Day 3).

Some systematic differences in the underlying behavior should be expected between days of the week (e.g., Monday and Tuesday) and would not necessarily be biased. For example, people may be more likely to participate in social activities on a Friday after work than earlier in the week. That the start day was randomly assigned should prevent the bias checks from picking up this effect, at least in the full data set.

The research team compared the following travel characteristics across collection days:

• Mean trips per person
• Mean trips per person by trip purpose
• Mean trip length and trip duration by person-trip (and also by trip-purpose domain)
• Mean percentage by trip mode by automobile sufficiency
• Average number of tours per person by tour type by person type

There were differences between Day 1 and Days 2 and 3. There was considerably less difference between Days 2 and 3. There seems to be a drop-off in data collection between Day 1 and the other collection days. The greatest differences between Day 1 and Days 2 and 3 were in mean trips per person, both overall and separately by trip purpose. There were more trips in Day 1 than for Days 2 and 3. This shouldn’t be the case in reality. It may be that the GPS participant protocol was not followed as diligently in the out-days as for the first day. There are “missing trips” for out-days Day 2 and Day 3. (See Appendix D for complete details.)

The next question is whether the “missing trips” in Days 2 and 3 are “missing at random” or not. “Missing at random” means that the missing trips have the same distribution with regard to their characteristics as those not missing. The mechanism leading to the missing trips is, from the standpoint of the types of trips, entirely random. The loss of trips is not causing any bias, but simply a loss in precision.

The research team’s evaluation of the evidence collected for the research is that this “missing at random” assumption cannot be made in this case. Mean trip length is not significantly different across days, but mean trip duration is significantly different. If trips are allocated across modes (e.g., drive-alone, shared ride, and walk) in percentage terms (not totals), the differences between Day 1 and Days 2 and 3 are much less, but there are significant differences. The research team’s goal in data collection and processing was to have no differences of this kind.

The last comparison was of average number of tours per person across tour type for differing person types. Again there were significant differences in tour mean values across days, and these varied in a complicated way among tour types for the various

person types. None of this should be the case in reality, so the goal in data collection and processing was not to have this difference.

CHAPTER 4 RESULTS FOR TABULAR DATA SUMMARIES

The y-variable characteristics for which the research team computed weighted means, jackknife variance estimates, and design effects as given in Chapter 1 are

• Number of households by automobiles owned (0, 1, 2, 3+), by county⁴
• Average trip length and trip duration per trip, by trip-purpose domain
• Tour generation: frequency of tours by purpose (work, shopping, social/recreation, other) made by each person type (workers, students, non-workers)
• Destination choice: county-to-county flows
• Mode choice: trips by mode and automobile ownership, for selected purposes

The research team computed the jackknife variance estimates for these characteristics for the three data sets and compared the results. The empirical design effect factors were used to estimate a values for the 2-day and full files. The research team would expect a values equal to 1 for the automobile ownership values, because the new days do not provide extra information. The research team would expect a values closer to 0 for non-work tour generation and mode choice, because the extra days would tend to differ across households in these cases. If the Pas (1986, 1987) framework fits the data, the a values should be the same for the 2-day and full file, although the design effects would differ.

Except for the automobile ownership analysis, the data set used was the GPS-only data only. The GPS-recall data was only fully processed for the first day. Only the GPS-only data was processed for all 4 days. The research team compared the GPS-only and GPS-recall data for Day 1 (these analyses are described in Section 3.1 and Appendix C).

⁴ The research team didn’t expect much difference among the 1-day, 2-day, and full-file estimates here. The a factors for the 2-day and full-file estimates should be close to 1. This was a good “extreme” in one direction and a benchmark for other analyses.

4.1 Automobiles Owned by County

The number of automobiles owned for each household did not differ across days at all; this question was only asked in the household interview. The estimates for totals and percentages are given in Appendix E-1. For this part of the research, all 4,540 households in the 2012 Northeast Ohio Regional Travel Study were included, because there was no issue of GPS collection. The tables for automobile ownership are given in Appendix E-1. The jackknife standard errors are roughly aligned with variances based on the number of sampled households and the effects of differential sample weights and the sample stratification.

4.2 Average Trip Length and Trip Duration by Trip Purpose

The second type of analysis is of the jackknife standard errors for average trip length and trip duration, based on the three sets of files (i.e., 1-day, 2-day, full-file). These are done separately by trip-purpose domain (e.g., home-based school, home-based work). The jackknife standard errors were sometimes unstable, so the research team selected only four domains for trip distance and three domains for trip duration to present results. The research team estimated degrees of freedom to judge the stability of the jackknife standard errors and only presented results where each of the three sets of files had at least 30 estimated degrees of freedom (where the estimated degrees of freedom were less, the clear instability in the standard errors precluded analysis). Table 4-1 summarizes the detailed results given in Appendix E-2.

Table 4-1 Estimated within-person a values for seven trip-purpose/variable pairs.

| Trip-purpose domain | Variable | Estimated within-person a |
|---|---|---|
| Home-based school | Trip distance | 100.0% |
| Home-based shopping | Trip distance | 42.9% |
| Home-based work | Trip distance | 66.6% |
| Non-home-based work | Trip distance | 51.8% |
| Home-based school | Trip duration | 48.5% |
| Home-based work | Trip duration | 51.3% |
| Non-home-based work | Trip duration | 42.6% |

The computation of the estimated

within-person a (rate of homogeneity) is documented in Appendix E-2, but in brief it is a measure of how consistent person-days are within persons as measured by the jackknife variance estimates. For example, the 100% a values for trip distance for the home-based school trip-purpose domain are consistent with the jackknife variances being slightly larger for the 2-day and the 3-day file than the 1-day file, indicating that in terms of variance, the extra days provided no new information (roughly stated).

All of the within-person a’s are fairly high (42.6% to 100.0%). This is consistent with a consistency in travel behavior across travel days within persons. The a’s for trip distance are higher than those for trip duration for the three common trip-purpose domains (i.e., home-based school, home-based work, and non-home-based work), indicating possibly a greater consistency of trip distance as compared to trip duration.

The a for home-based work trips is only 66.6%. Assuming that someone works in the same place every day, it is to be expected that this would be close to 100%. Some people work at more than one job and thus would have different values day to day. Others (e.g., construction workers and realtors) work in different places, so there are legitimate reasons why the value might be lower. However, GPS imputation error could contribute to lower-than-expected within-person correlations. In the imputation process, a trip could be coded one way on Day 1, but a different way on Day 2, thereby artificially increasing the within-person variance and reducing the within-person a. This is a limitation of the data that cannot be detected here, but it suggests that the true values, assuming perfect data collection, might be higher.

4.3 Average Tours by Tour Purpose (by Person Type)

The third type of analysis is of the jackknife standard errors for average number of tours per person per day. The evaluated tour types are

• Total tours per person per day
• Work tours per person per day
• School tours per person per day
• University tours per person per day
• Shopping tours per person per day
• Social/recreational tours per person per day
• Work-based subtours per person per day

These mean tours per person per day are done separately for the following person types:

• Full-time workers
• Part-time workers
• University students
• Non-workers
• Retirees
• Driving-age children

The jackknife standard errors were sometimes unstable, so the research team presented results where each of the three sets of files had at least 30 estimated degrees of freedom. In cases where the estimated degrees of freedom were less, there was a clear instability in the standard errors that would preclude analysis.

Tables 4-2 and 4-3 summarize the research team’s detailed results as given in Appendix E-3. Included are the estimated a and the smallest degrees of freedom for the three variance estimates that go into the estimated a.

Table 4-2 Estimated within-person a values for mean tours per person per day for total tours and by person type with variance estimators with at least 30 degrees of freedom for each file.

| Person type | Tour type | Estimated within-person a | Minimum degrees of freedom |
|---|---|---|---|
| Full-time Workers | Total Tours | 31.1% | 142 |
| Part-time Workers | Total Tours | 12.4% | 35 |
| Non-Workers | Total Tours | 17.4% | 72 |
| Retirees | Total Tours | 6.4% | 83 |
| Driving-Age Children | Total Tours | 20.8% | 65 |

Table 4-2 presents the estimates for total tours (where there were enough degrees of

freedom); Table 4-3 presents the estimates for other tour types (again where there were enough degrees of freedom).

The results for total tours show a range of rate of homogeneity values: 6% to 31%. The within-person correlations are highest for full-time workers, because daily work can be expected to impose consistency on their travel patterns.

The results for tour means by tour type range from -25% to 40%. The 40% result for university students may reflect too-small sample sizes. For the person types with larger sample sizes and degrees of freedom, the range of a values is more limited (small negative to 30% positive). The interpretation of positive values is that if that type of tour happens on one day, it is more likely to happen on the next (e.g., full-time workers going to work and making work-based subtours). The interpretation of negative values is that if that type of tour happens on one day, it is less likely to happen on the next. The negative values on shopping tours are logical in this context, because many people can fulfil their weekly shopping needs with a trip on a single day.

Table 4-3 Estimated within-person a values for mean tours per person per day by tour type and person type with jackknife variance estimators with at least 30 degrees of freedom for each file.

| Person type | Tour type | Estimated within-person a | Minimum degrees of freedom |
|---|---|---|---|
| Full-time Workers | Work | 28.1% | 154 |
| Full-time Workers | Shopping | -13.4% | 33 |
| Full-time Workers | Other Home-based | -4.6% | 140 |
| Full-time Workers | Work-based Subtour | 22.1% | 77 |
| Part-time Workers | Work | -5.7% | 78 |
| Part-time Workers | Shopping | -8.0% | 61 |
| Part-time Workers | Social/recreational | 14.8% | 37 |
| Part-time Workers | Other Home-based | 8.2% | 57 |
| University Students | Social/recreational | 39.6% | 35 |
| University Students | Other Home-based | -22.8% | 30 |
| Non-Workers | Shopping | 26.0% | 49 |
| Non-Workers | Other Home-based | -2.4% | 148 |
| Retirees | Shopping | -11.9% | 125 |
| Retirees | Social/recreational | 36.1% | 41 |
| Retirees | Other Home-based | -24.5% | 140 |
| Driving-age Children | School | -3.9% | 42 |

4.4 County-to-County Trip Percentages

The fourth type of analysis is of the jackknife standard errors for percentages of trips by start-county/end-county pair, based on the three sets of files (i.e., 1-day, 2-day, and full). There are five counties in the Northeast Ohio Region (Cuyahoga, which contains the city of Cleveland, Geauga, Lake, Lorain, and Medina), and there is an “unknown” county which includes points generally outside of the Northeast Ohio Region. This results in 36 possible pairs, but about 60% of the trips are for Cuyahoga to Cuyahoga and about 11% are for Lorain to Lorain. Many of the percentages are small.

These tabulations are done separately by trip-purpose domain (e.g., home-based school and home-based work). The jackknife standard errors were sometimes unstable, so the research team selected only four domains for trip distance and three domains for trip duration to present results. The research team estimated degrees of freedom to judge the stability of the jackknife standard errors and only presented results where each of the three sets of files had at least 30 estimated degrees of freedom. Where the estimated degrees of freedom were less, the clear

instability in the standard errors precluded analysis.

Table 4-4 summarizes the detailed results as given in Appendix E-4. Included are the estimated a and the smallest degrees of freedom for the three variance estimates that go into the estimated a.

Within-person rates of homogeneity (across days) range from negative values to 100%. Despite the noise in the variance estimates, this range indicates that travel behavior can be consistent across days (in its estimates) and sometimes inconsistent (close to independence or beyond). All possible values can be seen in one travel survey and in one measured characteristic (i.e., percentage of trips going near or far).

Table 4-4 Estimated within-person a values for county-by-county pairs with jackknife variance estimators with at least 30 degrees of freedom for each file.

| Start-county to end-county | Estimated within-person a | Minimum degrees of freedom |
|---|---|---|
| Cuyahoga to Cuyahoga | 61.4% | 309 |
| Geauga to Geauga | 47.0% | 141 |
| Lake to Lake | 95.1% | 34 |
| Lorain to Lorain | 100.0% | 52 |
| Medina to Medina | 69.6% | 56 |
| Cuyahoga to Geauga | 67.5% | 49 |
| Cuyahoga to Lorain | 37.6% | 68 |
| Geauga to Cuyahoga | 76.1% | 45 |
| Lorain to Cuyahoga | 38.6% | 64 |
| Cuyahoga to Unknown | 90.3% | 30 |
| Geauga to Unknown | -8.2% | 47 |
| Unknown to Geauga | -21.2% | 69 |
| Unknown to Lake | 37.7% | 33 |

4.5 Mode Choice by Automobile Sufficiency

The results for mode choice by automobile sufficiency are given in Appendix E-5. The sample size is insufficient to obtain reasonable results for most segments, but for completeness, the research team retained the results for the calculations for households with at least as many automobiles as workers (where the degrees of freedom exceeded 30); these are given in Table 4-5. The results in Table 4-5 should be treated with skepticism, given the low number of observations for many modes and potential bias in the GPS-imputation of modes.

Table 4-5 Estimated within-person a values for mode choice percentages for households with at least as many automobiles as workers, with jackknife variance estimators with at least 30 degrees of freedom for each file.

| Household group | Mode | Estimated within-person a |
|---|---|---|
| Automobiles ≥ Workers | Drive-Alone | -2.0% |
| Automobiles ≥ Workers | Walk | -11.9% |

To illustrate these issues, two tables from Appendix E-5 and related discussion are repeated here. Table 4-6 shows the number of observed trips by mode, for the GPS-with-recall segment and each day of the GPS-only segment. Table 4-7 shows the weighted mode shares with the same break-outs.

A few observations follow. First, there are few observations beyond the first four rows. These tables are not segmented by trip purpose or automobile sufficiency, so with those segmentations added, the data would be even thinner. In itself, this addresses one important issue: the sample size, even with the full 3-day sample (and potentially with the GPS-with-recall sample included), is insufficient to provide a trustworthy observation of the mode shares in the Cleveland region. This is not unusual for household travel surveys, especially in a region with low transit mode shares, but it illustrates the importance of collecting an onboard transit survey if understanding

transit demand and ridership markets is a planning priority.

Second, the mode shares are very different across the samples. The GPS-with-recall has a drive-alone mode share of 45% across all purposes, compared to Day 1 of the GPS-only sample which has a drive-alone mode share of 79%. Days 2 and 3 are even higher, at over 90%. This is similar to the findings of Chapter 2 (where the research team found a bias related to the GPS-with-recall and Day 1 of the GPS-only sample) and of Chapter 3 (where the research team found a bias related to the first and subsequent days within the GPS-only sample). It is likely that a limitation of the GPS mode imputation process means it does not pick up non-drive-alone modes very well.

Table 4-6 Number of trip observations by mode for each sample type and day number.

| Mode | GPS-with-recall Day 1 | GPS-only Day 1 | GPS-only Day 2 | GPS-only Day 3/4 |
|---|---|---|---|---|
| Drive-Alone | 6,503 | 21,864 | 13,431 | 12,475 |
| Shared Ride 2 | 3,141 | 717 | 213 | 269 |
| Shared Ride 3+ | 1,706 | 295 | 118 | 12 |
| Walk | 1,445 | 3,713 | 852 | 615 |
| Bike | 85 | 5 | 0 | 0 |
| Local Bus | 270 | 69 | 14 | 11 |
| Express Bus | 12 | 0 | 0 | 0 |
| Rail | 38 | 8 | 4 | 2 |
| Other | 337 | 9 | 1 | 0 |
| Total | 13,537 | 26,680 | 14,633 | 13,384 |

Table 4-7 Weighted mode shares for each sample type and day number.

| Mode | GPS-with-recall Day 1 | GPS-only Day 1 | GPS-only Day 2 | GPS-only Day 3/4 |
|---|---|---|---|---|
| Drive-Alone | 45.10% | 79.30% | 90.60% | 93.20% |
| Shared Ride 2 | 22.40% | 4.10% | 1.60% | 2.30% |
| Shared Ride 3+ | 16.50% | 1.40% | 1.90% | 0.10% |
| Walk | 9.80% | 15.00% | 5.70% | 4.40% |
| Bike | 0.60% | 0.00% | 0.00% | 0.00% |
| Local Bus | 1.60% | 0.20% | 0.20% | 0.10% |
| Express Bus | 0.10% | 0.00% | 0.00% | 0.00% |
| Rail | 0.30% | 0.00% | 0.00% | 0.00% |
| Other | 3.50% | 0.00% | 0.00% | 0.00% |
| Total | 100.00% | 100.00% | 100.00% | 100.00% |

CHAPTER 5 RESULTS FOR MODEL ESTIMATION

Chapter 2 introduced the framework used to measure the effect of using the 1-, 2-, or 3-day sample in a statistically robust way. This was done for the case where the measures of interest were mean values from a survey, as might be used to summarize travel in a region or as targets against which to calibrate a travel model. In this chapter, the research team examines the more complex case of estimating travel models. The practical effect of trying to estimate travel models with a sample too small is that some variables that the analyst expects to be important show up with a high standard error on the estimated parameter or as statistically insignificant. Appendix F provides

a theoretical basis for this work; Appendix G provides the detailed results for each model. In the subsections of this chapter, overviews are provided of the lessons learned from the research as documented in Appendix G.

The research team evaluated the following key travel choices in detail. The key travel choices include

• Automobile ownership – Low or no day-to-day variation, but the number and spatial locations of zero-automobile households is a major driver of downstream travel choices.
• Tour generation – It is expected that non-workers will have a high level of day-to-day variation, and workers a more modest level of variation. Demographic and socio-economic variables are of particular importance, providing the potential to highlight sample size issues for some segments of the population.
• Mode choice – The choice of modes is central to the modeling process in many locations, allowing issues related to this key travel choice to be highlighted.
• Destination choice – Destination choice incorporates measures both of the attractiveness of locations and of the impedance to travel to those locations, in that way incorporating a measure of trip length. For purposes like work and school, the destination is likely the same every day, but for other purposes there may be a greater level of variation between days for the same individuals.

Although it is not possible to select a single “typical” model, the forms of these models are selected to be broadly relevant to a number of model types. In each case, the analysis was limited to selected trip purposes. Their expected forms are described below.

For each travel choice, the research team computed the estimated model coefficient vector for the 1-day, 2-day and full-sample files. The research team also calculated the jackknife variance matrices and variance ratio matrices for each file.

5.1 Automobile Ownership Models

Automobile ownership is the one model in which the y-variable did not vary over days. It is theoretically possible that automobile ownership can vary over days, but the form of the questionnaire precluded this, as the household was asked only once about their automobile ownership. The results of the model estimation are given in Appendix G-1. The jackknife variance estimates closely follow the variances coming from the ALOGIT program, which assumes a weighted simple random sample from a superpopulation model. The results indicate that the sample design with its stratification structure does not contribute that much to variance for this model estimation (i.e., they do not change much what one might expect from simple random sampling).

5.2 Tour Generation Models

The tour generation models predict the number of tours and purpose of tours that a person participates in on the travel day. The models are applied separately for each person type, with the person type restricting which types of tours the person can participate in. For example, only workers are allowed to make work tours, and only university students are allowed to make university tours. Two types of tour generation models are analyzed here: one for workers and one for non-workers. It was expected that non-workers would see greater variability across days as compared to workers, because workers should have an enforced consistency from the constancy of the work destination. Non-workers, on the other hand, could be very different day to day.

Appendix G-2 provides detailed results on tour generation for non-workers and Appendix G-3 on worker tour generation. The jackknife standard errors were relatively stable (higher degrees of freedom), allowing a more reliable analysis, although there was still considerable measured variability in the jackknife standard errors. The a calculations were carried through for both models, and an eigenvalue-eigenvector analysis on the variance ratio matrices was carried through.

Table 5-1 summarizes the distribution of variance ratios over the parameters in the parameter vector for the non-work tour generation models as a broad summary of the variance comparison between the 1-day file (the benchmark), the 2-day file, and the full file. The range of ratios is 45.3% to 79.7% for the 2-day file variance to the 1-day file variance, with a median of 60.3%. This indicates a median 40% drop in variance, which translates to a rate of homogeneity of about 20%. For the full-file compared to the 1-day file, the range of ratios is 39.7% to 76.2% with a median of 47.9%. This is a much larger drop

in variance (about 52%), which translates to a rate of homogeneity of about 15%. There is correlation within persons, but not a large amount.

Table 5-2 summarizes the distribution of variance ratios over the parameters in the parameter vector for the worker tour generation models as a broad summary of the variance comparison between the 1-day file (the benchmark), the 2-day file, and the full file. The range of ratios is 45.7% to 98.8% for the 2-day file variance to the 1-day file variance (corresponding to a factors of -8.7% to 97.5%), with a median of 65.4%. This indicates a median 35% drop in variance, which translates to a rate of homogeneity of about 30%. For the full file compared to the 1-day file, the range of ratios is 37.2% to 156.4% (corresponding to a factors of -2.0% to 100.0%) with a median of 55.3%. This is a larger drop in variance (about 45%), which translates to a rate of homogeneity of about 28% (almost the same as the 2-day file comparisons).

In both Tables 5-1 and 5-2, the rates of homogeneity are similar across the 2-day and full-file comparisons, indicating that the third day is like the second day in both cases in that it is contributing as added information. The rates of homogeneity are higher for workers, which is expected, but the difference is not that great. One might initially expect non-workers to be entirely independent across days (0% a) with the workers closer to entirely correlated across days (100%), but the real difference is not that great (15–20% for non-work tours, 27–30% for work tours). The research team suspects that this is because the models predict the number of tours by purpose for each day. Even if a worker goes to work 2 days in a row, the additional non-work tours that they make may vary. Also, the category includes people who are part-time workers or otherwise may have irregular work schedules. For non-workers, there appears to be some level of consistency for the people who travel more, versus those who travel less. For example, a retired person may not need to travel on a particular day, while a non-working parent may frequently shuttle children between activities. The range of ratios is also larger for the worker tour generation models as compared to the non-worker tour generation models. The reason for this is not apparent.

Table 5-1 Distribution of parameter variance ratios for the non-worker tour generation models.

| Percentile | VR parameter estimates 2-day to 1-day | Corresponding 2-day to 1-day a factor | VR parameter estimates full-file to 1-day | Corresponding full-file to 1-day a factor |
|---|---|---|---|---|
| Minimum | 0.453 | -9.35% | 0.397 | 2.06% |
| 10th | 0.505 | 0.93% | 0.421 | 5.91% |
| 25th | 0.552 | 10.37% | 0.445 | 9.80% |
| Median | 0.603 | 20.60% | 0.479 | 15.35% |
| 75th | 0.650 | 30.02% | 0.535 | 24.49% |
| 90th | 0.724 | 44.75% | 0.597 | 34.49% |
| Maximum | 0.797 | 59.33% | 0.762 | 61.30% |

Table 5-2 Distribution of parameter variance ratios for the work tour generation models.

| Percentile | VR parameter estimates 2-day to 1-day | Corresponding 2-day to 1-day a factor | VR parameter estimates full-file to 1-day | Corresponding full-file to 1-day a factor |
|---|---|---|---|---|
| Minimum | 0.457 | -8.69% | 0.372 | -2.00% |
| 10th | 0.538 | 7.51% | 0.421 | 5.95% |
| 25th | 0.618 | 23.62% | 0.450 | 10.69% |
| Median | 0.654 | 30.76% | 0.553 | 27.39% |
| 75th | 0.725 | 45.01% | 0.686 | 49.01% |
| 90th | 0.780 | 55.94% | 1.440 | 99.22% |
| Maximum | 0.988 | 97.54% | 1.564 | 100.00% |
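The a factors in Tables 5-1 and 5-2 can be related to the variance ratios through the Kish formula deff = 1 + a(B - 1) cited in the digest. The sketch below is a reading of the tables, not the research team's documented computation. It assumes that the variance ratio equals deff / B, because the simple-random-sample baseline counts each of the B days per person as a full sample unit, and that a is capped at 100% when a ratio exceeds 1. The function names are hypothetical. Only the 2-day case (B = 2) is shown, because this section does not state the average number of days per person in the full file.

```python
def rate_of_homogeneity(variance_ratio, days):
    """Solve VR = (1 + a * (days - 1)) / days for a, capped at 1.0 (100%).

    Assumes the variance ratio is the design effect divided by the number
    of days per person; the cap mirrors the 100% entries in the tables.
    """
    if days <= 1:
        raise ValueError("days must exceed 1 to identify a")
    a = (days * variance_ratio - 1.0) / (days - 1.0)
    return min(a, 1.0)

def effective_days(a, days):
    """Independent-day equivalents per person: days / (1 + a * (days - 1))."""
    return days / (1.0 + a * (days - 1.0))

# Median 2-day to 1-day variance ratios read from Tables 5-1 and 5-2.
a_nonworker = rate_of_homogeneity(0.603, 2)  # about 0.206; Table 5-1 lists 20.60%
a_worker = rate_of_homogeneity(0.654, 2)     # about 0.308; Table 5-2 lists 30.76%
days_worth_nonworker = effective_days(a_nonworker, 2)
```

Under these assumptions the second day is worth less than a full extra day of independent information: with a of about 0.206, two days per person carry roughly 1.66 independent-day equivalents. The small gap between the computed 0.308 and the tabulated 30.76% is consistent with the ratios being rounded to three decimals in the table.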

The eigenvalues from the variance ratio analysis have the same geometric mean as the simple ratios of variance estimates. The eigenvalues have a greater range, which one might expect.

The analysis here presents a summary of the overall effect across all model parameters. Some analysts may place a higher importance on some model parameters than others. Those interested in a more detailed explanation of how the parameter estimates vary across the three files are referred to the tables in Appendixes G-2 and G-3.

5.3 Mode Choice Models

The mode choice models predict the primary mode of the tour. Two types of mode choice models are considered here: those for work tours and those for social/recreation tours. It was expected that the mode choice models for the social/recreational tours would see greater variability across days as compared to work tours, as again work tours should have an enforced consistency of modes from the constancy of the work destination.

Appendix G-4 provides detailed results on mode choice for work tours. Appendix G-5 provides detailed results on mode choice for social/recreation tours. The jackknife standard errors were less stable (lower degrees of freedom) than those for tour generation (Section 5.2), so there needs to be greater skepticism with regard to the analysis. The a calculations were carried through for both models, but the eigenvalue-eigenvector analysis was not done in this case.

Table 5-3 summarizes the distribution of variance ratios over the parameters in the parameter vector for the work tour mode choice models as a broad summary of the variance comparison between the 1-day file (the benchmark), the 2-day file, and the full file. As can be seen, more than half of the parameters have variance estimates for the 2-day file which are larger than those of the 1-day file. The research team estimates a in this case as 100%. The noise in the variance estimates is great, but the pattern definitely seems to be for 2-day file variances which are not smaller in general than the 1-day file variances. The information being provided from the second day is not large in this case. The results are consistent with work tour mode choices being consistent across the 2 days. For the full file though, the ratios of the full-file to the 1-day file variances are lower than those of the 2-day to 1-day file, being consistent with large but not 100% a values (the median a is 48.3%, with 50% of the parameters being between 30% and 78%). Again, there is considerable noise in the variance estimates, but the general pattern of large a values for these work tours seems to be borne out (though to a lesser extent than the 2-day file showed).

Table 5-4 summarizes the distribution of variance ratios over the parameters in the parameter vector for the social/recreation tour mode choice models as a broad summary of the variance comparison between the 1-day file (the benchmark), the 2-day file, and the full file.

In general, the variance ratios and corresponding a factors are smaller than the corresponding ones for the work tours (Table 5-3), but not that much smaller. There is evidence of greater across-day differences for social/recreation tours, but not as much as might be expected. The mode choice results should be viewed with some level of skepticism given the challenges of imputing modes from the GPS-only sample.

Table 5-3 Distribution of parameter variance ratios for work tour mode choice models.

| Percentile | VR parameter estimates 2-day to 1-day | Corresponding 2-day to 1-day a factor | VR parameter estimates full-file to 1-day | Corresponding full-file to 1-day a factor |
|---|---|---|---|---|
| Minimum | 0.500 | -0.06% | 0.023 | -58.69% |
| 10th | 0.635 | 26.90% | 0.441 | 9.18% |
| 25th | 0.851 | 70.16% | 0.567 | 29.63% |
| Median | 1.129 | 100.00% | 0.682 | 48.29% |
| 75th | 1.331 | 100.00% | 0.866 | 78.18% |
| 90th | 1.443 | 100.00% | 0.999 | 99.67% |
| Maximum | 1.652 | 100.00% | 1.067 | 100.00% |

Table 5-4 Distribution of parameter variance ratios for social/recreation tour mode choice models.

| Percentile | VR parameter estimates, 2-day to 1-day | Corresponding 2-day to 1-day a factor | VR parameter estimates, full-file to 1-day | Corresponding full-file to 1-day a factor |
| --- | --- | --- | --- | --- |
| Minimum | 0.180 | -63.90% | 0.166 | -35.56% |
| 10th | 0.246 | -50.71% | 0.222 | -26.50% |
| 25th | 0.478 | -4.43% | 0.445 | 9.76% |
| Median | 0.799 | 59.75% | 0.638 | 41.14% |
| 75th | 0.898 | 79.57% | 0.758 | 60.61% |
| 90th | 1.057 | 99.79% | 0.995 | 98.79% |
| Maximum | 1.305 | 100.00% | 1.130 | 100.00% |

5.4 Destination Choice Models

The destination choice models predict the traffic analysis zone (TAZ) for the primary destination of the tour. Two types of destination choice models are considered here: those for work tours and those for social/recreation tours. As with the mode choice models, it was expected that the destination choice models for the social/recreational tours would see greater variability across days as compared to work tours, as again work tours should have an enforced consistency of destination choices from the constancy of the work destination. Again, there is no reason why social/recreation tours, on the other hand, should not be very different from day to day.

Appendix G-6 provides detailed results on destination choice for work tours. Appendix G-7 provides detailed results on destination choice for social/recreation tours. The a calculations were carried through for both models, but the eigenvalue-eigenvector analysis was not done in this case.

Table 5-5 summarizes the distribution of variance ratios over the parameters in the parameter vector for the work tour destination choice models as a summary of the variance comparison between the 1-day file (the benchmark), the 2-day file, and the full file. There are only five parameters, so the five percentiles correspond to the sorted parameter values. The 2-day variance ratios are high (0.66 to 0.94), corresponding to large a values (31% to 87%). There is a sizeable degree of within-person across-day correlation. For the full file, the ratios of the full-file to the 1-day file variances are in the same general range as for the 2-day to 1-day ratios, but slightly larger in general (0.68 to over 1 in two cases). These correspond to a range of a factors of 48% to 100%. In general, the a factors can be seen to be large for the work tour destination choice parameter estimates.

Table 5-5 Distribution of parameter variance ratios for work tour destination choice models.

| Percentile | VR parameter estimates, 2-day to 1-day | Corresponding 2-day to 1-day a factor | VR parameter estimates, full-file to 1-day | Corresponding full-file to 1-day a factor |
| --- | --- | --- | --- | --- |
| Minimum | 0.655 | 30.97% | 0.681 | 48.23% |
| 25th | 0.804 | 60.81% | 0.826 | 71.76% |
| Median | 0.848 | 69.52% | 0.996 | 99.36% |
| 75th | 0.905 | 81.06% | 1.934 | 100.00% |
| Maximum | 0.935 | 86.98% | 10.202 | 100.00% |

Table 5-6 summarizes the distribution of variance ratios over the parameters in the parameter vector for the social/recreation tour destination choice models as a summary of the variance comparison between the 1-day file (the benchmark), the 2-day file, and the full file. In general, the variance ratios and corresponding a factors are smaller than the corresponding ones for the work tours (Table 5-5). There is evidence of

greater across-day differences for social/recreation tours.

Table 5-6 Distribution of parameter variance ratios for social/recreation tour destination choice models.

| Percentile | VR parameter estimates, 2-day to 1-day | Corresponding 2-day to 1-day a factor | VR parameter estimates, full-file to 1-day | Corresponding full-file to 1-day a factor |
| --- | --- | --- | --- | --- |
| Minimum | 0.464 | -7.29% | 0.322 | -10.17% |
| 25th | 0.537 | 7.44% | 0.431 | 7.60% |
| Median | 0.674 | 34.81% | 0.459 | 12.02% |
| 75th | 0.739 | 47.86% | 0.526 | 22.90% |
| Maximum | 0.877 | 75.42% | 0.646 | 42.48% |

CHAPTER 6 SUMMARY OF RESULTS

6.1 Summary of Variance Analysis

Table 6-1 summarizes the a factors computed for comparing the 1-day to 2-day, and 1-day to full files as given in detail in Chapters 4 and 5 and Appendixes E and G. An average a factor was computed for each of the tabular summaries referenced and each of the model estimation summaries referenced. For each, the median a factor is reported, as is the 75th percentile a factor. This is done because some users may wish to take a more conservative approach in terms of trading off additional survey days versus additional households, and those users should consider the 75th percentile value.

Table 6-1 Summary of variance analysis.

| Category of estimate | Median a value | 75th percentile a value | Importance |
| --- | --- | --- | --- |
| Table: automobile ownership by county | 100.00% | 100.00% | 10% |
| Table: total tours by person type¹ | 17.6% | 25.6% | 15% |
| Table: total tours by person type/tour purpose² | 4.4% | 7.6% | 15% |
| Table: trip distance by trip purpose³ | 48.2% | 58.4% | 10% |
| Table: trip duration by trip purpose⁴ | 45.1% | 49.7% | 10% |
| Table: percentage trips county to county⁵ | 54.2% | 74.5% | 10% |
| Table: mode choice by automobile sufficiency⁶ | -7.0% | -4.5% | 30% |
| Mean for tabular values | 26.0% | 31.9% | 100% |
| Model: automobile ownership | 100.00% | 100.00% | 10% |
| Model: worker tour generation⁷ | 29.1% | 47.0% | 15% |
| Model: non-worker tour generation⁷ | 18.0% | 27.3% | 15% |
| Model: work tour destination choice⁷ | 84.4% | 90.5% | 15% |
| Model: social/recreation tour destination choice⁷ | 23.4% | 35.4% | 15% |
| Model: work tour mode choice⁷ | 74.1% | 89.1% | 15% |
| Model: social/recreation tour mode choice⁷ | 50.4% | 70.1% | 15% |
| Mean for model estimation values | 51.9% | 63.9% | 100% |
| Grand mean | 39.0% | 47.9% | |

¹ Weighted percentiles of five Table 4-2 total tour a values (weighted by full-file total tours for the person type).
² Weighted percentiles of eight (full-time and part-time worker) Table 4-3 tour a value simple means (weighted by full-file total tours for the person type).
³ Weighted percentiles of four Table 4-1 trip distance a values (weighted by the full-file total trips for the trip purpose).
⁴ Weighted percentiles of three Table 4-1 trip duration a values (weighted by the full-file total trips for the trip purpose).
⁵ Simple percentiles of thirteen Table 4-4 county-to-county pct a values.
⁶ Simple percentiles of two Table 4-5 a values.
⁷ Mean values (over 2-day to 1-day, and full file to 1-day) of simple percentiles over parameter a values.
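The importance-weighted means in Table 6-1 can be checked with a short calculation. The sketch below copies the median, 75th percentile, and importance columns from the table; the variable and function names are illustrative only. It reproduces the reported means to within rounding of the published inputs.

```python
# (median a, 75th percentile a, importance weight), copied from Table 6-1.
TABULAR = [
    (100.0, 100.0, 0.10),  # automobile ownership by county
    (17.6, 25.6, 0.15),    # total tours by person type
    (4.4, 7.6, 0.15),      # total tours by person type/tour purpose
    (48.2, 58.4, 0.10),    # trip distance by trip purpose
    (45.1, 49.7, 0.10),    # trip duration by trip purpose
    (54.2, 74.5, 0.10),    # percentage trips county to county
    (-7.0, -4.5, 0.30),    # mode choice by automobile sufficiency
]
MODELS = [
    (100.0, 100.0, 0.10),  # automobile ownership
    (29.1, 47.0, 0.15),    # worker tour generation
    (18.0, 27.3, 0.15),    # non-worker tour generation
    (84.4, 90.5, 0.15),    # work tour destination choice
    (23.4, 35.4, 0.15),    # social/recreation tour destination choice
    (74.1, 89.1, 0.15),    # work tour mode choice
    (50.4, 70.1, 0.15),    # social/recreation tour mode choice
]


def weighted_mean(rows, column):
    """Importance-weighted mean of one column (0 = median, 1 = 75th percentile)."""
    total_weight = sum(row[2] for row in rows)
    return sum(row[column] * row[2] for row in rows) / total_weight


for column, label in [(0, "median"), (1, "75th percentile")]:
    tabular = weighted_mean(TABULAR, column)
    models = weighted_mean(MODELS, column)
    grand = (tabular + models) / 2.0  # simple mean of the two group means
    print(f"{label}: tabular {tabular:.1f}%, models {models:.1f}%, grand {grand:.1f}%")
```

Analysts who value the components differently can change the third element of each tuple and rerun the calculation.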

(such as from the Census Transportation Planning Package). In this way, the stability of the ASCs is of somewhat less importance than the other parameter estimates if they may be adjusted anyway. However, they also represent a bias or underlying preference for certain alternatives, and it is interesting to consider the consistency of such biases across days. For example, in mode choice, does someone always prefer automobile over transit by the same amount, or does it depend on whether the person happens to have child care responsibilities on that particular day? These are interesting questions as more complex model structures become common, and they relate to the work of Cherchi and Ortúzar (2008), who found that SP data with multiple responses per individual may provide an enhanced ability to estimate models that account for taste heterogeneity. The details of such differences can be examined by interested readers in the results presented in Appendix G.

6.2 Equivalent Sample Size

The a values in Section 6.1 can be used to calculate the equivalent sample size of the Cleveland HTS, if the survey were a 1-day survey. This is done using Equation 1.1 (from the Vovsha method), repeated here, and its equivalency:

$$S_N = S_0 \, \frac{R + D}{(1 + R)\, D} = S_0 \, \frac{1 + a\,(D - 1)}{D} \qquad \text{(Eq 6-1)}$$

where
• $S_N$ is the new (reduced) sample size, which in this case is 2,780 households in the GPS-only sample of the Cleveland survey,
• $S_0$ is the equivalent sample size for a 1-day survey, which is calculated here,
• $R$ is the ratio of day-to-day (intra-person) variability $\sigma_e^2$ to inter-person variability $\sigma_u^2$,
• $D$ is the sample length in days, in this case 3, and
• $a$ is the correlation coefficient, as taken from Table 6-1.

Table 6-2 shows the calculated results for the 2,780 households in the 3-day GPS-only sample of the Northeast Ohio Survey. The equivalent sample sizes are higher for the tabular values than for the model estimation values because the a values are lower.
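As a worked check, the equivalent sample sizes can be computed directly. The sketch assumes Equation 6-1 in the form S_N = S_0 * (1 + a * (D - 1)) / D, rearranged to give S_0, and uses the mean a values reported in Table 6-1. The function name is illustrative.

```python
def equivalent_one_day_sample(households, days, a):
    """Solve Equation 6-1 for S0: S0 = S_N * D / (1 + a * (D - 1))."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return households * days / (1.0 + a * (days - 1))


# Mean a values from Table 6-1, expressed as fractions.
scenarios = {
    "tabular values, median": 0.260,
    "tabular values, 75th percentile": 0.319,
    "model estimation values, median": 0.519,
    "model estimation values, 75th percentile": 0.639,
    "grand mean, median": 0.390,
    "grand mean, 75th percentile": 0.479,
}
for label, a in scenarios.items():
    s0 = equivalent_one_day_sample(2780, 3, a)
    print(f"{label}: {s0:,.0f} households")
```

With a = 0 the function returns three times the household count (each extra day is worth a new household), and with a = 1 it returns the household count itself (extra days add nothing).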
A level of judgment is required in selecting among these, depending on

One can see the great differences in a values. Some of this may be noise in the variance estimates, but there appear to be inherent differences. The median a values and 75th percentile a values are generally weighted medians or weighted 75th percentiles by total “size” (total tours, for example, for a person type). Following the Vovsha method (Appendix A-6), the means over the tabular values (model estimation values) are weighted means of the corresponding median and 75th percentiles for the a values, using as weights the importance weights as given in the last column of Table 6-1. The grand means are the simple means of the mean for tabular values and the mean for model estimation values. The two means of medians are 26.0% and 51.9% for a grand mean of 39.0%. The two means of 75th percentiles are 31.9% and 63.9% for a grand mean of 47.9%.

The following points should be considered.

First, as can be observed in the summary values for model estimation, there are generally higher intra-person day-to-day correlations for work travel than for non-work travel. This is what the research team would expect. The work activity imposes both a constraint and a level of consistency on travel behavior that persists across days, while non-work travel is significantly more variable, presumably because the activities that drive it are more variable.

Second, it appears that the model estimations may have higher intra-person day-to-day correlations than the tabular values. This is not definitive, as it is influenced by the negative values for the mode choice by automobile sufficiency table, which should be viewed with skepticism due to the low number of observations for many modes, and potential bias in the GPS-imputation of modes. However, the same pattern can be observed for tour generation, where the tour generation estimation results have higher a values than the tour generation tabulations.
While this would mean that there is less value to collecting additional travel days for model estimation, it may reflect positively on the predictive ability of the models.

Related to that are questions of the a values for different types of estimated model parameters, specifically alternative-specific constants (ASCs) versus other terms. The ASCs account for all factors not otherwise included in the models. Usually, during model calibration, the ASCs will be adjusted to match aggregate choice shares, as generated either by the tabular data or by an external data source

The bias checks reveal that it is important to avoid bias in the later collection days, and it appears that this will require commitment of resources to make sure there is no fall-off between the first collected travel day and later collected travel days. So it appears that the ratio q/p should not become too small (i.e., the later collection days should not be too inexpensive). If a ratio of 0.075 is used, then the optimal number of travel days is 6, 5, 4, or 3, depending on the correlation factor a. The household sample size should be large enough to allow for sufficient precision for those measured characteristics with a higher correlation (50% to 100%). There will certainly be characteristics of interest at the household level which have 100% or near 100% correlation (e.g., automobile ownership). To be relatively conservative, the 62.5% correlation may be the right factor. In this case, the optimal number of travel days is 2, 3, or 4, depending on the q/p factor.

CHAPTER 7 CONCLUSIONS AND RECOMMENDATIONS

7.1 Discussion of Research Methods

The jackknife variance estimation system developed for the 2012 Northeast Ohio Regional Travel Survey was very successful in that it provided variance estimates that accurately measured the sampling error for a wide variety of estimated statistics and easily handled the multi-day, multi-person household structure without complex and specialized variance components modeling. It could also handle accurate

which uses are more highly valued and how conservative one wants to be. In the best case, conducting a 3-day survey is equivalent to nearly doubling the sample size when compared to households surveyed for a single day. In the most conservative case, it is equivalent to increasing the sample size by only 32%.
Table 6-2 Equivalent sample size for 2,780 GPS-only households in Cleveland HTS.

| Category of estimate | Median a value | S0 for median | 75th percentile a value | S0 for 75th percentile |
| --- | --- | --- | --- | --- |
| Mean for tabular values | 26.0% | 5,487 | 31.9% | 5,092 |
| Mean for model estimation values | 51.9% | 4,092 | 63.9% | 3,661 |
| Grand mean | 39.0% | 4,685 | 47.9% | 4,259 |

6.3 Optimal Number of Survey Days

The Pas (1986) paper has a good framework for finding an optimum number of collected travel days. The optimum is a function of the ratio q/p between the cost of collecting each additional day, q, and the cost of recruiting a sampled household (and of collecting the first interview), p; the within-person variance a factor; and the number of travel days T. This framework is described in Appendixes A-4 and H.

The a factor varies over the various characteristics and parameters, as is evident in this digest. The research team used the values 25%, 37.5%, 50%, and 62.5%, reflecting the range of a values seen in Table 6-1. Table 6-3 presents the optimal number of travel days for these four a values and for five different q/p ratios. The q/p ratios are not calculated from this study and will be dependent on the specific survey implementation. As a point of reference, the q/p ratios derived from the costs reported in Stopher et al. (2008) are 0.063 and 0.124, so the values reported in Table 6-3 reasonably bound that range at least. (Appendix H provides detailed tables outlining the scenarios.)

Table 6-3 Optimal number of travel days by correlation a and cost ratio q/p.

| Correlation a | q/p = 0.05 | q/p = 0.075 | q/p = 0.10 | q/p = 0.15 | q/p = 0.20 |
| --- | --- | --- | --- | --- | --- |
| 25.0% | 8 | 6 | 5, 6 | 4, 5 | 4 |
| 37.5% | 6 | 5 | 4 | 3 | 3 |
| 50.0% | 4, 5 | 4 | 3 | 3 | 2 |
| 62.5% | 3, 4 | 3 | 3 | 2 | 2 |
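The trade-off behind Table 6-3 can be illustrated with the textbook optimal cluster size result from survey sampling. This is an assumption substituted for illustration, not necessarily the exact Pas (1986) formulation used in Appendix H. It assumes a per-household cost of p + q * T and a variance proportional to (1 + a * (T - 1)) / T, which gives a continuous optimum of T* = sqrt((1 - a) / (a * q/p)). The function name is hypothetical.

```python
import math


def optimal_days(a, cost_ratio):
    """Continuous optimum number of travel days under the textbook cluster-size formula.

    a          -- within-person across-day correlation, 0 < a <= 1
    cost_ratio -- q/p, cost of one extra day relative to recruiting a household
    Assumes cost = p + q * T and variance proportional to (1 + a * (T - 1)) / T.
    """
    if not 0.0 < a <= 1.0:
        raise ValueError("a must be in (0, 1]")
    if cost_ratio <= 0.0:
        raise ValueError("cost_ratio must be positive")
    # At least one travel day is always collected.
    return max(1.0, math.sqrt((1.0 - a) / (a * cost_ratio)))


for a in (0.25, 0.375, 0.50, 0.625):
    row = ", ".join(f"{optimal_days(a, r):.2f}" for r in (0.05, 0.075, 0.10, 0.15, 0.20))
    print(f"a = {a:.1%}: {row}")
```

Rounding these continuous values lands close to the entries in Table 6-3, with the paired entries in the table falling where the continuous optimum sits roughly midway between two whole numbers. The table's own procedure may differ in detail.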

survey variance estimation for a wide variety of parameter vectors for the complex models that were estimated on the 2012 Northeast Ohio Regional Travel Survey. The ease of the jackknife approach, once the jackknife replicate weights are put into place, makes it of tremendous value to travel surveys with their many models and estimates and can generate unbiased variance estimates for the within-person, across-day correlations that will be needed under this new GPS regime.

Variance estimation is normally difficult to do with high precision, especially for skewed or high-kurtosis continuous distributions. A measure of precision for a variance estimate is the “degrees of freedom.” This terminology comes from the approximate chi-square distribution for variance estimates: the appropriate ratio of a variance estimate to the true variance has an approximate chi-square distribution, and the degrees of freedom of this distribution is linked to the variance of the variance estimator (higher degrees of freedom mean lower variance of the variance). For the jackknife variance estimator, the degrees of freedom are generally bounded above by the number of replicates minus the number of strata. In this case, the maximum degrees of freedom were 432 (453 replicates minus 21 strata). This is certainly sufficient for highly precise variance estimates, but when estimates become dependent on a small number of strata, or in the presence of outliers in the data, the actual degrees of freedom can become much less. The research team developed a way to estimate degrees of freedom and used this broadly to separate out much less precise variance estimates and only analyzed variance estimates that passed the degrees-of-freedom test. One could see the instability of the variance estimates (as seen in a wildness and lack of pattern in the estimates) that failed that test in general. More research is needed to refine this method to provide stability in such situations, but the overall approach appears to work reasonably well.

One fundamental hypothesis to be tested in this research was whether the effects on variance of the additional days of data collection could be summarized by a relatively simple design effect factor 1 + a * (T - 1), where T is the number of days and a is a “universal” rate of homogeneity, that is constant across estimates, parameters in a regression parameter vector, and subsets of collection days.

The answer appears to be “yes and no.” It appears to be “yes” when the issue is the subset of collection days. The a factor, with some exceptions, did not appear to change for particular estimates when the extra (i.e., third) travel day was included. This may or may not hold true when larger numbers of days are included (e.g., up to 5 full weekdays). Inclusion of weekend days is likely to diminish the a factor.

The answer to the constant a factor appears to be “no” when the issue is types of estimates. There was noise in the variance estimators as documented in this digest, but, even with this noise, the evidence seems pretty firm that different types of estimates have different underlying a factors. These a factors can range from negative rates of homogeneity (as low as minus 60%) all the way to nearly perfect positive homogeneity (positive 100%). Negative homogeneity may result for travel behavior that is less likely on a successive day if it occurs on an earlier day. For example, grocery shopping for some people may be a once-a-week event, so that if it occurs on Monday, there will be less chance of it also occurring on Tuesday. Near 100% homogeneity might occur for near constant travel patterns, such as trips to work places from home for full-time workers.

For the models estimated, the jackknife variance-covariance matrices for the estimated parameter vectors appeared to broadly follow the relatively simple model as found in Pas (1986) and Koppelman and Pas (1984) where the variance-covariance matrices for the various day files differ only by a fixed design effect coefficient 1 + a * (T - 1). But the a values differ across the various parameters and have a wide range, as is the case for the Chapter 4 table estimates. A larger data set with more days may allow for a more complex relationship, but the simplest model is consistent with the parameter vector variance-covariance matrices the research team studied in this project.

The technical machinery developed here can be used for other data sets and other models. The simpler approach of taking the ratios of the jackknife variance estimates for each parameter estimate one by one and analyzing those using the design effect framework may suffice in most situations to capture the necessary design effect structure. The variance-covariance eigenvector-eigenvalue approach allows for studying fully the covariance structure between parameter estimates, rather than just the variances of the parameter estimates, but this may not add anything of genuine relevance in most situations.

7.2 Evaluation of Survey Method

An objective of this study was to evaluate whether GPS-enabled multi-day surveys overcome the

fatigue challenges faced by multi-day diary surveys. The research team’s conclusion is that, at least in the specific case of the 2012 Northeast Ohio Regional Travel Survey, they do not. The use of GPS trace data in travel surveys makes multi-day data collection much more feasible than in the past, based on the assumption that using a GPS device is less burdensome than completing a travel diary, so there should be less drop-off in response rate for the subsequent days. Getting trip data that is completely accurate in an almost completely automated way could increase the potential range and power of travel surveys, but GPS-only data requires careful protocols with sufficient resources, and the imputation methods and technology have to be developed beyond the current state of the art.

At a minimum, travel surveys need to pass the test of no significant bias between collection days. If the initial collection day is selected at random, and there is a mix of initial collection days (e.g., every weekday is equally likely to be the first collection day), then there is no reason why there should be significant differences in the estimates across these collection days. The research team expected each day of the week to have a different result, but if Collection Day 1 has a randomized mix of days of the week, then there should be no significant difference across collection days. Any significant difference shows that data collection is not uniform across the collection days and brings into question the unbiasedness of the estimates. This happened for the 2012 Northeast Ohio Regional Travel Survey, as can be seen in Chapter 3 and Appendix D (this travel survey was not designed to focus on the later days: the focus of this study was primarily on the first day). For future travel surveys, the protocols and incentives should be set up so that data collection does not vary across collection days. The existence of the GPS technology brings this into the realm of possibility, but the possibility still needs to be made into a reality by careful work.

The ultimate goal from the viewpoint of inexpensive data collection is the GPS-only paradigm. If it can be made possible to collect reliable, useful travel survey data by making sure people carry an instrument with a GPS (GPS logger, or smartphone, or something else), then a relatively small number of sampled households can be leveraged into a rich data source, as a large number of travel days can be collected. This is certainly the ideal. But from the mode differences seen in Chapter 2 and Appendix C between the GPS-only and GPS-with-recall-interview data, it is clear that the GPS-only approach needs further development before it can be relied on as a stand-alone paradigm. The GPS-with-recall-interview approach is certainly a much more reliable way of collecting much more data about the travel behavior, but the necessity of a special extra recall interview is burdensome and limits the number of collection days that can be supplemented with a recall interview.

An ideal approach might be a combination of GPS technology and prompted-recall methods for multi-day studies. This approach has been used in recent travel surveys conducted by RSG⁵ with the use of GPS-equipped smartphones that prompt sampled persons with limited and simple questions about their travel behavior as they travel (Greene et al. 2015). If a relatively non-burdensome method for collecting a limited amount of extra information on the spot is used (why are you traveling here, where are you going), this could be enough to allow for a very large number of collection days. This would be a “GPS-with-on-the-spot-prompts” approach and might be the necessary improvement over the GPS-only approach.

A GPS-only approach which is not updated to a GPS-with-on-the-spot-prompts approach probably has to be supplemented by GPS-with-recall-interview households, with random selection into the two groups, as was done for the 2012 Northeast Ohio Regional Travel Survey. The imputations in the GPS-only households need to be calibrated so that the estimates from these households are in reasonable alignment with the data in the GPS-with-recall-interview households. There should not be significant differences if the household groups are randomly selected, and the survey should, as a minimum, have to pass this basic quality test. Another possible approach is to have particular collection days GPS-with-recall-interviews and other collection days GPS-only. The more burdensome GPS-with-recall-interview is only done for a limited set of randomly selected days: enough to calibrate the imputations for the GPS-only days. Some mix of randomly selected households and randomly selected days for GPS-only and GPS-with-recall-interview data might allow for the right balance between ensuring data quality in the GPS-only data and not having a protocol too burdensome on the sampled households. As the

⁵ http://docs.trb.org/prp/16-6274.pdf, http://mccog.net/MP015/assets/wertman_a-multi-day-smartphone-based_2015-mpo-conference.pdf.

should be subject to similar bias checks before they are used for travel model development.

Assuming the data pass the bias checks, the key factor to consider when using multi-day survey data is the repeated measurement problem, as discussed in Section 1.1.5. This issue has previously been identified in the literature and is reported here. The issue is that when repeated measurements are taken from the same person or household, there will usually be some correlation across those measurements, and thus they provide less information than taking a measurement from a new household. The literature has shown that parameter estimates from such data are unbiased, but that variance measures will be flawed. This means that when estimating models, the analyst should expect the reported t-statistics to be inflated. Similarly, statistical tests or other measures that rely on the variance can result in incorrect inferences. The same issue occurs when there are repeated observations within a single travel day (e.g., when outbound and return trips are treated as separate observations). This research demonstrated how the jackknife can be used to estimate the true variance.

So, how should a travel model developer working with a multi-day survey proceed in practice? Recommendations are as follows:

1. Estimate the models using the full data set in the same manner as would be done with a single-day survey.
2. After selecting a preferred model specification, use the jackknife to calculate the variance on each coefficient, and use that variance to re-calculate the standard error and t-statistic associated with each coefficient.
3. If these revisions change the conclusions about which terms should be included in the model, adjust the model specification and repeat.

The procedure will be somewhat different for other statistical measures, but the approach is the same in that the jackknife variance can be substituted any time a variance is used.
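Step 2 of the recommendations can be sketched as follows. The sketch uses the standard replicate-based jackknife form, in which the variance is the sum over replicates of a replicate factor times the squared deviation of the replicate estimate from the full-sample estimate. This is an assumption about the general technique; the function names and the numbers in the example are hypothetical.

```python
def jackknife_variance(full_estimate, replicate_estimates, replicate_factors):
    """Factor-weighted sum of squared deviations of replicate estimates
    from the full-sample estimate (standard replicate jackknife form)."""
    if len(replicate_estimates) != len(replicate_factors):
        raise ValueError("need one factor per replicate estimate")
    return sum(
        factor * (estimate - full_estimate) ** 2
        for estimate, factor in zip(replicate_estimates, replicate_factors)
    )


def jackknife_t_statistic(full_estimate, replicate_estimates, replicate_factors):
    """Re-calculate a coefficient's t-statistic from its jackknife standard error."""
    variance = jackknife_variance(full_estimate, replicate_estimates, replicate_factors)
    return full_estimate / variance ** 0.5


# Hypothetical coefficient re-estimated under four replicate weights.
coefficient = 0.50
replicates = [0.48, 0.53, 0.49, 0.51]
factors = [0.75, 0.75, 0.75, 0.75]  # e.g. (n - 1) / n for a stratum with n = 4 units

print(jackknife_variance(coefficient, replicates, factors))
print(jackknife_t_statistic(coefficient, replicates, factors))
```

In practice each replicate estimate would come from re-running the model estimation with one column of replicate weights, and the factors would come from the survey's replicate design.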
To facilitate the future application of this approach, the Python code used to calculate the jackknife variance estimates for the estimated choice models is included in Appendix J. In this case, ALOGIT was used to estimate the models. Replicate weights are included as a series of columns on the estimation file, and the script repeatedly calls

industry gains experience in imputing complete travel data from raw GPS data, the degree of supplementation from interviews can gradually decline, although it may never be reduced to zero.

As these surveys are further developed, it is worth considering the risk of self-selection bias posed by different types of survey technology and different numbers of days. Are there certain categories of users less willing or less able to use a GPS, and are there other categories less willing or able to participate in multiple days of data collection? In addition, there may be reasons to extend the number of days of data collection if additional days help with the imputation of trip purposes and modes from GPS traces. In its current form, however, there is a substantial cost associated with processing and cleaning those traces, so the additional days may be of limited value for data imputation.

The quality of the mode, purpose, and other imputations from GPS-only data may depend, in part, on the GPS technology itself. If the positional accuracy of the GPS improves, it may improve the quality of the imputations. NCHRP Report 775 discusses related issues.

All of these options to supplement and support the GPS data require greater cost, whether they are recall interviews to supplement the GPS data or prompts from smartphones and the like which can collect information on the spot. In addition, incentives need to be provided to make sure that the sampled households treat the following days in the same way as the first day of data collection.
All of these extra costs will certainly lead to larger project budgets, but overall the data collection regime can still collect more data for the same cost as more traditional diary methods.

7.3 Using Multi-Day Surveys in Travel Model Development

A second project objective was to investigate the effects of using multi-day household travel surveys for developing travel demand models and demonstrate a method of correcting for the repeated measurement problem.

A prerequisite for the use of multi-day survey data for developing an operational travel model is that the data be consistent across collection days and generally representative of the underlying travel behavior. As discussed in the previous section, the GPS-only sample in the 2012 Northeast Ohio Regional Travel Survey does not meet this prerequisite. Future GPS-only and/or multi-day household travel surveys

ALOGIT, using the appropriate weight each time. After all estimation runs are complete, a separate file is read containing the replicate factors, and they are used to combine the results and calculate the jackknife variance. The process may need to be adapted to the circumstance, but the research team hopes that it provides a relatively simple model to follow.

7.4 Evidence on Sample Size Equivalency

A third objective of this study was to provide empirical evidence on the sample size equivalency of single-day and multi-day surveys and on the relationship between sample size and travel model estimation results and data summaries. This evidence is based on the design effects and the resulting a factors reported throughout this digest. The correlation coefficient a provides an indication of how much responses are correlated across days for the same respondents: a values of 100% indicate near perfect correlation (e.g., with automobile ownership), and a values of 0% indicate no correlation. In the former case, there is no value to collecting additional survey days, and in the latter there is as much value in collecting 1 additional day for the same household as there is in collecting one additional household.

The a values are specific to the travel choice and parameter of interest, and on a full-scale travel demand model, there can easily be hundreds of parameters. They are summarized in Table 6-1, providing a median and a 75th percentile value for each model or tabulation and a weighted average given some relative importance. The importance assigned in combining these values is inherently a judgment call, and others are free to assign different levels of importance given their situations. There is some question as to whether these values should be averaged at all, or whether it would be equally appropriate to base decisions on the “limiting factors,” essentially those specific model components and parameters that the analyst is not willing to sacrifice.

Assuming one proceeds with the importance weights assigned, the analysis shows that the 3-day GPS-only sample of 2,780 households in the Northeast Ohio Survey would be equivalent to a single-day sample of between 3,631 and 5,487 households (32% to 97% higher), depending on how conservative one wants to be.

Extending this further, in Section 6.3, the research team calculated the optimal number of survey days for the range of a values observed and for a range of cost factors. This research does not provide new evidence in relation to the cost factors, but only reports them based on a limited number that have appeared in the literature. Assuming one wants to be reasonably conservative, it appears reasonable to conduct surveys with 2, 3, or 4 days of data collection per household. This evidence would argue against surveys with a week or more of data collection, unless the cost factor is very low or the intended purpose is very specific. Furthermore, regardless of the number of days, enough households must be sampled to meet the needs of models with little or no day-to-day variation.

How many households should be surveyed in total? This research cannot answer the question definitively, but provides some evidence to inform the decision. The tables in Appendix E show how the data tabulations change when calculated using a 1-, 2-, or 3-day sample, and tables in Appendix G show how the model estimation results change for those three samples. In the latter case, the parameters that become statistically insignificant are highlighted in red. It is possible to calculate the sample size based on a tolerance for error in any particular measurement, but again, the challenge is in how to combine these in a model system with hundreds of parameters. It is ultimately based on the analyst’s tolerance for noise in both the model estimation results and in the tabular data. It is also dependent on the desire for more market segmentation (e.g., more detailed trip purposes) and a longer list of descriptive variables which may be supported by a larger sample.

Here, the research team drew on its experience developing travel demand models to provide some level of guidance. This should not be construed as a definitive finding of this research, but rather an answer to the question: If you had to make a decision, what would you do?

In reviewing the model estimation results in Appendix G, the research team noted that some variables included in the full-sample estimation results became insignificant in the 1-day or 2-day estimations. These variables were initially included because they were deemed to be of value, so their loss is noteworthy. In addition, some variables included in the full-sample estimations were of marginal significance (e.g., out-of-vehicle time and household size in the work mode choice model). These terms were retained in the models because they should be important and because their retention allowed the research team to estimate the variances and design effects for those terms.
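The relationship between the day-to-day correlation a and the single-day equivalent of a multi-day sample can be illustrated with a short sketch. It assumes the standard Kish-style cluster design effect, 1 + (days - 1) * a, which reproduces the two limiting cases described in this section (a = 100% gives no gain from extra days; a = 0% makes each extra day worth one extra household). The exact formulation and weighting used for Table 6-1 may differ, the function names are hypothetical, and the intermediate a values in the loop are purely illustrative rather than taken from the digest.

```python
def design_effect(days: int, a: float) -> float:
    """Assumed Kish-style design effect for `days` correlated observations per household."""
    return 1.0 + (days - 1) * a


def equivalent_single_day_households(households: int, days: int, a: float) -> float:
    """Single-day sample size carrying the same information as a multi-day sample.

    households: number of households surveyed
    days:       survey days collected per household
    a:          day-to-day correlation for the measure of interest (0.0 to 1.0)
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be between 0 and 1")
    # Total household-days, deflated by the design effect of repeated measurement.
    return households * days / design_effect(days, a)


if __name__ == "__main__":
    # Illustrative a values only; the digest reports its own values in Table 6-1.
    for a in (0.0, 0.25, 0.5, 0.75, 1.0):
        equivalent = equivalent_single_day_households(2780, 3, a)
        print(f"a = {a:.2f}: about {equivalent:,.0f} single-day households")
```

Under this assumed formula, a more conservative (higher) a shrinks the single-day equivalent toward the number of households actually surveyed, which is the direction of the trade-off described above.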

Reviewing the tabular data, the research team observed noticeable differences in average trip lengths between the 1-day, 2-day, and full-sample estimates. Reviewing this evidence, the research team would not be comfortable recommending a sample any smaller than the full sample (2,775 GPS-only households with 3 weekdays of data collection) for the purpose of travel model development. The research team would prefer to work with a sample larger than this.

To be fair, it appears that the designers of the 2012 Northeast Ohio Regional Travel Survey agree with the research team, given that the survey collected data for a total of 4,540 households. The remaining households were not included in the analysis because they involved only 1 day of data collection by either prompted-recall or diary methods.

There are two related factors to consider. First, regardless of the sample size, it would be beneficial to improve the data quality, as discussed in Section 7.2. Second, if transit analysis is of interest, an onboard transit survey may be necessary to allow for mode choice model estimation and detailed calibration. Even though the survey included a targeted oversample in areas likely to have high transit ridership, the resulting data contain too few transit observations to do much with.

7.5 Limitations and Recommended Future Research

This research has two important limitations: (1) the analysis was based on a single travel survey and surveys using different methods or in different regions may produce different results, and (2) the biases observed in the GPS-only data and in the later survey days are a wildcard in this analysis. There is no way to know the effect that this might have on the intra-person day-to-day variation without knowing the true underlying behavior or repeating the experiments with a different data set.

Recommendations for future research flow directly from these limitations. To verify the results, the experiment could be repeated elsewhere using a survey demonstrated to have no bias for the later data collection days, possibly with a multi-day GPS-with-prompted-recall survey.

In addition, the research team recommends further development of the methods for GPS-enabled travel surveys, whether it be a stronger reliance on GPS-with-recall-interview or through some form of GPS-with-on-the-spot-prompts as a means to improve data quality.

REFERENCES

Note: Includes references both for main text and Appendix A.

2009 National Household Travel Survey User’s Guide (2011). USDOT. FHWA. http://nhts.ornl.gov/publications.shtml.
Bhat, C. R., Srinivasan, S. and Axhausen, K. W. (2005). An Analysis of Multiple Interepisode Durations Using a Unifying Multivariate Hazard Model. Transportation Research Part B: Methodological 39 (9): 797–823.
Cherchi, E. and Ortúzar, J. de D. (2008). Empirical Identification in the Mixed Logit Model: Analysing the Effect of Data Richness. Networks and Spatial Economics 8 (2-3): 109–24.
Cirillo, C., Daly, A., and Lindveld, K. (1998). Eliminating bias due to the repeated measurements problem in SP data, in Stated Preference Modelling Techniques: PTRC Perspectives 4. PTRC Education and Research Services Ltd.
Dill, J. and Broach, J. (2015). Using Vehicle GPS Data to Understand Travel Time Reliability. Presented at the TMIP Webinar: Using Multiday GPS Data, July 16. https://connectdot.connectsolutions.com/p6awfg98tk1/.
Erhardt, G. D., et al. (2008). Enhancement and Application of an Activity-Based Travel Model for Congestion Pricing. Presented at the Transportation Research Board Innovations in Travel Modeling Conference, Portland, Oregon.
Fuller, W. A. and Battese, G. E. (1974). Estimation of linear models with crossed-error structure. Journal of Econometrics 2, 67–78.
Golob, T. T. and Meurs, H. (1986). Biases on response over time in a seven-day travel diary. Transportation 13, pp. 163–181.
Greene, E., et al. (2015). A seven-day smartphone-based GPS household survey in Indiana. RSG. Salt Lake City, Utah.
Kang, H. and Scott, D. M. (2009). Modelling day-to-day dynamics in individuals’ activity time use considering intra-household interactions. Transportation Research Board 88th Annual Meeting.
Kish, L. (1965). Survey Sampling. New York: John Wiley & Sons.
Koppelman, F. S. and Pas, E. I. (1984). Estimation of disaggregate regression models of person trip generation with multiday data. Proceedings of the Ninth International Symposium on Transportation and Traffic Theory (eds. J. Volmuller, R. Hamerslag), Utrecht, Netherlands: VNU Science Press, 513–531.
Meurs, H., Van Wissen, L., and Visser, J. (1989). Measurement bias in panel studies. Transportation (Netherlands) 16 (2), 179–194.
Murakami, E. and Watterson, W. T. (1992). The Puget Sound transportation panel after two waves. Transportation 19(2), 141–158.
Ortúzar, J. de D. and Willumsen, L. G. (2001). Modelling Transport. Third edition, John Wiley & Sons Inc., New York.
Parsons Brinckerhoff, Inc., Westat, and Dunbar Transportation Consulting (2014). Activity-Based Modeling Framework: Final Project Report. Prepared for North Central Texas Council of Governments, Arlington, TX.
Pas, E. I. (1986). Multiday samples, parameter estimation precision, and data collection costs for least squares regression trip-generation models. Environment and Planning A, 18, 73–87.
Pas, E. I. (1987). Intrapersonal variability and model goodness-of-fit. Transportation Research A, Vol 21A (6), 431–438.
Pendyala, R. M. (2014). Measuring day-to-day variability in travel behavior using GPS data final report. FHWA Office of Highway Policy Information. http://www.fhwa.dot.gov.
Pendyala, R. M. and Pas, E. (2000). Multi-day and multi-period data for travel demand analysis and modeling. TRB Transportation Research Circular E-C008: Transportation Surveys: Raising the Standard.
Picado, R. (2014). Technical Memorandum #3, Upstream Model Updates, Technical Memorandum prepared by Parsons Brinckerhoff for NOACA Model Update Project Team, October 8, 2014.
Rose, J. M., et al. (2009). The Impact of Varying the Number of Repeated Choice Observations on the Mixed Multinomial Logit Model. In European Transport Conference, Leeuwenhorst, The Netherlands, 5–7.
Rust, K. F. and Rao, J. N. K. (1996). Variance estimation for complex surveys using replication techniques. Statistical Methods in Medical Research 5, 283–310.
Shao, J. and Tu, D. (1995). The Jackknife and Bootstrap. New York: Springer-Verlag.
Simas Oliveira, M. and Gupta, S. (2013). NOACA HTS Data Imputation Process, Technical Memorandum prepared by Westat and PB, August 30, 2013.
Skinner, C. J., Holt, D., and Smith, T. M. F. (1989). Analysis of Complex Surveys. New York: John Wiley and Sons.
Southeast Florida Transportation Council (2014). 2015 Southeast Florida Household Travel Survey: White Paper, Modeling Subcommittee, Regional Transportation Technical Advisory Committee, January 2014. http://www.fsutmsonline.net/images/uploads/mtf-files/Southeast_Florida_Household_Travel_Survey_0205_2014.pdf
Stopher, P. R., et al. (2008). Reducing burden and sample sizes in multi-day household travel surveys. Transportation Research Record 2064: 12–18.
Valliant, R., Dever, J. A., and Kreuter, F. (2013). Practical Tools for Designing and Weighting Survey Samples. New York: Springer (www.springer.com).
Wilhelm, J., et al. (2013). 2012 Northeast Ohio Regional Travel Survey Final Report. Northeast Ohio Areawide Coordinating Agency, Cleveland OH.
Wolf, J., et al. (2013). 2012 Northeast Ohio Regional Travel Survey Draft Technical Compendium. Northeast Ohio Areawide Coordinating Agency, Cleveland OH.
Wolf, J., et al. (2014). NCHRP Report 775: Applying GPS Data to Understand Travel Behavior, Volume I: Background, Methods and Tests. Prepared for the National Cooperative Highway Research Program, Transportation Research Board, Washington, DC.
Wolter, K. M. (2007). Introduction to Variance Estimation, 2nd ed. New York: Springer (www.springer.com).
Xu, Y. and Guensler, R. (2015). Capturing Personal Modality Styles Using Multiday GPS Data. Presented at the TMIP Webinar: Using Multiday GPS Data, July 16. https://connectdot.connectsolutions.com/p6awfg98tk1/.

APPENDIXES

The following appendixes are available on line at www.trb.org by searching for NCHRP Research Results Digest 400.

• Appendix A Detailed Literature Review
• Appendix B Jackknife Variance Estimation
• Appendix C Results Comparing GPS-Only and GPS-With-Prompted-Recall Data
• Appendix D Results Comparing Collection Days
• Appendix E Results for Tables
• Appendix F Estimates and Design Effects for Model Estimation: Technical Details
• Appendix G Model Estimation Results: Parameter Estimates, Variances, and Design Effects
• Appendix H Cost-Benefit Analysis for Multi-Day Studies
• Appendix I Survey Data Processing
• Appendix J Python Code for Jackknife Application

Transportation Research Board
500 Fifth Street, NW
Washington, DC 20001

These digests are issued in order to increase awareness of research results emanating from projects in the Cooperative Research Programs (CRP). Persons wanting to pursue the project subject matter in greater depth should contact the CRP Staff, Transportation Research Board, National Academies of Sciences, Engineering, and Medicine, 500 Fifth Street, NW, Washington, DC 20001.

COPYRIGHT INFORMATION

Authors herein are responsible for the authenticity of their materials and for obtaining written permissions from publishers or persons who own the copyright to any previously published or copyrighted material used herein. Cooperative Research Programs (CRP) grants permission to reproduce material in this publication for classroom and not-for-profit purposes. Permission is given with the understanding that none of the material will be used to imply TRB, AASHTO, FAA, FHWA, FMCSA, FRA, FTA, Office of the Assistant Secretary for Research and Technology, PHMSA, or TDC endorsement of a particular product, method, or practice. It is expected that those reproducing the material in this document for educational and not-for-profit uses will give appropriate acknowledgment of the source of any reprinted or reproduced material. For other uses of the material, request permission from CRP.

ISBN 978-0-309-44600-6

Subscriber Categories: Planning and Forecasting • Public Transportation


TRB's National Cooperative Highway Research Program (NCHRP) Research Results Digest 400: Sample Size Implications of Multi-Day GPS-Enabled Household Travel Surveys summarizes an NCHRP project that studied the design of household travel surveys. Multi-day travel surveys are now more feasible, given global positioning system (GPS) technology. This project explores whether surveys using a GPS device show less drop-off in response than travel diaries. It also investigates the effects of using multi-day data for developing travel demand models and explores the sample size implications of multi-day versus single-day surveys.

Appendixes A through J are available online.
