CHAPTER 2

Literature Review

This literature review is intended to provide background on the current state of practice in transit survey technologies and examines a range of sources, including federal guidance, academic papers, policy white papers, existing survey documentation, and past TCRP syntheses. These sources were identified primarily through Google Scholar and TRB's Transportation Research Information Documentation (TRID), as well as Internet research and reviews of federal guidance from the U.S. Department of Transportation. The literature review is organized into seven parts for easier navigation of topics:

1. Federal guidance and regulations affecting surveys.
2. Overview of survey methods.
3. Factors affecting response rates.
4. Factors affecting completion rates.
5. Survey sampling plans.
6. Expansion of survey results.
7. Emerging role of passive data.

Federal Guidance and Requirements

The federal government does not directly regulate how public agencies conduct OD surveys, but the Federal Transit Administration (FTA) of the U.S. Department of Transportation (U.S. DOT) offers guidance on the following federal laws and regulations that affect the frequency, content, and delivery of passenger surveys:

• Title VI Requirements: Title VI of the 1964 Civil Rights Act protects people from discrimination based on race, color, and national origin in programs and activities receiving federal financial assistance. FTA Circular 4702.1B: Title VI Requirements (FTA, 2012) outlines current Title VI guidance and regulations for transit providers. The circular provides guidance on the types of information that transit agencies should collect, including information gathered through passenger surveys, to ensure that they are compliant with Title VI.
• Capital Investment Grants Program: This FTA discretionary grant program funds transit capital investments, including heavy rail, commuter rail, light rail, streetcars, and bus rapid transit.
OD surveys are used to support transit ridership forecasting for both New Starts and Small Starts projects (https://www.transit.dot.gov/CIG), in addition to the preparation of before-and-after studies for completed projects.
• ADA/Accessibility Requirements: FTA Circular 4710.1: Americans with Disabilities Act: Guidance (FTA, 2015) outlines ways in which transit providers should accommodate people with disabilities. The circular includes guidance on how to accommodate disabilities in written and oral communication.
Public Transit Rider Origin–Destination Survey Methods and Technologies

Overview of Survey Methods

Transit providers and metropolitan planning organizations (MPOs) use several methods for developing and conducting rider surveys. Historically, the most common practice has been to distribute a self-administered paper survey to riders on board transit vehicles or at transit stops. However, advances in technology have made other survey modes, such as electronic tablets and online surveys, increasingly common. Research suggests that tablets offer benefits such as reduced data cleaning requirements and higher data quality (Schmitt, 2012; Cummins et al., 2013; Agrawal et al., 2015). The following are the most common methods for conducting OD surveys:

Self-Administered Written Questionnaires

Self-administered paper-based surveys have historically been the most common method for conducting an OD rider survey (Schaller, 2005; AECOM, 2009; Schmitt, 2012). In this method, paper surveys are distributed to passengers on transit routes or at stops/stations by either a surveyor or a driver. Respondents return the completed form either as they disembark or by mail.

• The primary benefit of this method is cost. With self-administered surveys, staff members can distribute and collect surveys en masse, reducing staffing needs and hours (Schaller, 2005).
• Though this method is efficient to administer, it can result in poorer data quality than other methods. The information requested on the survey can be complex (e.g., trip-making behavior), and respondents may misinterpret the questions being asked (Baltes, 2002; Schmitt, 2012).
• In the absence of a surveyor to clarify questions or automatic answer validation to screen out erroneous answers, questionnaire design and wording are especially important for a self-administered paper survey (Schaller, 2005).
This is especially problematic when respondents enter incorrect origin and destination addresses that cannot be geocoded properly (Schmitt, 2012). The Pew Research Center (2018c) outlines best practices for survey question design and wording; recommendations include avoiding open-ended questions and limiting the number of response options. Pew notes that even small changes in question wording or response order can have a large impact on survey results (Pew Research Center, 2018c).
• Finally, surveyors who conduct a paper-based self-administered survey cannot easily control who is invited to participate and cannot track responses in real time. With other survey methods, such as two-step surveys, surveyors can choose to oversample riders from population groups that historically are underrepresented in survey responses. Respondents with limited English proficiency or disabilities may be unable to complete a written form without assistance, which can leave members of these groups out of the pool of respondents entirely.

Personal Interview and Written Questionnaires

Interview-administered surveys are a strategy for overcoming the shortcomings of self-administered rider surveys (Schaller, 2005; Schmitt, 2012; Cummins et al., 2013). Using this method, a surveyor fills out a written survey questionnaire while walking the respondent through each question, providing clarification and guidance as needed. Interview methods can reduce misunderstanding of questions, increase response rates, and ensure that those with disabilities and limited English ability can participate (Schaller, 2005). This method is sometimes used in conjunction with self-administered surveys to accommodate users unable to fill out the survey on their own (Schaller, 2005).

• Interview-based written questionnaires are costlier to conduct than self-administered surveys because surveyors must spend time with every respondent (Schaller, 2005).
• Privacy concerns may prevent individuals or groups from participating if they do not want to share personal information with a surveyor (Agrawal et al., 2015).
• In addition, interview-based surveys are affected by the biases of the interviewer. Whereas a self-administered survey can simply be passed out to everyone on a transit vehicle, a sampling technique must be employed to select interview participants to ensure that surveys are randomly distributed (McHugh et al., 2017).

Personal Interview and Handheld Tablets

In recent years, many transit providers have opted to use handheld tablet devices to conduct on-board surveys with transit passengers. Under this method, surveyors engage in personal interviews with individual transit riders and fill out electronic survey instruments on handheld devices (tablet PCs, PDAs, iPads, etc.) (AECOM, 2009). This method has been recognized by FTA as a "legitimate and preferred" method for collecting survey data, and some researchers have recommended utilizing tablet devices and personal interviews over self-administered survey techniques (Schmitt, 2012; Agrawal et al., 2015; McHugh et al., 2017).

• Interview-administered tablet surveys can have some cost benefits over other survey modes, especially in terms of printing, postage, and data cleaning costs (McHugh et al., 2017).
• In a paper outlining how the Tri-County Metropolitan Transportation District of Oregon (TriMet) migrated its survey approach from paper to tablets, McHugh et al. found that using tablets "decreases time and cost significantly, generates more accurate and reliable data, improves customer relations, and is friendlier to the environment" (McHugh et al., 2017, p. 19). Tablets allow surveyors to integrate automatic validation of responses, monitor progress in real time, and reduce data entry errors, all contributing to higher data quality (Schmitt, 2012).
• There are downsides to a tablet-survey approach.
Though many respondents may be receptive to surveyors using electronic devices, there may be generational differences in the level of comfort with such a process (Agrawal et al., 2015). Electronic interview surveys are affected by many of the same concerns as the paper interview method (e.g., privacy concerns, sample bias, quality of survey staff). In areas with large limited English proficiency (LEP) populations, passengers may be reluctant to answer personal questions about their daily activities (Agrawal et al., 2015). Having participants fill out demographic and income questions themselves may alleviate privacy concerns (B. Dong, B. McHugh, and V. Shank, personal communication, April 2018).

In reviewing literature on tablet-based surveys, it is important to note that the experience of the survey contractor and the design of the survey instrument greatly influence the effectiveness of tablet-based surveys. Agrawal et al. (2015), unlike other researchers, found that tablet-based surveys yielded lower response rates at a higher cost than self-administered paper surveys. These results could be due to a high software error rate and the fact that the survey used in the study did not incorporate features such as survey logic, automatic data validation, and OD mapping, features demonstrated in other cases to result in higher data quality (McHugh et al., 2017; B. Dong, B. McHugh, and V. Shank, personal communication, April 2018; S. Israel, telephone interviews, March 30 and June 15, 2018).

Two-Step Methods

A less common method of randomized rider surveys is the two-step method, in which survey teams recruit riders on board vehicles or at transit stations to take a second, more detailed survey, either online or through a computer-aided telephone interview (CATI) (Schmitt, 2012; Cummins et al., 2013; Agrawal et al., 2015). In some cases, the first step includes
a basic questionnaire that can be filled out and returned to a surveyor. In others, the first step (card distribution) simply serves as an invitation to participate. For surveys that include a basic questionnaire as the first step, surveyors can use a respondent's address, demographic, or income data to guide their sampling plan for the second step. Major two-step methods include:

• Web-Based Surveys: Many transit providers are attracted to online surveys because they are often seen as faster and cheaper to complete compared to self-administered or interview-supported paper and tablet-based surveys. When examined for effectiveness, however, web-based surveys that aim to collect OD information have generated mixed results. One of the greatest concerns about using online and web-based survey tools is the issue of coverage error (Schaller, 2005; Spitz et al., 2006). Although almost 90 percent of adults report using the Internet, use is higher among younger, wealthier, highly educated urban populations (Pew Research Center, 2018a). There is legitimate concern that not all transit riders can access the Internet on a regular basis and thus some will not be able to participate in an online survey. Even for those who have regular Internet access, there is a concern that the people who ultimately choose to complete such a survey may be a biased representation of transit users (K. Cervenka, telephone interview, June 13, 2018).
• Computer-Aided Telephone Interview Surveys: In CATIs, surveyors review the responses from the first step and then contact riders to ask more detailed questions about their trip. One of the great advantages of the CATI method is that the surveyor can contact riders in their preferred language (Agrawal et al., 2015).
The Los Angeles County Metropolitan Transportation Authority (LA Metro) found that CATIs yielded a higher response from LEP riders; over 35 percent of respondents chose the Spanish-language option, increasing participation from a population that is often underrepresented in public transit surveys (Schmitt, 2012). One concern with this method is determining whether those who are called and complete a questionnaire are a sufficiently random representation of the population being studied.

Choosing a Survey Method

Research has shown that different survey methods are appropriate in different contexts (Schaller, 2005; Schmitt, 2012; Agrawal et al., 2015). The FTA has cited tablet surveys as a preferred method of data collection (McHugh et al., 2017), and interview-administered tablet surveys (when conducted correctly) yield higher quality data than self-administered surveys (Schmitt, 2012). Depending on the transit system and type of survey, however, other methods may be more appropriate (see Table 3 for key strengths and weaknesses). In selecting a method, organizations should consider factors such as cost, technical know-how and capacity of the contractor or managing department, sociodemographics of the ridership population, and survey content.

An on-board survey is only as valuable as the quality of the data it provides on travel behavior. Survey approach and design can have a significant effect on data quality, and it is critical that these surveys be designed to collect accurate, reliable, and useful data regardless of the survey instrument used.

Response Rate Influences

The response rate of a survey is generally defined as the ratio of the number of returned surveys to the number of questionnaires distributed, but the specific way this rate is calculated varies from one agency to another (Agrawal et al., 2015).
In some instances, the response rate is calculated as the ratio of the number of surveys returned to the total number of passengers receiving the survey. In others, it is calculated as the ratio of the number of surveys returned to the total number of passengers approached, including those who refused to participate in the survey.
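These two denominators can be made concrete in a few lines of code. The sketch below is illustrative only; the counts are invented and do not come from any agency cited in this review.

```python
# Sketch of the two response-rate definitions described above.
# The counts are invented for illustration.

def response_rate_distributed(returned: int, distributed: int) -> float:
    """Surveys returned / surveys distributed."""
    return returned / distributed

def response_rate_approached(returned: int, approached: int) -> float:
    """Surveys returned / passengers approached (refusals included
    in the denominator)."""
    return returned / approached

# Example: 240 surveys returned; 500 forms distributed; 650 riders
# approached in total (the 150 who refused never received a form).
print(round(response_rate_distributed(240, 500), 2))  # 0.48
print(round(response_rate_approached(240, 650), 2))   # 0.37
```

The gap between the two figures is one reason published response rates are hard to compare across agencies unless the denominator is reported alongside the rate.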
Under either of these measures, a high response rate increases the likelihood that the sample accurately reflects the actual transit-riding population by indicating good coverage across a population and reduced nonresponse bias (Agrawal et al., 2015). Typical response rates can vary widely. According to Baltes (2002), a good response rate has historically been between 20 and 40 percent for on-board transit surveys. In the 2005 TCRP synthesis of intercept passenger surveys, transit providers reported response rates of between 33 and 67 percent (Schaller, 2005), while the survey conducted for this synthesis found an average response rate of 49 percent (ranging from 3 percent to 96 percent). A range of factors can account for this variability and influence the response rate of a given survey, including design, environment, and demographics.

Self-administered paper surveys
Strengths:
• Most contractors are familiar with this method.
• Can quickly distribute surveys to many participants.
• Low cost per survey distributed.
• Survey length does not directly correspond with staff time per complete survey (though it still has an indirect impact, as longer surveys lower response rates, requiring more fielding of responses).
• Allows riders to complete the survey on their own time; may be appropriate for high-traffic locations (e.g., subway stations) where riders do not have the time to participate in an interview.
Weaknesses:
• Poor data quality can result in high cost per usable survey response.
• Lower response rate (surveys received/surveys distributed).
• Greater potential for question misinterpretation.
• Persons with limited English ability or literacy may struggle to complete the survey.
• Susceptible to sample and completion biases.
• Inability to track survey responses and data quality in real time.

Interview-administered tablet surveys
Strengths:
• Better data quality.
• Ability to incorporate auto-validation and skip logic into the survey instrument.
• Allows for tracking of responses in real time; can identify systematic biases or sampling issues more easily.
• Streamlined data entry and data cleaning.
• Interview approach can more flexibly accommodate users with low literacy or limited English-speaking ability.
Weaknesses:
• May result in higher costs due to increased staff hours to administer the survey.
• Some users may be uncomfortable sharing personal details in an interview setting.
• Challenging to conduct in a crowded setting or where the user has limited time to participate in an interview.

Interview-administered paper surveys
Strengths:
• Can be conducted alongside self-administered surveys to accommodate persons with certain disabilities, limited English proficiency, or limited literacy.
Weaknesses:
• Does not benefit from the automation available in tablet surveys.

Two-step surveys
Strengths:
• Effective in sampling short trips because the full survey is completed later.
• Allows for targeted sampling of underrepresented groups.
• Can accommodate a wider range of languages than is feasible with an in-person interview.
Weaknesses:
• Participants may not accurately recall the past trip.
• Distribution of the survey during the first step can amplify sampling bias.
• Lack of online or phone access may hinder response rate.

Table 3. Key strengths and weaknesses associated with common survey methods.
Design Factors Affecting Response Rate

The design of a survey and the method by which it is conducted can have a large influence on its response rate.

• Survey Method: As discussed above, survey methods can have a significant impact on response rate, with interview-based approaches frequently yielding a higher response rate than self-administered methods (Schmitt, 2012). Certain survey methods can also make it difficult to ensure an adequate rate of response from people with disabilities. In a self-administered method, a physically or visually disabled transit rider may be unable to complete the survey (Schaller, 2005). Methods that employ personal interviews, CATI responses, or online survey tools programmed to be compatible with screen reader technologies can overcome this obstacle.
• Surveyors: The use of surveyors, and their level of enthusiasm, diligence, and language ability, can also make a difference in response rates (Schaller, 2005). Multiple studies have found that using surveyors instead of a box to collect completed survey questionnaires increases the response rate (Schaller, 2005; Memarian et al., 2012). Transit providers have also reported that the quality of the surveyors hired was a significant factor in generating higher response rates (Schaller, 2005). Those that have staffed their survey teams with students have often found them to be effective (McHugh et al., 2017; Schaller, 2005), and one study found that female surveyors produced better response rates than their male counterparts (Memarian et al., 2012). In addition, if an individual with limited English proficiency is approached for a survey, the surveyor's ability to speak that person's native language increases the likelihood of a response.
• Incentives: Incentives offered to encourage participation have also been shown to increase response rates (Baltes, 2002; Schaller, 2005; Memarian et al., 2012; Schmitt, 2012). In past surveys, such incentives have ranged from small giveaway items, such as pens, to free short-term transit passes, to entry into a drawing for a larger monetary prize or longer-term passes. One study suggested that the use of monetary prizes was more effective than other forms (Memarian et al., 2012).
• Frequency: The frequency with which on-board surveys are administered can have an influence as well. Transit providers have reported that administering surveys on a somewhat regular basis can improve response rates because customers come to expect them and learn that completing them can lead to meaningful changes in service and quality (Schaller, 2005). Surveying riders too frequently can have the opposite effect, as survey fatigue reduces responsiveness. Survey fatigue is particularly challenging in OD surveys because the same rider may be approached multiple times during the study period.
• Length: Finally, the number of questions included in a survey makes a difference in the overall response rate (Spitz, 2006; Cherrington, 2007; Memarian et al., 2012). Surveys that are too long can cause a respondent not to return the survey instrument, or not to begin the survey at all. Though it can be a challenge to limit the number of questions, survey questionnaires should be designed to request only essential information related to trip characteristics and demographic data (Baltes, 2002; Cherrington, 2007; Memarian et al., 2012).

Environmental Factors Affecting Response Rate

Environmental characteristics of transit trips can also make it hard to collect responses in all desired situations:

• Crowdedness: If a survey is meant to be completed entirely on board a vehicle, crowding can lead to lower response rates.
On a crowded, moving bus, riders may find it difficult to fill out a written form, and surveyors may find it difficult to conduct interviews for lack of privacy (Agrawal et al., 2015). Combined, these characteristics make it easier to conduct a survey during non-peak hours, which creates concerns about sample bias (Agrawal et al., 2015).
• Short Trips: Short transit trips are notably challenging to survey; respondents may simply not have enough time to complete an interview or a self-administered questionnaire (AECOM, 2009; Agrawal et al., 2015). Some providers have used auxiliary data collection to better capture short trips (AECOM, 2009), such as two-step surveys or the option to complete the survey at another time.
• Round Trips: Transit riders on round-trip journeys may be less inclined to complete a survey if they were already surveyed on their inbound leg. This leads to lower response rates for outbound trips, notably trips taken during the evening peak period. Transit providers have tried to circumvent this by allowing riders to indicate on a survey that they plan to take the exact same trip in the opposite direction later in the day (COTA, 2014).
• Season: Another environmental factor that can affect response rates is the time of year in which a survey is conducted. Fall and spring are generally thought to have more typical travel patterns, whereas travel during the summer and winter may be irregular: summer is a popular vacation time, and a recent study by Silver et al. (2016) concludes that avoiding winter months may be good practice if the survey aims to describe normal weekday travel behavior. Note that seasonal effects on ridership differ between transit systems; for example, in warm climates such as South Florida, weather is not a major factor in the winter, but tourists visiting for spring break may disrupt typical transit travel patterns in the spring.

Demographic Factors

Response rates may differ between demographic groups, affecting the quality of a survey sample (Baltes, 2002; Schaller, 2005).
Some demographic factors influencing response rates include:

• Transit Mode: Schaller (2005) found that response rates tended to be higher for express bus, light rail, and commuter rail riders, modes that are more likely to serve higher-income riders of working age. These results imply that local-bus riders, who tend to have lower average incomes, may be systematically underrepresented. In addition, younger riders are often undercounted in transit surveys because many transit providers do not ask riders under a certain age to complete questionnaires (Neff and Pham, 2007).
• Limited English-Proficient Populations: One of the most difficult groups from which to ensure a high response rate is individuals with limited English proficiency (LEP). There are several reasons why those with LEP might choose not to participate in an on-board survey, including inability to understand the questions on a questionnaire, a lack of multilingual surveyors, and, in the case of surveys that use tablets, the desire to avoid having their behavior tracked by an electronic system (Schaller, 2005; Agrawal et al., 2015). To overcome a low response rate from this group, providing survey materials in the languages most commonly spoken where the survey is being conducted, along with multilingual surveyors, can be a big help (Schaller, 2005). In addition, employing a two-step method with a multilingual CATI approach has proven to be an effective way to increase responses from individuals with LEP (Schmitt, 2012).

Completion Rate Influences

Whereas response rate allows an agency to determine whether the returned surveys represent the transit-riding public, completion rate shows whether those responses can be used. Some agencies determine completion rates by calculating the ratio of completed surveys to returned surveys, whereas others use the ratio of completed surveys to total passengers approached (Agrawal et al., 2015).
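As a concrete sketch of these two completion-rate denominators, the snippet below counts a response as complete when a small set of key questions is answered. The record layout, the required questions, and all counts are hypothetical, chosen only to illustrate the arithmetic.

```python
# Sketch of the two completion-rate definitions described above.
# The record layout, required questions, and counts are hypothetical.

REQUIRED = ("origin", "destination", "trip_purpose")  # illustrative key questions

def is_complete(response: dict) -> bool:
    """Count a response as complete if every required question is answered."""
    return all(response.get(q) not in (None, "") for q in REQUIRED)

returned = [
    {"origin": "home", "destination": "work", "trip_purpose": "commute"},
    {"origin": "home", "destination": "", "trip_purpose": "shopping"},  # incomplete
    {"origin": "school", "destination": "home", "trip_purpose": "school"},
]
approached = 10  # includes riders who refused or never returned a form

completed = sum(is_complete(r) for r in returned)

print(round(completed / len(returned), 2))  # completed / returned   -> 0.67
print(round(completed / approached, 2))     # completed / approached -> 0.2
```

As with response rates, the two denominators can give very different figures for the same fieldwork, so the definition in use should accompany any reported completion rate.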
The definition of a "usable" survey response varies from agency to agency and depends highly on the expressed goal of the survey. For most OD surveys, the essential information includes questions on trip characteristics and demographics (Cherrington, 2007). According to Schaller (2005), most transit providers determine whether a survey is complete by requiring either that a certain percentage of questions be answered or that certain key questions be answered. For those that used a percentage-based determination in that study, the figure ranged from 25 to 90 percent, but most had a threshold over 50 percent (Schaller, 2005). In a 2012 synthesis report on OD survey methods for the Florida Department of Transportation, Schmitt argues that for a response to be considered "usable," the following data are essential:

1. Production/attraction zone (origin/destination),
2. Boarding/alighting location,
3. Access/egress mode,
4. Purpose from/to,
5. Key demographic attributes, and
6. Route sequence.

Completion rates are influenced by many factors that can affect whether a respondent provides answers to all questions presented. These include the length of the survey, question nonresponse, question misinterpretation, and the method used to complete the survey.

Survey Length

A lengthy questionnaire can prevent someone from answering enough questions to make their response usable. This is especially the case for respondents completing short trips who do not have time to complete a long form or a lengthy interview (Cherrington, 2007).

Question Nonresponse

Respondents may systematically skip certain survey questions (Agrawal et al., 2015).
The reasons for this have been explored in a large body of research outside of transportation, and include cognitive and motivational factors (Beatty and Herrmann, 2002), issues of interpretation and comprehension (Moore et al., 1999), and perceived issues of confidentiality (Singer et al., 1992). Factors influencing question nonresponse include questions about personal information and whether the survey is being conducted in person.

• Personal Information: A persistent challenge for on-board surveys is questions that ask respondents to reveal personal information, such as income, race or ethnicity, and age (Cherrington, 2007; Agrawal et al., 2015; Lor et al., 2017). There is some evidence that respondents are more willing to answer these questions on a written, self-administered survey than in a personal interview, especially questions having to do with income (Agrawal et al., 2015). Some researchers have found that providing closed-ended income categories, rather than an open-ended answer option, may increase responses to these questions (Moore, 2006; Schmitt, 2012; Agrawal et al., 2015). One study suggested that survey designers go a step further and create income categories that are artificially low to encourage low-income riders to respond (Schmitt, 2012).
• In-Person Interviews: Lor et al. (2017) found that providing an in-person rationale for why personal questions on race and ethnicity, income, and age are being asked can increase the likelihood that they will be answered. This was especially true for questions concerning income. When personal interviews are used, respondents have also expressed more
willingness to answer questions on race and ethnicity if they are interviewed by someone of a similar racial or ethnic background (Lor et al., 2017).

Question Misinterpretation

The design and wording of survey questions has a big impact on completion rates. On self-administered surveys, question design is particularly important because there is little opportunity to ask survey staff for clarification (Agrawal et al., 2015). In general, survey designers would benefit from following best practices that encourage paying close attention to question wording that might prove confusing to respondents (Tierney et al., 1996; Baltes, 2002; Cherrington, 2007). Words with ambiguous meanings, double-barreled questions (i.e., two questions posed together with only one set of response options provided), double negatives, hypothetical situations, and acronyms should be avoided (Baltes, 2002). Survey pre-tests are recommended to ensure instructions are clear and questions are easy to understand (Baltes, 2002).

• Origin/Destination: Respondents to on-board surveys have historically misinterpreted questions regarding origin and destination, route sequencing, and one-way trips, which can lead to significant measurement error (Schmitt, 2012). Origin and destination reporting is particularly important for building a profile of rider behavior. To overcome potential reporting errors in surveys without automatic validation (i.e., survey software that confirms an address is a real and valid location), Cherrington (2007) recommends that a survey ask for three variations of origin and destination location information in tandem to create redundancy in the answer: full address, cross streets (or nearest corner), and the name of the place or building.
• Route Sequencing: The literature suggests that questions on route sequencing are especially challenging for riders.
When asked to list all route numbers that will be used on a one-way trip, respondents may reply by providing all possible routes that they could have used (Schmitt, 2012). Respondents may also misinterpret the meaning of a one-way trip, which is evident when both the origin and destination of a trip are reported as "home" (Schmitt, 2012). Graphic and written explanations can help clarify questions for survey respondents (Schmitt, 2012).

Survey Sampling Plan

For some smaller transit providers, it is logistically possible to complete a census of all riders, but most transit providers must create a sampling plan to solicit survey responses from a representative slice of the population, or a sample. To identify a representative sample, survey designers must:

1. Understand their target population, study population, and sampling frame;
2. Choose a sampling method;
3. Determine the correct sample size; and
4. Create a sampling plan that manages or minimizes expected sample bias.

Target Population, Study Population, and Sample Frame

The first step in creating a sampling plan is to identify the target population, study population, and sample frame (see Figure 2). Defining these three factors will affect what a survey is measuring:

• The target population, or theoretical population, for any survey is the population of interest to the research. For surveys that examine transit trip-making behavior, the targets of the survey are trips utilizing transit, not transit riders (Baltes, 2002; Schaller,
18 Public Transit Rider Origin–Destination Survey Methods and Technologies

2005; Cherrington, 2007). If the expressed purpose of an on-board survey is to collect attitudinal data from riders, then the target population would be transit riders (Baltes, 2002; Cherrington, 2007). The main difference between these two is that surveys that aim to collect information on trips may end up collecting multiple responses from one rider, whereas a survey that focuses on riders will likely seek only one response per rider. In circumstances in which a survey is meant to gather both types of information, it is up to the survey designer to determine what the emphasis of the survey should be and to create a survey method that avoids the double counting of attitudinal data. This is often accomplished by asking riders to fill out a new survey questionnaire for each trip, but to omit answers to attitudinal questions if they have already filled them out once (Schaller, 2005).

• The study population is a subset of the target population that can be reached to conduct the survey. For on-board transit surveys, the target and study populations are usually the same, because all riders can theoretically be reached while riding in a transit vehicle. In 2005, Schaller determined that most providers surveyed in the study defined their study population to be "all riders" of their transit system, whereas others identified riders within a specific geographic area, riders on certain routes, or riders during certain time periods as their study populations.

• The sampling frame is a complete listing of all specific items in the identified study population. For on-board transit surveys, the frame usually consists of all customers on the bus or rail routes being studied (Schaller, 2005). A frame can be defined generally as all riders on a given route, or further defined by days of the week, time of day, and direction of travel (Schaller, 2005).
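The narrowing from target population to study population to sampling frame can be sketched as successive filters over trip records. The snippet below is a minimal illustration only; the field names, trip records, and surveyed-vehicle set are all hypothetical, not drawn from the sources cited above.

```python
# Hypothetical unlinked-trip records; each dict is one transit trip.
trips = [
    {"id": 1, "day": "weekday", "rider_age": 34, "vehicle": "bus-12"},
    {"id": 2, "day": "weekend", "rider_age": 29, "vehicle": "bus-12"},
    {"id": 3, "day": "weekday", "rider_age": 15, "vehicle": "bus-12"},
    {"id": 4, "day": "weekday", "rider_age": 41, "vehicle": "bus-07"},
]

surveyed_vehicles = {"bus-12"}  # vehicles where surveyors are actually deployed

# Target population: all weekday transit trips.
target = [t for t in trips if t["day"] == "weekday"]
# Study population: weekday trips taken by riders over the age of 16.
study = [t for t in target if t["rider_age"] > 16]
# Sampling frame: study-population trips on the particular vehicles surveyed.
frame = [t for t in study if t["vehicle"] in surveyed_vehicles]
```

Each filter removes trips the survey cannot, or does not intend to, reach, which is exactly the distinction the three definitions above draw.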
Figure 2. Example of how target population, sample population, and sample frame may differ in an origin–destination study:
• Target Population: All weekday transit trips.
• Sample Population: All weekday transit trips taken by riders over the age of 16 years.
• Sample Frame: Transit trips taken on the particular vehicles being surveyed.

Sampling Methods

Agencies have used several different methods to identify a sample of riders who will be approached to complete a survey. Generally, sampling methods can be grouped into two categories: probability, or random, sampling and nonprobability, or nonrandom, sampling (Baltes, 2002). Though it is impossible to eliminate all sources of bias through a sampling plan, the standard practice in transit surveys is to use probability sampling methods because they are more likely to achieve a representative sample of riders (Baltes, 2002; Schaller, 2005; Cherrington, 2007). The three methods of probability sampling (see Figure 3) are as follows:

• Simple Random Sample: Under a simple random sample, each trip surveyed is selected by chance from the larger population, and the probability of selecting an individual within a sampling frame is the same (Baltes, 2002; Cherrington, 2007). Though a simple random sample is attractive from a methodological standpoint because it maintains an equal probability of any one trip being selected for inclusion, this method is rarely used for on-board surveys because it can be difficult to field (Schaller, 2005; Cherrington, 2007). If bus routes, for instance, are selected completely at random, survey staff may spend a large amount of time traveling from one route to another in a system (Schaller, 2005).
• Stratified Random Sample: In a stratified random sample, the sample frame is divided into homogeneous subsets, or strata, and then random sampling techniques are used to select individuals from within those strata (Baltes, 2002; Schaller, 2005; Cherrington, 2007). Transit operators often stratify a survey sample by mode, route, bus or train run, time of day, or service type, and then select participants at random from each of those groups (Cherrington, 2007). This method ensures that specific subgroups of transit trips are captured.

• Systematic Sample: A systematic sample, otherwise known as an interval sample, selects every Nth trip within the sample frame to participate in a survey (Schaller, 2005; Cherrington, 2007). Though the selection of individual trips to be included in the sample is not completely random, each trip has an equal likelihood of being selected (Cherrington, 2007). This method is particularly useful for survey methods that utilize personal interviews. Since it is impossible to interview every rider on a given route, this method maintains randomness and ensures that surveyors do not simply approach those who seem friendly or similar in appearance to them (Schaller, 2005).

Sample Size and Error

Determining the sample size needed for a survey depends on the objectives of the survey, the target population size, and the level of precision desired in the responses (Baltes, 2002; Schaller, 2005). Any survey that relies on a sample will feature some level of error compared to the population as a whole (Baltes, 2002). The difference between a true, but unknown, value for an entire population and the observed value gathered from a sample is known as sampling error (Baltes, 2002; Schaller, 2005). The sample size required for a specific survey is highly dependent on the level of sample error that can be tolerated at specific confidence intervals.
Confidence intervals represent the range within which survey results are expected to fall a given percentage of the time; the confidence level is usually set at 95 percent for social science surveys (Baltes, 2002). This means that, 95 percent of the time that a survey is completed, the results from a truly randomized sample are expected to fall within certain percentage ranges. The sample size requirements are dictated by the degree of precision desired; for example, an OD survey with valid results on a route-by-route basis will require a larger sample than one reporting only on systemwide statistics. In developing the sampling plan, organizations must balance the competing interests of managing sample size (and therefore cost) and achieving a high level of precision for survey results. There are a variety of resources for guidance on statistical methodology and concepts such as sample error and confidence intervals, including several listed on the webpage of the Federal Committee on Statistical Methodology (https://nces.ed.gov/FCSM/policies.asp).

Figure 3. Examples of simple, stratified, and systematic samples in origin–destination surveys:
• Simple Random Sample: Surveyors are placed at each subway station within a system and hand out surveys to riders entering the system.
• Stratified Sample: A team of surveyors spends each day surveying a different bus route until the entire sample frame is surveyed.
• Systematic Sample: Every fifth rider boarding a vehicle is asked to participate in an interview-led survey.
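As a rough illustration of the sample size arithmetic, the sketch below computes the sample needed to estimate a proportion at a 95 percent confidence level (z = 1.96) with a ±5 percent margin of error, applying a finite population correction. The function name and default values are illustrative, not drawn from the sources cited above.

```python
import math

def required_sample_size(population, margin_of_error=0.05, z=1.96, p=0.5):
    """Sample size needed to estimate a proportion to a given margin of error.

    z = 1.96 corresponds to a 95 percent confidence level; p = 0.5 is the most
    conservative assumption about the proportion being estimated.
    """
    # Required size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Finite population correction: small populations need fewer responses.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# A systemwide estimate versus a single-route estimate: the smaller the
# reporting unit, the larger the share of its riders that must respond.
systemwide = required_sample_size(100_000)  # e.g., daily trips systemwide
one_route = required_sample_size(1_500)     # e.g., daily trips on one route
```

Note how the route-level requirement is a far larger fraction of its population than the systemwide one, which is why route-by-route reporting drives total sample size (and cost) up so quickly.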
Sample Bias

Systematic errors created by a sampling method are referred to as sample bias (Baltes, 2002). Whether bias is present in a sample is hard to quantify, because a 100-percent census of all transit trips in a system is almost never available for comparison. Even though biases are found to some degree in all surveys, transit providers should try to minimize them when designing sampling procedures (Baltes, 2002). The three most common sources of sample bias in on-board transit surveys are noncoverage bias, nonresponse bias, and self-selection bias (Baltes, 2002; Schaller, 2005).

• Noncoverage Bias: Noncoverage bias is present when a sampling plan does not include all subsets of a study population. For on-board transit surveys, this type of bias would be present if trips taken by certain groups of individuals are not included in the sample frame (Baltes, 2002; Schaller, 2005). Examples include a sampling frame that excludes persons under age 16 from participating, or one that omits days of the week when certain route patterns operate.

• Nonresponse Bias: Nonresponse bias can be present when a portion of the study population that possesses certain traits does not respond to a survey, causing the responses collected to be significantly different from those that would have been collected from the full population (Baltes, 2002; Schaller, 2005). This type of bias makes the results of a survey unrepresentative of the study population, since certain traits of trips reported (e.g., length of trip, mode used, sociodemographic characteristics of rider) may be disproportionately present. Though it is nearly impossible to calculate the error from nonresponse, it can be evaluated by comparing characteristics of the respondents with those of the total population to find whether certain groups were left out (Schaller, 2005).
• Self-Selection Bias: Survey responses almost always include some level of self-selection bias, which happens when individuals voluntarily choose to respond, or not respond, to a survey questionnaire on their own (Baltes, 2002). This is more likely to happen with surveys whose sampling plans do not target specific individuals to participate, such as methods that invite all transit riders on a bus to participate, online surveys open to all transit riders, or mail-in surveys sent to all transit riders (Baltes, 2002). In these cases, participants who have a strong inclination to respond will often self-select to participate, whereas those who do not have such an inclination may not.

Survey Expansion Methods

Because most OD surveys include only a sample of riders, survey results must be expanded to reflect the total rider population. As discussed earlier, response rates differ based on a range of factors such as the route being surveyed, demographics of respondents, and length of trip. Applying an expansion factor to each response allows a survey to partially control for these different response rates. The following section describes a variety of expansion techniques found in the literature.

Boardings-Based Expansion Method

The most common method for expanding survey results is to multiply each response by the ratio of boardings to responses. As described by Schmitt (2012), such expansion factors are frequently developed across three dimensions: route, direction, and time of day. A bidirectional route broken into an AM peak, midday, and PM peak period would therefore have six separate expansion factors based on ridership within each directional subperiod. Boardings are commonly determined based on data from fare boxes, automatic passenger counters (APCs), or manual ride checks.
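A minimal sketch of the boardings-based method described above: each (route, direction, time-of-day) cell receives an expansion factor equal to boardings divided by completed responses in that cell. The route number, direction codes, and counts below are hypothetical.

```python
from collections import Counter

def expansion_factors(responses, boardings):
    """Expansion factor per (route, direction, period) cell.

    `responses` is a list of (route, direction, period) keys, one per completed
    survey; `boardings` maps each cell to its ridership count (from fare boxes,
    APCs, or manual ride checks). The factor is boardings / responses.
    """
    counts = Counter(responses)
    return {cell: boardings[cell] / counts[cell] for cell in counts}

# Hypothetical route "10", surveyed in both directions during the AM peak.
responses = [("10", "NB", "AM")] * 50 + [("10", "SB", "AM")] * 40
boardings = {("10", "NB", "AM"): 600, ("10", "SB", "AM"): 500}

factors = expansion_factors(responses, boardings)
# Each northbound response represents 12 boardings; each southbound, 12.5.
```

Multiplying each survey record by its cell's factor scales the sample back up to observed ridership, which is why differing response rates by direction or period are partially controlled for.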
Some transit providers choose to include location along the route as a fourth expansion factor dimension (Schmitt, 2012). Phoenix's Valley Metro 2010–2011 on-board survey broke routes with 4,000 daily boardings or more into segments for expansion purposes. The segments included multiple stops and were delineated to ensure expansion factors did not exceed 40 for any given segment (ETC Institute, 2011). For agencies such as Valley Metro, accurate counts of stop-level boardings, alightings, and passenger load are needed to determine the correct expansion factor.

Other Approaches to Survey Expansion

Boardings-based expansion factors help to control for differences in response rates by time of day, route, or location, but fail to account for survey bias based on other factors such as demographics or income. There is limited literature on the development or use of more sophisticated expansion factors for transit surveys specifically. The Transportation Research Board in 2007 released a research needs statement titled "Expansion Factors for Transit Survey Responses" (TRB, 2007), which called for further research on the topic. Examples do exist of providers applying other practices to more accurately expand survey results, notably the use of control counts more sophisticated than ridership alone, the use of synthetic records, and the iterative proportional fitting (IPF) method of weighting responses.

• Control Counts: To establish an improved basis for expansion, providers look to various data sources to establish control counts that can be used to determine whether any groups of riders are overrepresented or underrepresented in the sample. The most common type of control count is data on riders' unlinked trips (i.e., where a rider boards and alights). This can be done either by conducting a separate on/off survey or through data generated by automatic fare collection systems.
The latter is only viable in systems where rider entries and exits are recorded. Schmitt (2012) highlights innovative methods that providers use to establish control counts. For example, Tri-Rail in South Florida found that riders who drove and parked at its stations had a higher response rate than those who were dropped off or used another mode, such as a bus or walking. To account for this, Tri-Rail conducted manual counts of access/egress mode to inform expansion factors. Other methods mentioned in the paper include using fare data as a control to account for any response bias by fare media used at boarding (Schmitt, 2012).

• Synthetic Records: One method identified in the literature is the use of synthetic records, the practice of creating a record of trips to fill gaps in the OD data. In its 2013 on-board survey, the Central Ohio Transit Authority (COTA) created separate expansion factors based on route, direction, time of day, and route segment. It compared the survey records to APC data and identified any period, direction, or segments where APC data existed without a corresponding survey record. For such instances, a "synthetic" record was added to the survey data to account for identified gaps (COTA, 2014, p. A-6).

• Iterative Proportional Fitting: The COTA survey also utilized IPF to create its expansion factors. IPF is a procedure used to adjust sample data across two dimensions, such as boardings and alightings (see Figure 4). Agencies such as COTA utilize IPF to expand segment-to-segment survey response totals for a particular route to equal the boardings and alightings recorded by APCs. IPF is used to correct for response bias based on distance, because short trips frequently have proportionally fewer survey responses (Schmitt, 2012).

Transforming Results to Linked Trips

FTA suggests that OD studies include a conversion of survey results into linked trips (K. Cervenka, telephone interview, June 13, 2018).
A linked trip is a trip from origin to destination on a transit system and can include multiple routes and transfers. Survey data are
expanded based on unlinked trips, which count each boarding as a separate trip regardless of transfers. To transform unlinked trips to linked trips, each expansion weight needs to be divided by the number of unlinked segments that the respondent rode during their trip. For example, if a survey response includes two transfers and three routes, the expansion factor applied to that response should be divided by three. If a survey response includes no transfers, it should be divided by one. To check for transfer bias in the survey sample, survey results can be validated through a linked trip decomposition test. This test calculates the number of expected trips per route based on linked trip data and compares it to actual ridership. A large difference between the predicted and actual values means that transfer trips are being either undercounted or overcounted (K. Cervenka, telephone interview, June 13, 2018).

Survey Expansion and Sample Quality

Survey expansion can help reduce response bias but ultimately cannot address all systematic forms of response bias. As outlined by Schmitt (2012), having better data on rider characteristics before a survey is conducted allows providers to develop better sampling plans.

Harnessing Passive Data Collection

"Passive data" refers to data collected for other purposes, which may nevertheless be used for transit network planning. These data are often collected automatically, such as with APCs, or through entirely separate processes such as cellular phone location records. This contrasts with "active" methods such as surveys. New passive data collection technologies are being used to generate new sources of travel behavior data. These new data sources are used in a variety of contexts, but have been more commonly applied to highway-related studies.
Improvements to technologies, and to services running on those technologies, have resulted in increasing availability of passive data, which have value to transit planning in addition to highway planning. Data collected passively present a trade-off with active data in terms of detail and population size. Whereas active data allow researchers to know many aspects of a small population, passive data show fewer aspects of a large population. In active data collection, planners design a survey to capture specific answers to the questions they are asking: the survey responses contain every piece of behavioral information, and the responses are weighted to represent the population of interest. At the same time, on-board transit surveys can suffer from nonresponse biases and small samples. Efforts to increase response rates and/or sample size are expensive.

Figure 4. Example of two-dimensional iterative proportional fitting.

Original Data (Surveys Collected):
Boardings \ Alightings    40    35    25
20                         6     6     3
30                         8    10    10
35                         9    10     9
15                         3    14     8

Weighted Data:
Boardings \ Alightings    40     35     25
20                     10.32   6.63   3.05
30                     11.80   9.47   8.72
35                     15.19  10.84   8.98
15                      2.69   8.06   4.24

Source: Hunsinger (2008). Reproduced with permission.

Passive data differ in that researchers and planners cannot generally design the data collection: the specific collection technology defines both the data collected and the population from
which the data are collected. As a result, passive data sets tend to be "incomplete" in the sense that not every question can be answered from a single product. Automated fare collection cards capture key details of every trip made on a transit system but cannot independently reveal trip purpose or details about the traveler. It is left to the researcher to identify and merge other data sets that may provide more information, or to develop data processing algorithms to infer missing elements. It is impossible to describe the usefulness of passive data to on-board survey methods, then, without considering the technologies used to collect, process, and distribute them. There are generally two classes of passive data relevant to collecting information about transit riders: data collected by transit providers and data collected by outside parties. Both are discussed below.

Data Collected by Transit Providers

There are several sources of passive data collected directly by transit agencies through their normal operations. Many agencies are leveraging these data for planning and reporting in addition to their initial purposes.

• Electronic Fare Collection Systems: Transit agencies that use electronic fare collection (EFC) collect boarding data every time a passenger pays a fare. Given sufficient fare-card penetration and fare system design, it is possible to reconstruct complete trip and tour records for passengers in the system. An extensive summary of early research in this area is available from Pelletier et al. (2010), and algorithms for building tours from boarding data are presented by Seaborn et al. (2009), Wang et al. (2011), and Ma et al. (2013). Efforts to join EFC data to socioeconomic data such as income and age include home-based matching to Census data and probabilistic linking of trips to surveys (Kusakabe and Asakura, 2014).
Chapter 4 of this synthesis includes a case example of MBTA's use of fare-card data to estimate boarding, transfer, and alighting patterns of riders.

• Mobile Ticketing Systems: Many transit agencies are developing mobile applications that can conceivably collect information about the ultimate origin and destination of a participating passenger's trips rather than simply where the passenger entered and exited the transit system. The applications could also provide more information about incidental activities between transfers, such as shopping or other errands around transfer locations, information that is not available in EFC data. TCRP Synthesis 125 (Okunieff, 2017) on these applications does not report that any agencies are using the data in this way, though Rahman et al. (2016) execute a planning exercise using such data.

• Mobile and Desktop Trip Planners: Many transit providers maintain their own branded mobile applications that provide schedule and route information as well as real-time tracking for the next bus or train. In the same way that an agency might use a mobile ticketing system to collect information about riders, so might an application that provides real-time tracking information. OD data could also be harvested from open trip planners embedded on agency websites using tools such as 1-Click (Cambridge Systematics) and OpenTripPlanner (http://www.opentripplanner.org), although desktop and mobile data may need to be cross-referenced to see if researched trips are ever made.

Data Collected by Outside Parties

There are also sources of passive data collected by commercial entities and nonprofit organizations.

• Origin–Destination Matrices: There are commercially marketed OD data products derived from cellular phone or mobile device location data. Cellular data can have high penetration rates but large geographic tolerances; mobile device data may be more precise, but the
penetration rate is lower and, in many cases, undisclosed. At present, neither type of data is likely to provide statistics on the number of people taking transit between points in a system. What the data can do, however, is identify total demand between points within the region, divided by time period. This can inform activities such as bus network redesign and long-range scenario planning.

• Mobile Applications and GTFS-ride/GTFS-flex: Mobile applications that take advantage of General Transit Feed Specification (GTFS) real-time data, such as Citymapper (https://citymapper.com), OneBusAway (https://onebusaway.org), and Transit app (https://transitapp.com), could aggregate observed positional data and report route-level OD data to agencies or a public data clearinghouse. With all of these applications, it would be easiest to share observed demand between app developers and transit providers using a common, shared data standard. One such standard, GTFS-ride (https://www.gtfs-ride.org), is being developed in partnership between the Oregon Department of Transportation (ODOT) and Oregon State University. It is an open, fixed-route transit ridership data standard that allows for improved ridership data collection, storing, sharing, reporting, and analysis. Another TRB-sponsored data standard under development for flexible-route service is the General Transit Feed Specification Flex (GTFS-flex), which could expand GTFS-ride to cover flexible services as well (DemandTrans Solutions, 2018; Center for Urban Transportation Research, 1996).

The Future of Passive Data

Passive data could provide transit planners with information on travel behavior at more regular intervals, and at lower cost, than active data collection can. The trade-off, though, is that the technologies collecting data passively can change rapidly.
Users of passive data therefore must be able to adapt efficiently and think critically about whom the data represent, what they show, and who is missing. The other major challenge is that passive collection yields different data types and results than active data collection does. Passive data will almost always show just a few aspects of a large sample, whereas active data capture exactly the breadth of information the surveyor is interested in, albeit for a small sample. Multiple sets of passive data can be merged using innovative techniques to understand more about a population observed through passive data, but there is little to no research or guidance available on how Title VI requirements might be satisfied in this way. Further research along these lines would allow more transit operators to embrace passive data in their planning processes.
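As a toy illustration of merging passive data sets, the sketch below joins hypothetical fare-card tap records to a separate stop-attribute table on a shared stop ID. All field names and values are invented for the example; real merges involve messier keys, inferred linkages, and privacy safeguards.

```python
# Hypothetical records: fare-card taps (from an EFC system) and a stop-level
# attribute table (e.g., derived from a separate APC or GIS data set).
taps = [
    {"card": "A1", "stop": "S01", "time": "07:42"},
    {"card": "B2", "stop": "S02", "time": "08:05"},
    {"card": "A1", "stop": "S02", "time": "17:31"},
]
stops = {
    "S01": {"route": "10", "area": "downtown"},
    "S02": {"route": "10", "area": "university"},
}

# Merge: attach the stop attributes to each tap, keyed on the shared stop ID.
merged = [{**tap, **stops[tap["stop"]]} for tap in taps]
```

Even this trivial join shows the pattern: neither data set alone answers "where do downtown boardings occur by time of day," but the merged records can, which is the sense in which combining passive sources recovers some of what a designed survey would have asked directly.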