Suggested Citation:"Chapter 4 - Model Development." National Academies of Sciences, Engineering, and Medicine. 2007. Improving ADA Complementary Paratransit Demand Estimation. Washington, DC: The National Academies Press. doi: 10.17226/23146.

CHAPTER 4
Model Development

The modeling process requires a choice of mathematical forms to be used in the regression procedure. This section describes the advantages and disadvantages of several mathematical forms, leading to a recommendation for the most promising ones. The results of the regression analysis are then presented. A model that predicts annual ADA paratransit trips per capita produces the best results.

Appropriate Mathematical Forms

We have explored two principal types of regression models:

Linear:
(1) Trips = a + b (Population) + c (Factor 1) + d (Factor 2) + . . .

Logarithmic:
(2) log(Trips) = a + b log(Population) + c log(Factor 1) + d log(Factor 2) + . . .

In this equation, "log(Trips)" represents taking the logarithm of trips. The logarithmic form is equivalent to a multiplicative form as follows:

(3) Trips = a × (Population)^b × (Factor 1)^c × (Factor 2)^d × . . .

(Strictly, the constant a in (3) is the antilogarithm of the constant a in (2).)

Both linear and logarithmic forms can also be used for models that predict not total trips, but trips per capita:

Linear per capita:
(4) Trips/Population = a + b (Factor 1) + c (Factor 2) + . . .

Logarithmic per capita:
(5) log(Trips/Population) = a + b log(Factor 1) + c log(Factor 2) + . . .

The logarithmic-per-capita form is equivalent to:

(6) Trips/Population = a × (Factor 1)^b × (Factor 2)^c × . . .

Notice that (3) and (6) are equivalent except for the exponent b on the Population term in (3). If b = 1 in equation (3), then the two forms are exactly equivalent.

It is also possible to have mixed forms, in which some of the factors on the right-hand side appear without logarithms. Before proceeding to presentation of model specifics, some discussion of these possible forms is provided.
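As a small numerical check of the equivalences described above, the following Python sketch evaluates the logarithmic form (2) and the multiplicative form (3) for the same inputs, and then confirms that with b = 1 the result per capita matches form (6). The coefficient values, the population, and the fare are hypothetical and chosen only for illustration; natural logarithms are assumed.

```python
import math

def trips_log_form(a, b, c, population, factor1):
    """Equation (2): compute log(Trips), then convert back with exp()."""
    log_trips = a + b * math.log(population) + c * math.log(factor1)
    return math.exp(log_trips)

def trips_multiplicative_form(a, b, c, population, factor1):
    """Equation (3), using exp(a) as the multiplicative constant."""
    return math.exp(a) * population ** b * factor1 ** c

# Hypothetical coefficients and inputs (not taken from the report).
a, b, c = 2.0, 1.0, -0.5
population, fare = 500_000, 2.50

total_from_log = trips_log_form(a, b, c, population, fare)
total_from_product = trips_multiplicative_form(a, b, c, population, fare)

# With b = 1, dividing equation (3) by population gives equation (6).
per_capita_form_6 = math.exp(a) * fare ** c

print(math.isclose(total_from_log, total_from_product))
print(math.isclose(total_from_log / population, per_capita_form_6))
```

Both comparisons should print True; with any b other than 1, the second comparison would fail, which is the sense in which (3) and (6) differ only in the population exponent.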

Linear Model of Trips

Of the possible models, the linear form with trips (1) was eliminated. The enormous spread of values for many variables would give undue influence to a handful of cases. For example, Exhibit 4-1 shows service area population and total ADA paratransit trips for the 28 representative systems. Fitting a line to these points would essentially just connect New York City to the clump of other systems to the lower left. The position of the line would be almost entirely due to the values for New York and would tell us nothing about differences among the other systems. Eliminating New York from the analysis, in addition to losing a valuable data point, would only partially solve the problem, since a number of other large population areas would still exert undue influence.

[Exhibit 4-1. ADA trips and service area population (linear). Horizontal axis: service area population (millions). Vertical axis: annual ADA paratransit trips (millions).]

This linear form is undesirable for other reasons as well. For example, suppose a model is produced similar to this one:

Trips = 0.47 × Population − 10,000 × Fare

The value of 10,000 for the fare term is chosen purely for the sake of illustration. This equation would say that raising fares by $1.00 would reduce ridership by 10,000 trips per year regardless of the size of the service area or the initial ridership level. But 10,000 trips would be close to a 100% change in the smallest representative systems and less than a 1% change in the four largest systems. The impact would also be the same regardless of whether the fare was raised from $0.50 to $1.50 (i.e., tripled) or from $3.00 to $4.00 (a 33% increase).

Logarithmic Model of Trips

The logarithmic model form has many advantages. One advantage is that the logarithmic transformation reduces the problem of extreme variation and extreme values in the data. For example, Exhibit 4-2 shows the same trip and population data just presented in Exhibit 4-1, except using a logarithmic scale that is equivalent to graphing log(trips) against log(population). There are still a few extreme cases, but the problem is much reduced.

[Exhibit 4-2. ADA trips and service area population (logarithmic). Horizontal axis: service area population (millions), logarithmic scale. Vertical axis: annual ADA paratransit trips (millions), logarithmic scale.]

The other major advantage of the logarithmic form is that the impacts of coefficients are, in effect, percentage increases or decreases in trips. Using the example of fare again, suppose a model is produced similar to this one:

log(Trips) = 0.9 × log(Population) − 0.5 × log(Fare)

This would mean that a 10% increase in fare would result in approximately a 5% reduction in ridership. (More precisely, ridership would fall by 1 − (1.1)^−0.5, or 4.7%.) This percentage change would apply equally regardless of the initial ridership level, the service area population, or the initial fare level. In other words, the model would mean that the elasticity of ridership with respect to fare is −0.5. Either the "natural logarithm" or a base-10 logarithm can be used, but the natural logarithm is usual because it is easier to prove the interpretation of coefficients as elasticities that way.

Per Capita Models

Transforming trips and some other variables to per capita form is also useful. Exhibits 4-3 and 4-4 illustrate this using the example of total ADA paratransit trips and revenue vehicle miles (RVM) of fixed-route service. Both of these variables have one extreme case, a lot of cases clustered at lower values, and a handful of cases that are less extreme but still very different from the small systems. Plotting per-capita versions of both variables creates a much clearer view. There is still a noticeable difference between small systems, large systems, and the one very large system, but the differences are much less extreme and variation among the systems is now much easier to see. Note that population variables such as the number of people age 65 and older and the number of people in poverty become percentages of population when expressed in per-capita form (for example, the percentage of the population with incomes below the poverty line).
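The constant-elasticity reading of the illustrative fare coefficient in the logarithmic model above can be checked with a few lines of Python. This is a minimal sketch: the function name is an assumption, and the elasticity of −0.5 is the illustrative value from the text, not an estimated result.

```python
def ridership_change(old_fare, new_fare, elasticity):
    """Fractional change in trips under a constant-elasticity (log-log) model."""
    return (new_fare / old_fare) ** elasticity - 1.0

ELASTICITY = -0.5  # illustrative value used in the text

# A 10% fare increase: about a 4.7% drop, as stated in the text.
ten_percent = ridership_change(1.00, 1.10, ELASTICITY)

# The two fare increases discussed for the linear model. Each is a $1.00
# increase, but the log model responds to the proportional change in fare.
tripled = ridership_change(0.50, 1.50, ELASTICITY)
one_third = ridership_change(3.00, 4.00, ELASTICITY)

print(f"10% increase:   {ten_percent:.1%}")
print(f"$0.50 to $1.50: {tripled:.1%}")
print(f"$3.00 to $4.00: {one_third:.1%}")
```

Unlike the linear form, the same dollar increase produces a much larger percentage response when it triples the fare than when it raises the fare by a third.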

[Exhibit 4-3. ADA trips and revenue vehicle miles. Horizontal axis: annual revenue vehicle miles (millions). Vertical axis: annual ADA paratransit trips (millions).]

[Exhibit 4-4. ADA trips per capita and revenue vehicle miles per capita. Horizontal axis: annual revenue vehicle miles per capita. Vertical axis: annual ADA paratransit trips per capita.]

Some variables, such as fare, do not need to be put in per-capita form. Models that predict ADA trips per capita would probably have no population term. For example (purely for illustration), a model could take the form:

ADA Trips per Capita = 0.6 × RVM per Capita − 0.2 × Base Fare

This would say that a $1.00 fare increase would cause trips to fall by 0.2 trips per capita. This is much more reasonable than the simple linear form, since it adjusts for the population of the service area. However, it produces the same impact regardless of the initial ridership level. Since initial ridership levels among the representative systems range from a low of 0.08 trips per capita to a high of 1.86 trips per capita, this would still be very awkward. The model would probably predict negative ridership for possible fare changes at some systems.

Logarithmic per Capita Models

Using logarithms of per-capita data combines the advantages of both of these transformations. Exhibit 4-5 illustrates this using the same RVM data and paratransit trip data as in the previous two exhibits. Compared with the pure linear form or the linear per-capita form, analysis using logarithms of per-capita data greatly reduces the problems of extreme values or skewed distributions of values. As in the pure logarithmic form, the coefficients indicate percentage changes and can be interpreted as elasticities.

[Exhibit 4-5. ADA trips per capita and revenue vehicle miles per capita (logarithmic). Horizontal axis: annual revenue vehicle miles per capita, logarithmic scale. Vertical axis: annual ADA paratransit trips per capita, logarithmic scale.]

This type of model assumes that, all other things being equal, ADA paratransit trips per capita are about the same regardless of population size. In other words, holding constant factors such as fares, intensity of transit service, and income levels, ADA paratransit trips are about proportional to service area population. This cannot be assumed, but needs to be tested by analysis of the data.

Testing Possible Models

The considerations described in the preceding section lead to the following plan of attack for testing possible models:

1. Estimate a model using logarithms of total values (i.e., a model of total ADA paratransit trips per year).
2. If the model results show that ADA paratransit ridership is proportional to population as long as other factors are held constant, then estimate a model using logarithms of per-capita values (i.e., a model of annual ADA paratransit trips per capita).
3. As a fall-back, consider a model of per-capita values without logarithms.

Logarithmic Model of Total ADA Paratransit Trips

The preliminary data analysis identified a number of possible problems with variables, including some variables that clearly cannot be present together in a model. However, only a handful of variables were initially excluded from consideration:

• Separate age variables by sex were not used because they are so highly correlated. Instead, percent of population age 65 or older and percent of population age 75 or older were tested.
• Average fare per passenger (i.e., total ADA paratransit fare revenue divided by total ADA paratransit passengers) was replaced by base fare (the full cash fare for an ADA paratransit trip before any discounts for advance purchase or use of a monthly pass, and before adding any zone charges). Average fare per passenger would clearly be preferable, since many systems have monthly passes, discounts for pre-paid tickets, or zone charges, and these could have a big impact on ridership. However, after preliminary model testing indicated possible problems with the fare revenue data, further investigation determined that a number of systems had provided only cash fare revenue, leaving out revenue from pass or ticket sales. Attempts to correct this were only partially successful. At least one system known to have missing fare revenue has still not provided the requested data. Other cases of missing data are suspected and would require further investigation before average fare per passenger could be used with confidence.
• Percent of housing units with no vehicle available was not used because, as described earlier, it acts essentially as an indicator variable for New York City. This remains true even with transformation by logarithms.
• Variables with missing values, as described previously.

Also, even in a model of total ADA paratransit trips, it is necessary to express some variables in per-capita form. Without this transformation, all variables related to the size of the area would be highly correlated, including total population, population age 65 and older, population below the poverty line, fixed-route fleet size, fixed-route RVM, and so on. Therefore, only one of these variables can be included as an explanatory variable and the rest would need to be expressed in per-capita form (or percent of population). Total population, population age 65 and older, and population age 75 and older were all tested as candidates for inclusion as totals. The disability variables can only be used as percentages, since any one alone is insufficient and they cannot be added together. Percentage variables were tested both with and without logarithms.

The candidate variables were tested using stepwise regression (forward and backward), keeping the data issues in mind when examining difficulties that could arise from this procedure. Some initial experiments led to examination of the data, which uncovered coding or other errors that were then corrected. Where the stepwise method produced models with closely correlated variables, each variable was tested separately to determine which was better.

These experiments converged on the candidate model shown in Exhibit 4-6. None of the other candidate variables are significant if added to this model. The variables are listed in approximate descending order of statistical significance, with the most significant variables first. Examining the regression results shows that:

• All of the coefficients are significant at better than the 95% confidence level.
• The model has excellent goodness of fit as measured by R Squared, with 96% of variation in total ADA paratransit trips explained.
• None of the variables are highly correlated with each other.

Note that because three variables are not in logarithmic form, their coefficients cannot be interpreted directly as elasticities. (This is explained further below.)
Examining the results for each variable, we see:

• The significant constant gives the predicted log of trips when all the other terms are zero. Because of the logarithmic form, this would occur when population = 1, base fare = $1, percent found conditionally eligible = 0%, conditional trip screening is not used, and effective window = 1 minute. This has no practical meaning.
• The estimated coefficient for total population is very close to 1.0, meaning that if other factors are kept constant, ADA paratransit trips are proportional to total population. This indicates that a per-capita model of trips should be tested.
• The model gives a fare elasticity of −0.76. This value may indicate that paratransit trip making is more sensitive to fares than is general transit ridership. In the survey of practitioners conducted for this research, fares were rated quite low as a factor that influences demand. Low sensitivity to fares would be expected in paratransit systems that are capacity constrained, since only the most necessary trips would be made and only trips for which no other alternatives were available. However, in paratransit systems without capacity constraints, a greater level of fare sensitivity would be expected given the generally low income of people with disabilities and the relatively high fares that characterize many paratransit systems. Because the model uses cross-sectional data, the estimated elasticity shows long-term effects that would be expected to be greater than short-term effects estimated by commonly cited fare elasticities. Note that base fare and poverty rate have a weak negative correlation, so leaving out either one would cause biased results.
• Trips decrease with the percent of applicants found conditionally ADA eligible, which accords with expectations. The coefficient of −1.373 is equivalent to an elasticity of −0.29 at the mean value of the percentage of applicants found conditionally eligible at the representative systems.
• Conditional trip screening reduces paratransit usage. The coefficient indicates that systems that use conditional trip screening have 48% less ridership [2] than systems that do not use conditional trip screening.
Given experience in the field, it is extremely unlikely that systems with conditional trip screening are actually screening out 48% of trip requests based on conditions of eligibility. In fact, respondents to the survey of practitioners ranked trip-by-trip eligibility screening lower than many other variables as a factor influencing demand. However, it is possible that riders reduce their requests based on the conditions they have been given or based on experiences when they have requested trips and been turned down for trip-specific eligibility reasons. It is also likely that systems that use conditional trip screening have more rigorous eligibility screening practices in general, in ways not captured by the percentage of applicants found fully or conditionally eligible. Note that there is no significant correlation between use of conditional trip screening and any of the eligibility outcome variables. (This is desirable, since it means that both variables can be included in the model without difficulty.) As noted earlier, conditional trip screening is weakly correlated with average fare per trip (0.40 correlation) and with the percentage of the population age 65 or older or age 75 or older (0.45 to 0.46 correlation).

Exhibit 4-6. Logarithmic model of total ADA paratransit trips.

Regression 1. Dependent Variable: Log of Total ADA Paratransit Trips

Variable                                     Unstandardized   Standard   t-Statistic   Probability*
                                             Coefficient      Error
(Constant)                                        3.628         1.260        2.879        0.009
Log of Service Area Population                    0.984         0.073       13.512        0.000
Log of Base Fare                                 -0.759         0.181       -4.194        0.000
Percent found conditionally eligible/100 (a)     -1.373         0.395       -3.474        0.002
Conditional Trip Screening                       -0.658         0.186       -3.543        0.002
Percent below Poverty/100 (b)                    -6.686         1.909       -3.502        0.002
Log of Effective Window                          -0.712         0.265       -2.685        0.014

R Squared                                         0.957
Standard Error of the Estimate                    0.450

* Probability that the estimated coefficient is due to chance.
(a) The coefficient implies an elasticity of −0.29 at the mean observed conditional eligibility percentage.
(b) The coefficient implies an elasticity of −0.90 at the mean observed poverty rate.

[2] Calculated as 1 − e^−0.658.

• The coefficients indicate that trip making decreases at higher poverty rates. It might have been expected that lower income would be reflected as lack of access to other modes and therefore higher paratransit usage. However, the variable in question is the total area-wide poverty rate, not the rate of poverty among people with disabilities. In general, people with higher incomes travel more than people with lower incomes. It is also likely that communities with higher poverty rates will have fewer available activities that generate travel than more affluent communities. The coefficient of −6.686 for poverty rate is equivalent to an elasticity of −0.90 at the mean value of poverty rate for the representative systems.
• Longer effective windows for defining on-time pick-ups reduce trip making. The direction of this effect is as expected, although its strength is surprising compared with expectations of the practitioners surveyed for this research. Note that the measured on-time percentage was not found to be significant. This may be explained by the great variation in methods for measuring on-time performance, which results in very accurate data for some systems and less accurate data for others.

A number of variables did not prove significant and are not included in the model. These included the percentage of the population with various disabilities, the percentage age 65 and older or age 75 and older, and various measures of the availability of fixed-route transit service.

One measure of transit service that was very nearly significant was RVM per capita. Because of its implications for further research, the regression with this variable included is shown in Exhibit 4-7. The other variables have coefficients very similar to those in Regression 1. The coefficient for RVM per capita has a "probability" of 0.066, meaning there is a 6.6% chance that the true value could be zero. (This is equivalent to a 93.4% significance level.) The positive coefficient of 0.374 means that higher levels of fixed-route transit service correspond to higher levels of paratransit trip making. Further transformation of RVM (e.g., RVM per square mile per capita) did not produce significant results.

Exhibit 4-7. Logarithmic model of total ADA paratransit trips with revenue vehicle miles.

Regression 2. Dependent Variable: Log of Total ADA Paratransit Trips

Variable                                     Unstandardized   Standard   t-Statistic   Probability
                                             Coefficient      Error
(Constant)                                        3.147         1.210        2.601        0.017
Log of Service Area Population                    0.923         0.075       12.244        0.000
Log of Base Fare                                 -0.618         0.185       -3.343        0.003
Percent found conditionally eligible/100         -1.422         0.372       -3.818        0.001
Conditional Trip Screening                       -0.537         0.185       -2.897        0.009
Percent below Poverty/100                        -6.800         1.795       -3.788        0.001
Log of Effective Window                          -0.645         0.252       -2.562        0.019
Log of RVM per capita                             0.374         0.193        1.943        0.066

R Squared                                         0.930
Standard Error of the Estimate                    0.423

Clearly, adding transit service does not increase paratransit usage. Instead, it is assumed that this variable is acting as an indicator of an area's general transit-oriented character, reflected in less dependence on private automobiles for travel. If a significant fraction of people are used to traveling by public transportation, then they may be likely to turn to paratransit when they can no longer use conventional service. However, if nearly everyone is accustomed to driving for all of their trips and drives until they can no longer do so, then they may be unlikely to consider transit or paratransit as a realistic alternative when they can no longer drive.

The predictions of Regression 1 track observed values for the representative systems reasonably well, as would be expected from the high R Squared value. Exhibit 4-8 shows observed and predicted ADA paratransit trips for each representative system for Regression 1. The systems are arranged in increasing order of population. (New York City is off the scale.) A key to the abbreviations is provided in Exhibit 4-9. Predicted trips increase with population, generally tracking the trend of observed trips. More interesting are the deviations from the general trend that are due to other factors such as fares, service quality, and demographics. With a small number of exceptions, the predictions track these "turning points," generally deviating above and below the trend in the same way as the observations.

A more challenging comparison uses observed and predicted trips per capita, which shows equally for small systems and large systems how well predictions match observations. Exhibit 4-10 shows this comparison.
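To make the Regression 1 predictions concrete, the sketch below applies the Exhibit 4-6 coefficients to one set of inputs. It assumes natural logarithms, base fare in dollars, and effective window in minutes (consistent with the interpretation of the constant given earlier). The function name and the input values are hypothetical; they do not describe any of the representative systems.

```python
import math

# Coefficients as reported in Exhibit 4-6 (Regression 1).
REGRESSION_1 = {
    "constant": 3.628,
    "log_population": 0.984,
    "log_base_fare": -0.759,
    "conditional_share": -1.373,    # percent found conditionally eligible / 100
    "trip_screening": -0.658,       # 1 if conditional trip screening is used, else 0
    "poverty_share": -6.686,        # percent below poverty / 100
    "log_effective_window": -0.712,
}

def predict_annual_trips(population, base_fare, conditional_share,
                         trip_screening, poverty_share, effective_window):
    """Predicted annual ADA paratransit trips from the Regression 1 equation."""
    c = REGRESSION_1
    log_trips = (
        c["constant"]
        + c["log_population"] * math.log(population)
        + c["log_base_fare"] * math.log(base_fare)
        + c["conditional_share"] * conditional_share
        + c["trip_screening"] * trip_screening
        + c["poverty_share"] * poverty_share
        + c["log_effective_window"] * math.log(effective_window)
    )
    return math.exp(log_trips)

# Hypothetical service area, for illustration only.
inputs = dict(population=500_000, base_fare=2.00, conditional_share=0.20,
              poverty_share=0.12, effective_window=30)

without_screening = predict_annual_trips(trip_screening=0, **inputs)
with_screening = predict_annual_trips(trip_screening=1, **inputs)

print(f"Predicted trips, no screening:   {without_screening:,.0f}")
print(f"Predicted trips, with screening: {with_screening:,.0f}")
print(f"Reduction from screening: {1 - with_screening / without_screening:.0%}")
```

Because the screening term enters the logarithm additively, the ratio of the two predictions is e^−0.658 whatever the other inputs are, which reproduces the 48% figure in footnote 2.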
Exhibit 4-10 makes clear that the most difficult cases are the small systems with much higher than average paratransit trip making. Overall, the predictions track observations reasonably well, giving good confidence in the equations as models of paratransit trip making on average. However, predictions for individual systems can vary considerably from actual experience. Exhibit 4-11 provides a similar comparison for Regression 2, which includes the RVM per capita variable. Regression 2 gives a closer match for systems including Link Transit and New York City, but a worse match for others such as Ottumwa, Whatcom, King County, and Orange County.

[Exhibit 4-8. Observed and predicted trips: Regression 1. Vertical axis: ADA trips in thousands. Series: observed ADA trips and predicted ADA trips. Note: Representative systems arranged in increasing order of population. (New York City is off the scale.)]

Exhibit 4-9. Representative system abbreviations.

OTA       Ottumwa Transit Authority
BT        Blacksburg Transit
Link      Link Transit
JAUNT     JAUNT, Inc.
WTA       Whatcom Transportation Authority
BFT       Ben Franklin Transit
ECCTA     Eastern Contra Costa Transit Authority (Tri-Delta Transit)
LTD       Lane Transit District
CATA      Capital Area Transportation Authority (Lansing, Michigan)
MVRTA     Merrimack Valley Regional Transit Authority
Tulsa     Metropolitan Tulsa Transit Authority
CCCTA     Central Contra Costa Transit Authority
CNYRTA    Central New York Regional Transportation Authority
FAX       Fresno Area Express
FWTA      Fort Worth Transportation Authority
HART      Hillsborough Area Regional Transit
SORTA     Southwest Ohio Regional Transit Authority
SMCTD     San Mateo County Transit District
RIPTA     Rhode Island Public Transit Authority
TRIMET    Portland Tri-Met
PAAC      Port Authority of Allegheny County
King      King County Metro Transit
UTA       Utah Transit Authority
SCVTA     Santa Clara Valley Transportation Authority
RTD       Regional Transportation District (Denver)
DART      Dallas Area Rapid Transit
OCTA      Orange County Transportation Authority
NYC       New York City Transit Authority

[Exhibit 4-10. Observed and predicted trips per capita: Regression 1. Vertical axis: ADA trips per capita. Series: observed trips per capita and predicted trips per capita. Note: Representative systems arranged in increasing order of population.]

[Exhibit 4-11. Observed and predicted trips per capita: Regression 2. Vertical axis: ADA trips per capita. Series: observed trips per capita and predicted trips per capita with RVM. Note: Representative systems arranged in increasing order of population.]

The coefficients for the raw percentage variables (poverty rate and conditional eligibility) turn out to have a simple interpretation, similar to the elasticities that apply to the other variables. Each 1 percentage point increase in the poverty rate or the percentage conditionally eligible (e.g., from 5% to 6%) corresponds to a constant percentage drop in ADA paratransit ridership. The amount of the percentage drop is equal to 1 − e^(0.01 × b), where "e" is the base of the natural logarithms and "b" is the coefficient. For example, in Regression 1 the coefficient for poverty rate is −6.686. Each 1 percentage point increase in poverty rate corresponds to a 6.5% drop in ADA paratransit ridership, which can be calculated as 1 − e^(−0.01 × 6.686) = 0.065. The coefficient of −1.373 for conditional eligibility implies that each additional 1 percentage point increase in the conditional eligibility rate corresponds to a 1.4% drop in ADA paratransit ridership. Users do not need to be able to do this calculation; instead these factors can be supplied with the model.

Logarithmic Model of ADA Paratransit Trips per Capita

Since the total trip model showed that trip making is proportional to total population when other factors are held constant, it makes sense to test models of paratransit trips per capita. The entire set of candidate variables was tested as before.
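The percentage-point interpretation of the poverty and conditional eligibility coefficients described above can be reproduced directly. This is a minimal sketch using the Regression 1 coefficients quoted in the text; the function name is an assumption.

```python
import math

def drop_per_percentage_point(coefficient):
    """Fractional drop in trips for a 1-percentage-point rise in a variable
    that enters the log model as percent/100 (i.e., as a share)."""
    return 1.0 - math.exp(0.01 * coefficient)

poverty_drop = drop_per_percentage_point(-6.686)      # about 0.065, i.e., 6.5%
eligibility_drop = drop_per_percentage_point(-1.373)  # about 0.014, i.e., 1.4%

print(f"Poverty rate: {poverty_drop:.1%} drop per percentage point")
print(f"Conditional eligibility: {eligibility_drop:.1%} drop per percentage point")
```

The factor 0.01 appears because the variables are expressed as percent/100, so a 1 percentage point change moves the variable by 0.01.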
The procedure produced a model nearly identical to the one shown before, with the exception that there is no population variable, since the effect of total population is eliminated by expressing trips as trips per capita. The detailed results are shown as Regression 3 in Exhibit 4-12.

Exhibit 4-12. Model of trips per capita.

Regression 3. Dependent Variable: Log of Trips per Capita

Variable                                     Unstandardized   Standard   t-Statistic   Probability*
                                             Coefficient      Error
(Constant)                                        3.463         0.972        3.564        0.002
Log of Base Fare                                 -0.772         0.167       -4.612        0.000
Percent found conditionally eligible/100 (a)     -1.385         0.382       -3.625        0.001
Conditional Trip Screening                       -0.662         0.181       -3.668        0.001
Percent below Poverty/100 (b)                    -6.633         1.851       -3.583        0.002
Log of Effective Window                          -0.722         0.255       -2.831        0.010

R Squared                                         0.744
Standard Error of the Estimate                    0.440

* Probability that the estimated coefficient is due to chance.
(a) The coefficient implies an elasticity of −0.29 at the mean observed conditional eligibility percentage.
(b) The coefficient implies an elasticity of −0.90 at the mean observed poverty rate.

As before, none of the other candidate variables are significant if added to this model. The variables are listed in approximate descending order of statistical significance, with the most significant variables first. The coefficients are very similar to those of the total trip model, as they should be, and all of the coefficients are highly significant. The coefficients are slightly more significant in the per-capita model than in the total trip model (Regression 1). As before, the constant term, although statistically significant, has no practical meaning. Exhibits 4-13 and 4-14 show how predictions from Regression 3 compare with observed values.

[Exhibit 4-13. Observed and predicted trips: Regression 3. Vertical axis: ADA trips in thousands. Series: observed ADA trips and predicted ADA trips. Note: Representative systems arranged in increasing order of population.]

[Exhibit 4-14. Observed and predicted trips per capita: Regression 3. Vertical axis: ADA trips per capita. Series: observed ADA trips per capita and predicted ADA trips per capita. Note: Representative systems arranged in increasing order of population.]

Because most of the variation in trips has been removed by using trips per capita, it is to be expected that R Squared is somewhat lower than in the total trip model. However, the Standard Error of the Estimate is lower in the per-capita model than in the total trip model. This statistic provides an absolute measure of the unexplained variation. It indicates that the per-capita model (Regression 3) predicts trip making slightly better than the total trip model [3]. It is recommended for use as the basis for a demand estimation tool.

The unexplained variation in Regression 3 can also be stated in terms of percentage variation of trips per capita. Individual systems with observed trips per capita higher than the predicted value differ from the prediction by an average of 55%, while systems with observed trips per capita lower than the predicted value differ by 36% from the prediction [4]. The "accuracy" of the model is from −16% to +19%. In statistical terms, this is a 95% confidence interval for the position of the regression line at the point where the average values for all the explanatory terms are used [5].

[3] The unexplained variation in log(total trips) and log(trips per capita) can be compared directly since both of them represent proportions.
[4] The Standard Error of the Estimate gives the mean unexplained variation in log(trips per capita), which is 0.440 in Regression 3. This is converted to a percentage variation using exp(0.440) = 1.55, or 55%, for "high side" deviations. The "low side" deviation is 1/1.55 = 0.64, or 36% less than 1.0.

As for the total trip model, RVM per capita was nearly significant. Again, because of its implications for further research, the regression with this variable included is shown in Exhibit 4-15. The other variables have coefficients very similar to those in Regression 3. The coefficient for RVM per capita has a "probability" of 0.11, meaning there is an 11% chance that the true value could be zero. The positive coefficient of 0.292 means that higher levels of fixed-route transit service correspond to higher levels of paratransit trip making.

Sensitivity Analysis

None of the variables in the model are highly correlated except with trips per capita. The strongest correlation is between base fare and percent below poverty, where there is a negative correlation. This correlation, at −0.42, is not so strong as to make the model unstable or unreliable, but it does mean that leaving either variable out would result in a biased estimate of the coefficient for the remaining variable. In other words, since the two effects have a weak tendency to cancel each other out (higher poverty, which depresses ridership, often goes with lower fare, which encourages ridership), leaving out poverty rate would result in too low an estimate for the effect of fares.

One of the surprising outcomes of the modeling process is that none of the age variables turns out to be significant. Since the percentage of older people and conditional trip screening have what appears to be a chance correlation (from 0.45 to 0.46 depending on the choice of age variable), the possibility was tested that the conditional trip screening variable is preventing the age variables from remaining in the stepwise regression procedure. However, age turns out to be statistically insignificant even if conditional trip screening is removed from the list of candidate variables.
5 The 95% confidence interval for the prediction at the mean is ±t_0.025 (Standard Error) / sqrt(n), where t_0.025 is Student's t for a two-tailed test using 22 degrees of freedom, or 2.07. The 95% C.I. for predicted ln(trips per capita) is ±(2.07)(0.440)/sqrt(28) = 0.172, and exp(0.172) = 1.19.

Exhibit 4-15. Model of trips per capita with revenue vehicle miles.

Regression 4
Dependent Variable: Log of Trips per Capita

| Variable | Unstandardized Coefficient | Standard Error | t-Statistic | Probability* |
| (Constant) | 2.579 | 1.075 | 2.400 | 0.026 |
| Log of Base Fare | -0.701 | 0.167 | -4.208 | 0.000 |
| Percent found conditionally eligible/100 (a) | -1.462 | 0.370 | -3.947 | 0.001 |
| Conditional Trip Screening | -0.581 | 0.180 | -3.222 | 0.004 |
| Percent below Poverty/100 (b) | -6.558 | 1.781 | -3.682 | 0.001 |
| Log of Effective Window | -0.701 | 0.246 | -2.853 | 0.010 |
| Log of RVM per capita | 0.292 | 0.175 | 1.668 | 0.110 |

R Squared: 0.774
Standard Error of the Estimate: 0.423

* Probability that the estimated coefficient is due to chance.
a The coefficient implies an elasticity of -0.31 at the mean observed conditional eligibility percentage.
b The coefficient implies an elasticity of -0.89 at the mean observed poverty rate.
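The arithmetic in footnote 5 and the elasticity notes under Exhibit 4-15 can be checked with a short sketch. The function names are illustrative. The elasticity relation used here (for a term entered without a logarithm in a log-dependent model, elasticity equals the coefficient times the variable's value) is a standard result that the source does not state explicitly, and the "implied means" are backed out from the reported elasticities rather than quoted from the study.

```python
import math

def ci_half_width_at_mean(t_crit: float, std_error: float, n: int) -> float:
    """Half-width of the confidence interval for the prediction at the mean."""
    return t_crit * std_error / math.sqrt(n)

def semilog_elasticity(coef: float, x: float) -> float:
    """Elasticity of y with respect to x when ln(y) = ... + coef * x."""
    return coef * x

# Values quoted in footnote 5.
half_width = ci_half_width_at_mean(t_crit=2.07, std_error=0.440, n=28)
factor = math.exp(half_width)  # multiplicative band on trips per capita

# Backed out from notes a and b of Exhibit 4-15 (elasticity / coefficient).
# These means are NOT stated in the source; they are derived for illustration.
implied_mean_conditional = -0.31 / -1.462
implied_mean_poverty = -0.89 / -6.558

print(f"half-width in ln units: {half_width:.3f}")
print(f"multiplicative factor:  {factor:.2f}")
print(f"implied mean conditional eligibility share: {implied_mean_conditional:.2f}")
print(f"implied mean poverty share: {implied_mean_poverty:.2f}")
```

The first two printed values should match the 0.172 and 1.19 given in the footnote. Read this way, the prediction at the mean is uncertain by a factor of about 1.19 in either direction.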

The effect of eliminating certain systems from the regression was also tested. Since New York City is so different from all other systems, the regression was repeated with its data removed. This produces no significant change in the estimated coefficients. Removing the cases with the most extreme differences between predicted and observed trips per capita (JAUNT and Denver RTD) also results in no significant change in the estimated coefficients. Removing as many as four cases that have the strongest overall influence on the coefficients (as indicated by Cook's Distance and leverage values) produces very little change in the estimated coefficients. All of the coefficients remain significant. In response to concerns about the strength of the poverty rate variable, the cases with the most extreme values for poverty rate were removed, again with no significant impact on estimated coefficients.

The overall conclusion of the sensitivity analysis is that the regression results are highly stable and should be reliable, within the limited accuracy of the model, for predicting ridership at other systems. Also, the coefficient values can be considered meaningful as a basis for policy discussion and guiding further research.

Exploratory Analysis of Hold Time

Because nine of the representative systems could not provide quantitative measures of telephone hold time, it was not considered a candidate for inclusion in the model. However, there is strong reason to believe that long hold times do discourage ridership. To test this hypothesis, a model was developed including average hold time with imputed values for the missing data. The missing values were imputed using multiple imputation as implemented in the SOLAS software.6 In the multiple imputation procedure, values for the missing variable are estimated by regression on the other variables. A random error term is added to the imputed values based on the regression.
Then a regression model of the variable to be explained (trips per capita in this case) is estimated on all the variables, including the imputed values. The process is repeated multiple times with the random error terms chosen anew each time. The results of all of the regressions are combined, with variance of the estimated coefficients calculated using the estimated variance for each trial plus the variance among the individual trials.

The multiple imputation method is considered superior to simply discarding the cases with missing average hold times because it preserves the information for all the other variables in the cases that only lack a value for average hold time. This is particularly important when there is some difference between the cases that have missing data and those that do not. In the case of average hold time, systems that did not provide this measure have somewhat higher poverty rates than systems that did and are somewhat less likely to use conditional trip screening. As a result, simply discarding the systems without hold time data would produce biased results. The multiple imputation procedure is specifically designed to avoid this difficulty. Also, compared with some simpler methods, it avoids the appearance of unrealistically high estimates of significance for the variable with imputed values.

Exhibit 4-16 illustrates the concept, showing a dataset for which three values of an explanatory variable are missing. Four sets of imputed values are shown for the missing values. The imputed values retain the overall trend, but avoid creating the appearance of a clearer trend than can actually be inferred from the available observations.

Sample results are reported in Exhibit 4-17. (Repeating the procedure produces slightly different results each time, but the results shown are typical.) The results of all the variables previously included are similar to those obtained before.
For average hold time, the estimated coefficient of -0.264 is consistent with a negative effect of hold times on demand. However, the estimated value of Student's t, -1.626, corresponds to a probability of 0.119 that the estimated coefficient is due to chance (i.e., a "confidence level" of only 88%). In the survey of practitioners conducted for this research, "ability to get through on the phone to reserve a ride" was ranked very highly as a factor that influenced demand. Most likely, there is a strong effect due to hold times, and the lack of significance in the model is a result of the small data set that was available.

6 http://www.statsol.ie/html/solas/solas_home.html, accessed on November 29, 2006.

[Exhibit 4-16. Multiple imputation of missing values. Plot of the value to be predicted against an explanatory variable, showing the observed data and four sets of imputed data.]

Exhibit 4-17. Model with imputed values for missing hold times.

| Variable | Unstandardized Coefficient | Standard Error | t-Statistic |
| (Constant) | 3.649 | Not reported | Not reported |
| Log of Base Fare | -0.684 | 0.163 | -4.208 |
| Percent found conditionally eligible/100 | -1.393 | 0.381 | -3.657 |
| Conditional Trip Screening | -0.630 | 0.174 | -3.616 |
| Percent below Poverty/100 | -6.129 | 2.173 | -2.821 |
| Log of Effective Window | -0.826 | 0.241 | -3.424 |
| Log of Average Hold Time | -0.263 | 0.161 | -1.626 |

R Squared: 0.804
Standard Error of the Estimate: 0.397
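The multiple imputation procedure described in this section (impute by regression plus a random error, refit, repeat, then pool) can be sketched as follows. This is a simplified illustration, not a reproduction of what SOLAS does. Three things are assumptions and do not come from the source: the function and argument names; the choice to include the dependent variable among the imputation predictors; and the pooling formula, which is the standard Rubin's rules form (within-trial variance plus (1 + 1/m) times between-trial variance). A full implementation would also redraw the imputation-model parameters on each trial, which this sketch omits.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and the estimated variance of each coefficient."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.diag(cov)

def multiply_impute_and_fit(y, others, partial, m=20, seed=0):
    """Pool m regressions of y on [1, others, partial] with imputed values.

    y       : (n,) dependent variable
    others  : (n, k) fully observed explanatory variables
    partial : (n,) explanatory variable with np.nan where missing
    Returns (pooled coefficients, pooled standard errors).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    miss = np.isnan(partial)

    # Imputation model fitted on complete cases only. Including y as a
    # predictor is an assumption of this sketch.
    Z = np.column_stack([np.ones(n), others, y])
    gamma, _ = ols(Z[~miss], partial[~miss])
    resid = partial[~miss] - Z[~miss] @ gamma
    resid_sd = np.sqrt(resid @ resid / ((~miss).sum() - Z.shape[1]))

    betas, variances = [], []
    for _ in range(m):
        filled = partial.copy()
        # Regression prediction plus a fresh random error term on each trial.
        filled[miss] = Z[miss] @ gamma + rng.normal(0.0, resid_sd, miss.sum())
        X = np.column_stack([np.ones(n), others, filled])
        b, v = ols(X, y)
        betas.append(b)
        variances.append(v)

    betas = np.array(betas)
    within = np.mean(variances, axis=0)       # average variance within trials
    between = betas.var(axis=0, ddof=1)       # variance among the trials
    total = within + (1.0 + 1.0 / m) * between
    return betas.mean(axis=0), np.sqrt(total)
```

The between-trial term is what keeps the pooled standard error of the imputed variable from looking unrealistically small, which is the property the text contrasts with simpler single-imputation methods.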


