OCR for page 63

63
CHAPTER 9
APC Sampling Needs and National Transit
Database Passenger-Miles Estimates
Two common questions regarding APC system design are how much accuracy is needed from APCs and how many APC units are needed to obtain an adequate sample size. Answering those questions requires facing a related question: when an agency generates measures such as peak load, passenger-miles, and route boardings from APC data, how timely and precise must those estimates be?

9.1 Sample Size and Fleet Penetration Needed for Load Monitoring

According to the Transit Data Collection Design Manual (16), the passenger-count-based statistic requiring the greatest precision is average peak load on heavy-demand routes, a measure used to adjust headway. A reasonable target precision to ensure that the route is neither overcrowded nor overserved is 5% or 6%, effectively limiting permissible load bias on crowded segments to about 5%.

Sample size needed to achieve this target precision depends on the bias and cv of load estimates. Fleet penetration needed, in turn, depends on the number of daily trips on the route-direction-period being analyzed, the data recovery rate, and how the instrumented fleet is distributed. Fleet penetration of 10% will afford about 20r observations per quarter for a route-direction-period with five trips per day, where r is the data recovery rate. (For example, if r = 80%, then 20r = 16 observations would be obtained.) If needed, greater sample sizes can be achieved by simply concentrating equipped vehicles on heavy-demand routes, at the expense of low-demand routes, for which less precision in load estimates is needed.

We have posited elsewhere that APCs make possible a more precise method of scheduling and service quality monitoring focused on extreme values of load rather than mean values. Extreme values reflect the impacts of load variability and service regularity as well as frequency and better reflect the quality of service as felt by passengers. Estimating extreme values requires a far greater sample size than estimating mean values, which is an argument favoring instrumenting the entire fleet with APCs, a course being pursued by Tri-Met.

9.2 Accuracy and Sample Size Needed for Passenger-Miles

All U.S. agencies receiving federal assistance and operating in urban areas are required to report annual systemwide passenger-miles by mode to the NTD. Traditionally, these estimates are made from a sample of manually counted ons and offs. Agencies can use a standard sampling and estimation procedure that requires on-off counts on 549 or more trips (39), or they can use any other sampling method that achieves a precision of ±10% or smaller at the 95% confidence level. Because manual on-off counts are labor intensive, there is a natural desire to find less burdensome measurement and estimation methods, including using APC-generated counts (40).

One factor in using counts measured by APCs is the accuracy of the counts themselves, which, as the previous chapter shows, depends not only on sensor accuracy but also on data processing techniques used for parsing, screening, and balancing. The second factor is having an adequate sample size. The two factors are related: the less accurate the counts, the larger the sample needed. This section deals with that accuracy/sample size trade-off.

For all but the smallest transit agencies, sampling requirements for NTD passenger-miles reporting are considerably less demanding than those for other uses of the data, such as monitoring load or boardings by route, because the NTD precision requirement is applied only to a whole year's sample aggregated systemwide. Therefore, meeting the NTD requirement should be easy for almost any transit system with APCs. However, because the NTD requires that alternative sampling methods be statistically justified, the following section examines passenger-miles sampling and estimation with APCs in detail.
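The NTD criterion of ±10% precision at the 95% confidence level can be checked numerically for a candidate sample. The following is a minimal sketch, assuming simple random sampling of trips and a normal approximation (z = 1.96); neither assumption comes from the text, and the function names `relative_precision` and `meets_ntd_precision` are hypothetical, introduced only for illustration.

```python
import math
import statistics

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def relative_precision(trip_passenger_miles):
    """Half-width of the 95% confidence interval as a fraction of the sample mean.

    Assumes the trips were drawn by simple random sampling (an assumption
    of this sketch, not a requirement stated in the text).
    """
    n = len(trip_passenger_miles)
    mean = statistics.mean(trip_passenger_miles)
    standard_error = statistics.stdev(trip_passenger_miles) / math.sqrt(n)
    return Z_95 * standard_error / mean


def meets_ntd_precision(trip_passenger_miles, target=0.10):
    """True if the sample achieves a precision of +/-10% or smaller."""
    return relative_precision(trip_passenger_miles) <= target
```

A bias-free sample is assumed here; the effect of measurement bias on the permitted standard error is the subject of Section 9.2.1.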

9.2.1 Standard Error Targets in the Presence of Bias

Let

Y = mean passenger-miles per trip
b = relative bias in the passenger-miles estimate (b = bias/Y)
ȳ = estimated mean passenger-miles
se = standard error of the passenger-miles estimate
rse = se/Y = relative standard error

The precision specification can be interpreted as:

$$P(\bar{y} - 0.1Y \le Y \le \bar{y} + 0.1Y) = P(Y - 0.1Y \le \bar{y} \le Y + 0.1Y) \ge 0.95 \qquad (5)$$

Subtracting E[ȳ] = Y(1 + b) and then dividing by se,

$$P\left(\frac{-0.1Y - bY}{\mathrm{se}} \le \frac{\bar{y} - Y - bY}{\mathrm{se}} \le \frac{0.1Y - bY}{\mathrm{se}}\right) \ge 0.95$$

By the Central Limit Theorem, the middle term approaches a standard normal variate as sample size increases; therefore, using the notation Φ() = cumulative standard normal distribution, the precision requirement becomes

$$\Phi\left(\frac{0.1 - b}{\mathrm{rse}}\right) - \Phi\left(\frac{-0.1 - b}{\mathrm{rse}}\right) \ge 0.95 \qquad (6)$$

From relation 6, selected values of permitted relative standard error for a given value of relative bias are shown in Table 11. For manual data collection, assumed bias-free, the permitted relative standard error is 0.051; with 8% relative bias, the permitted relative standard error falls to 0.012. To be safe, a transit agency would do well to limit the permissible bias in passenger-miles or load to less than 8%.

Table 11. Relative standard error required versus measurement bias.

Measurement Bias*    Permitted Relative Standard Error*
0.00                 0.0510
0.01                 0.0500
0.02                 0.0471
0.03                 0.0423
0.04                 0.0365
0.05                 0.0304
0.06                 0.0243
0.07                 0.0182
0.08                 0.0122
0.09                 0.0061
*Relative to mean passenger-miles per trip

9.2.2 Sample Size and Coverage Requirements

The determination of sample size requirements assumes three stages of sampling: in stage 1 all routes are selected; in stage 2, for each route, certain timetable trips are selected; and in stage 3, for each selected timetable trip, certain days are observed. The assumed cv's for trip-level passenger-miles at stages 2 and 3 are:

cv2 = 0.9 = cv of timetable trip means (within route)
cv3 = 0.3 = cv of daily passenger-miles (within a given timetable trip)

The assumed values are conservative estimates based on experience with data from many transit agencies. The values reflect the fact that, for a given route, most variation in trip-level passenger-miles is due to differences in where trips fall within the timetable (peak/off-peak, inbound/outbound), rather than random differences between days. Sample size requirements derived in this section are based on the weekday sample only; the addition of weekends, sampled with the same degree of fleet penetration as on weekdays, will improve precision, although not by much.

The effective penetration rate (f3) is defined as the expected fraction of the daily schedule observed each day. It is the product of fleet penetration rate and data recovery rate.

Covering Every Weekday Trip

With an effective fleet penetration rate as small as 1% and careful rotation, every weekday timetable trip can be observed at least once per year. The annual estimate is determined by calculating average passenger-miles for each timetable trip, expanding by number of days that trip was operated, and summing over all timetable trips. Stratifying to this level is a very effective estimation technique because it eliminates the effect of variability between timetable trips. The weekday sample size requirement is

$$n \ge \max\left(N_2, \left(\frac{0.3}{\mathrm{rse}}\right)^2\right) \qquad (7)$$

where N2 equals the number of weekday timetable trips and rse is the permitted relative standard error from Table 11. For bias up to 8% and for all but the smallest transit systems, the N2 term will control; that is, it is sufficient to simply observe every timetable trip once.

Covering Most Weekday Trips (Two-Stage Sampling)

Logistics and data recovery problems can frustrate plans to observe every weekday timetable trip. The following plan
assumes that only a percentage (f2) of the timetable trips is covered. The estimation procedure is to get an average for each timetable trip that was observed, determine the route average (per trip), expand each route average by the number of trips operated per year, and then sum over all routes.

The relative standard error of the estimate is given by

$$\mathrm{rse}^2 = (1 - f_2)\,\frac{cv_2^2}{f_2 N_2} + \frac{cv_3^2}{D f_3 N_2} \qquad (8)$$

where D is the number of weekdays in the year (about 252). For all but the smallest transit systems, the stage 3 term (the last term in equation 8) will be insignificant, and the size of the relative standard error will depend mostly on f2.

Using equation 8, Figure 20 shows the required timetable coverage f2 versus the number of trips in the timetable (N2) for selected values of bias and effective penetration rate. Degree of coverage is restricted to values of 85% or greater, because lower coverage rates suggest poor logistical management with likely sampling biases (e.g., whole routes being missed or seriously undersampled). With 5% bias, 85% coverage is sufficient even for an agency with only 200 trips in the weekday timetable and 1% effective penetration. Only smaller systems with moderate to large bias will need greater timetable coverage or effective penetration.

[Figure 20. Timetable coverage rate required versus timetable size. Horizontal axis: number of trips in weekday timetable (0 to 1200). Vertical axis: coverage rate (0.80 to 1.00). Curves: 8% bias at 1% and 4% effective penetration; 5% bias at 1% and 4% effective penetration; 2% bias at 1% effective penetration.]

9.2.3 Intentional Sampling

The recommended estimation procedures just described involve unintentional sampling: the APCs collect data all year long, and the agency just rolls it up. This approach assumes that instrumented buses, for reasons beyond NTD passenger-miles estimation, are being circulated in a manner that covers the entire schedule regularly. Intentional sampling methods with limited sample sizes are clearly inferior, unless data processing procedures are still so undeveloped that each trip's data must be manually checked.
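The chain of calculations in Sections 9.2.1 and 9.2.2 (relation 6, then equations 7 and 8) can be traced numerically. The sketch below is illustrative only: the function names are hypothetical, bisection is one convenient way to invert relation 6 and is not necessarily how Table 11 was produced, and the defaults cv2 = 0.9, cv3 = 0.3, and D = 252 are the values assumed in the text.

```python
import math


def phi(x):
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def coverage_probability(rse, bias, tol=0.10):
    """Left-hand side of relation 6 for a given rse and relative bias."""
    return phi((tol - bias) / rse) - phi((-tol - bias) / rse)


def permitted_rse(bias, tol=0.10, conf=0.95):
    """Largest rse satisfying relation 6, by bisection (requires |bias| < tol)."""
    lo, hi = 1e-6, 1.0  # relation 6 holds at lo and fails at hi
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if coverage_probability(mid, bias, tol) >= conf:
            lo = mid
        else:
            hi = mid
    return lo


def full_coverage_sample_size(n_timetable_trips, rse, cv3=0.3):
    """Equation 7: weekday sample size when every timetable trip is covered."""
    return max(n_timetable_trips, math.ceil((cv3 / rse) ** 2))


def two_stage_rse(f2, n_timetable_trips, f3, cv2=0.9, cv3=0.3, weekdays=252):
    """Equation 8: rse when only a fraction f2 of timetable trips is covered."""
    stage2 = (1.0 - f2) * cv2 ** 2 / (f2 * n_timetable_trips)
    stage3 = cv3 ** 2 / (weekdays * f3 * n_timetable_trips)
    return math.sqrt(stage2 + stage3)


def required_coverage(rse, n_timetable_trips, f3, cv2=0.9, cv3=0.3, weekdays=252):
    """Smallest f2 meeting the rse target under equation 8, or None if even
    full coverage (f2 = 1) cannot meet it."""
    stage3 = cv3 ** 2 / (weekdays * f3 * n_timetable_trips)
    slack = rse ** 2 - stage3  # variance budget left for the stage 2 term
    if slack < 0:
        return None
    return cv2 ** 2 / (cv2 ** 2 + slack * n_timetable_trips)
```

Under these assumptions, `permitted_rse(0.0)` and `permitted_rse(0.08)` come out at about 0.051 and 0.012, in line with Table 11, and `required_coverage(0.0304, 200, 0.01)` gives roughly 0.845, consistent with the statement that 85% coverage suffices at 5% bias for a 200-trip timetable with 1% effective penetration. The sketch does not apply the 85% floor used in Figure 20.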