APPENDIX B
Sample Sizes, Sample Estimates,
and Confidence Intervals
This appendix outlines how to determine the sample size so that the confidence interval of a sample estimate will be within a specified range and, when analyzing the results, how to estimate values and confidence intervals for characteristics of the population. The determination of sample sizes and confidence intervals is described for the four types of probability sampling, namely random, sequential, stratified, and cluster sampling. The description is only a brief summary of the methods as applicable to airport surveys, and the reader is encouraged to refer to the statistical texts listed in the bibliography for a more complete description.
Examples are then provided for applying these methods to determine sample sizes for small population sizes, such as employee or tenant surveys, and for passenger surveys using each of the possible sampling strategies.
Throughout this section, it is assumed that the sample size will be large enough that the sample means will be approximately Normally distributed, per the Central Limit Theorem. Generally a sample size of at least 30 is considered to provide a reasonably good approximation to the Normal distribution. Most airport user surveys will have a much larger sample size than this.
Random Sampling
Suppose we are interested in a characteristic of the population which we denote by $X$. For example, $X$ could be the variable, gender, where:

$$X = \begin{cases} 0 & \text{if passenger is male} \\ 1 & \text{if passenger is female} \end{cases}$$

Let the mean value of this characteristic be $\mu$. In the example, $\mu$ is the proportion of the passengers that are female. Using random sampling, the sample mean is an unbiased estimator for the population mean, $\mu$. If the sample size is $n$, the sample mean is:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{1}$$

where $\bar{X}$ is the sample mean
      $x_i$ is the value of $X$ for the $i$th individual in the sample, $i = 1, \ldots, n$.

The 95% confidence interval for $\mu$ (the range that contains the true value of $\mu$ with a probability of 0.95) is given by:

$$\bar{X} \pm 1.96\,\sigma_{\bar{X}}$$
B-2 Guidebook for Conducting Airport User Surveys
where $\sigma_{\bar{X}}$ is the standard deviation of the sample mean. For random sampling without replacement, it can be shown that:

$$\sigma_{\bar{X}} = \sigma_X \sqrt{\frac{1 - n/N}{n}}$$

where $\sigma_X$ is the standard deviation of the variable, $X$
      $N$ is the number of individuals in the population.

Hence the 95% confidence interval for $\mu$ is given by:

$$\bar{X} \pm 1.96\,\sigma_X \sqrt{\frac{1 - n/N}{n}} \tag{2}$$

For very large populations $(1 - n/N)$ is approximately one, and this reduces to:

$$\bar{X} \pm 1.96\,\frac{\sigma_X}{\sqrt{n}}$$
The constant, 1.96, is applicable for a 95% confidence interval. For a 99% confidence interval, the constant should be 2.58, and for a 90% confidence interval, use 1.65.
If the sample size is to be chosen so that the sample mean is within $w$ of the population mean, $\mu$, with a probability of 0.95, then the 95% confidence interval is given by:

$$\bar{X} \pm w \quad \text{(note that the width of the confidence interval is } 2w\text{)}$$

Thus,

$$w = 1.96\,\sigma_X \sqrt{\frac{1 - n/N}{n}}$$

This can be solved to give the sample size, $n$:

$$n = \frac{1.96^2 \sigma_X^2}{w^2 + \dfrac{1.96^2 \sigma_X^2}{N}} \tag{3}$$
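As an illustration only, Equation 3 can be evaluated with a few lines of Python. The function name `sample_size_random` and the planning values below (expected proportion 0.5, accuracy of ±0.05) are assumptions chosen for the sketch, not part of the guidebook.

```python
import math

Z_95 = 1.96  # constant for a 95% confidence interval, as given in the text


def sample_size_random(sigma_x, w, pop_size, z=Z_95):
    """Sample size n from Equation 3 (random sampling without replacement)."""
    numerator = z**2 * sigma_x**2
    return numerator / (w**2 + numerator / pop_size)


# Hypothetical planning case: expected proportion 0.5, half-width w = 0.05.
sigma = math.sqrt(0.5 * (1 - 0.5))
n_small = sample_size_random(sigma, 0.05, pop_size=50)      # small population
n_large = sample_size_random(sigma, 0.05, pop_size=10**9)   # effectively infinite
print(round(n_small, 2), round(n_large, 2))
```

The second call shows the large-population limit, where the term divided by N becomes negligible and the sample size no longer depends on the population size.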
In the survey planning stage when trying to determine the sample size, the standard deviation of the variable $X$, $\sigma_X$, will be unknown and must be estimated based on data from previous surveys, other similar airports, or knowledge of the airport.
For categorical variables, the standard deviation is related to the expected proportion of individuals in a category:

$$\sigma_X = \sqrt{p(1 - p)}$$

$$w = 1.96 \sqrt{\frac{p(1 - p)(1 - n/N)}{n}}$$

where $p$ is the proportion of the population in the category of interest.
For the example above, where the categorical variable is the respondent's gender, assume that we are interested in estimating the proportion of female passengers. Thus, $p$ is the proportion of female passengers. If we initially estimate this proportion to be 0.5, then for a sample size of $n = 400$ and population size, $N$, of 50,000, the 95% confidence interval is ±0.049 (that is, $w = 0.049$). The sample size can be chosen to obtain the required accuracy using Equation 3.
If $X$ represents the mode of transport to the airport, and the category, taxi or limousine, has an estimated proportion of 0.20, then the 95% confidence interval is ±0.039 ($w = 0.039$) for a sample size of $n = 400$ and population size of $N = 50,000$.
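The two figures quoted above can be checked with a short calculation. This is a minimal sketch; the helper name `half_width_proportion` is hypothetical and simply applies the expression for w for a categorical variable.

```python
import math


def half_width_proportion(p, n, pop_size, z=1.96):
    """Half-width w of the confidence interval for a proportion p,
    for a random sample of size n drawn without replacement."""
    return z * math.sqrt(p * (1 - p) * (1 - n / pop_size) / n)


w_gender = half_width_proportion(0.5, 400, 50_000)  # proportion of female passengers
w_taxi = half_width_proportion(0.2, 400, 50_000)    # taxi or limousine category
print(round(w_gender, 3), round(w_taxi, 3))
```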
For a non-categorical variable, such as time in the terminal before departure or expenditure at
the concessions, the standard deviation is determined from the variation of the variable. Again, this

will be unknown at the planning stage of the survey. If an estimate of the standard deviation, $\sigma_X$, can be obtained from past surveys, other similar airports, or knowledge of the airport, the required sample size for a given confidence interval width can be determined in a similar way using Equation 2.
Once the survey has been completed, the standard deviation of $X$, $\sigma_X$, can be estimated from the sample values of $X$ for use in determining the accuracy of the estimated population mean and confidence intervals. The sample standard deviation is the square root of the sample variance, $s_X^2$, which is given by:

$$s_X^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$$

where $\bar{X}$ is the sample average.
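A minimal sketch of this post-survey step follows, assuming coded 0/1 responses. The function name `mean_with_ci`, the response list, and the population size of 1,000 are invented for illustration; `statistics.stdev` uses the n - 1 divisor, matching the sample variance above.

```python
import math
import statistics


def mean_with_ci(values, pop_size, z=1.96):
    """Sample mean and confidence-interval half-width, using the sample
    standard deviation in place of the unknown population value."""
    n = len(values)
    mean = statistics.fmean(values)
    s_x = statistics.stdev(values)  # square root of the (n - 1) sample variance
    half_width = z * s_x * math.sqrt((1 - n / pop_size) / n)
    return mean, half_width


# Hypothetical responses: 24 of 40 sampled individuals are in the category.
responses = [1] * 24 + [0] * 16
mean, half_width = mean_with_ci(responses, pop_size=1000)
print(f"{mean:.3f} +/- {half_width:.3f}")
```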
Sequential Sampling
Sequential sampling is equivalent to random sampling if the order of the sample with respect
to the characteristics of interest is essentially random. In this case, sample sizes and estimates of
standard deviation and confidence intervals are determined in the same way as outlined above
for random sampling.
Sequential sampling can result in a more representative sample, and thus narrower confidence intervals, by ensuring a more even spread of sampled individuals over the population (provided the cases where serious biases can occur are avoided; see footnote 31). Calculation of the confidence intervals and required sample sizes is difficult and depends on the relationships between the ordering variable and the characteristics of the population of interest. In airport surveys, it can usually be assumed that the confidence interval is no wider than that which would occur with random sampling, and the expressions given for random sampling above can be used.
Stratified Sampling
With stratified sampling, the population is divided into groups, referred to as strata, and each stratum is sampled separately using random or sequential sampling. Assume there are $n_s$ strata and the proportion of the population in the $i$th stratum is $W_i$. The sample mean for the $i$th stratum is:

$$\bar{X}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij} \tag{4}$$

and the sample mean for the population is estimated by:

$$\bar{X} = \sum_{i=1}^{n_s} W_i \bar{X}_i \tag{5}$$

where $\bar{X}_i$ is the sample mean of stratum $i$
      $x_{ij}$ is the value of characteristic $X$ for individual $j$ in stratum $i$
      $n_i$ is the number of individuals sampled in stratum $i$
      $W_i$ is the proportion of the population in stratum $i$, $N_i/N$.
The standard deviation of the sample mean, $\bar{X}$, is given by:

$$\sigma_{\bar{X}} = \sqrt{\sum_i W_i^2 \sigma_{\bar{X}_i}^2}$$

where $\sigma_{\bar{X}_i}$ is the standard deviation of the mean for members of stratum $i$, $\bar{X}_i$, and is given by:

$$\sigma_{\bar{X}_i} = \sqrt{\sigma_{X_i}^2\,\frac{1 - n_i/N_i}{n_i}}$$

where $\sigma_{X_i}$ is the standard deviation of $X$ for members of stratum $i$
      $n_i$ and $N_i$ are the sample and population sizes for stratum $i$, respectively.

Hence the standard deviation of the sample mean can be expressed as:

$$\sigma_{\bar{X}} = \sqrt{\sum_i W_i^2 \sigma_{X_i}^2\,\frac{1 - n_i/N_i}{n_i}}$$

Footnote 31: Serious biases can occur if a characteristic of interest occurs in a cyclic order in the population list and the length of each cycle corresponds to the sampling fraction, but this would be rare in airport surveys.
The accuracy of the sample estimate is improved if the variation within each stratum (measured by $\sigma_{X_i}$) is less than the variation over the whole population.
As above, for a 95% confidence interval of width $2w$ (the sample mean is within $w$ of the population mean, $\mu$, with a probability of 0.95), $w$ is given by:

$$w = 1.96\,\sigma_{\bar{X}} \tag{6}$$
With proportional stratified sampling, the sampling fraction is the same for each stratum, so the share of the sample drawn from each stratum equals the proportion of the population in the stratum, $W_i$. Thus,

$$\frac{n_i}{N_i} = \frac{n}{N} \quad \text{and} \quad W_i = \frac{n_i}{n} = \frac{N_i}{N}$$

where $n$ and $N$ are the total sample and population sizes, respectively.
In this case, the overall sample mean given by Equation 5 reduces to that for a random sample, Equation 1, and

$$\sigma_{\bar{X}} = \sqrt{\frac{1 - n/N}{n} \sum_i W_i \sigma_{X_i}^2} \tag{7}$$
To determine the required sample size, estimates of $\sigma_{X_i}$ are needed in the survey planning stage. These must be estimated from results of previous surveys, other similar airports, or knowledge of the airport. Once approximate estimates of $\sigma_{X_i}$ have been obtained, the total sample size, $n$, for a confidence interval of width $2w$ is then found using the relationship (from Equations 6 and 7):

$$w = 1.96 \sqrt{\frac{1 - n/N}{n} \sum_i W_i \sigma_{X_i}^2}$$

which can be solved to give the sample size, $n$, as:

$$n = \frac{1.96^2 \sum_i (W_i \sigma_{X_i}^2)}{w^2 + \dfrac{1.96^2 \sum_i (W_i \sigma_{X_i}^2)}{N}} \tag{8}$$
If separate estimates for each stratum cannot be obtained in the planning stage, an approximate estimate of the standard deviation of $X$ over the total population could be used to provide a conservative estimate of the sample size, $n$. For proportional stratified samples, the sample size for each stratum, $n_i$, is then found by:

$$n_i = n W_i$$
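The calculation for proportional stratified sampling can be sketched as follows. The function names and the two-stratum inputs are hypothetical; the code only evaluates Equation 8 and the allocation $n_i = n W_i$ for whatever weights and planning-stage standard deviations are supplied.

```python
def sample_size_stratified(weights, sigmas, w, pop_size, z=1.96):
    """Total sample size n from Equation 8 (proportional stratified sampling).

    weights: population proportions W_i (should sum to 1)
    sigmas:  planning-stage standard deviations of X within each stratum
    """
    weighted_var = sum(w_i * s_i**2 for w_i, s_i in zip(weights, sigmas))
    numerator = z**2 * weighted_var
    return numerator / (w**2 + numerator / pop_size)


def allocate_proportional(n_total, weights):
    """Stratum sample sizes n_i = n * W_i (unrounded)."""
    return [n_total * w_i for w_i in weights]


# Hypothetical two-stratum plan.
weights = [0.7, 0.3]
sigmas = [0.4, 0.5]
n_total = sample_size_stratified(weights, sigmas, w=0.03, pop_size=20_000)
n_by_stratum = allocate_proportional(n_total, weights)
print(round(n_total), [round(n_i) for n_i in n_by_stratum])
```

If every stratum is given the same standard deviation, the result equals the random-sampling sample size from Equation 3, which is why using the overall standard deviation gives a conservative estimate.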

For determining confidence intervals of the final estimates using the survey data, the standard deviation of $X$ in each stratum, $\sigma_{X_i}$, can be estimated by the sample standard deviation, $s_{X_i}$, determined from the sample values in each stratum. The 95% confidence interval is then:

$$\bar{X} \pm 1.96 \sqrt{\frac{1 - n/N}{n} \sum_i W_i s_{X_i}^2} \tag{9}$$
With non-proportional stratified sampling, the sampling fractions differ for each stratum.
The sample mean for each stratum and overall sample mean can be found using Equations 4 and
5, respectively.
The sample sizes, $n_i$, are chosen so that a confidence interval of width $2w$ is given by the relationship:

$$w = 1.96 \sqrt{\sum_i W_i^2 \sigma_{X_i}^2\,\frac{1 - n_i/N_i}{n_i}} \tag{10}$$
In this case, many different combinations of $n_i$ could be chosen to produce a confidence interval width of $2w$. The choice of $n_i$ is dependent on the reason for choosing to use non-proportional stratified sampling. To achieve a similar level of accuracy in the results for each stratum, it will be necessary to use non-proportional stratified sampling, with the sample size in each stratum inversely proportional to the variance of the characteristic within that stratum. Thus, $n_i = k / \sigma_{X_i}^2$, where $k$ is a constant. Substituting this into Equation 10 and solving for $k$ gives:

$$k = \frac{\sum_i W_i^2 \left(\sigma_{X_i}^2\right)^2}{\dfrac{w^2}{1.96^2} + \sum_i \dfrac{W_i^2 \sigma_{X_i}^2}{N_i}}$$

The sample size for each stratum is then found using the relationship: $n_i = k / \sigma_{X_i}^2$.
Because the actual variance in the characteristic in the survey responses for each stratum, $\sigma_{X_i}^2$, will not be known until the survey has been performed, it will be necessary to make an initial assumption of the differences in the variance across the strata in order to determine the proportion of the survey responses to assign to each stratum. These assumptions can be based on the results of prior surveys or of surveys performed at similar airports.
As before, confidence intervals for the final estimates can be determined using the survey data. The standard deviation of $X$ in each stratum, $\sigma_{X_i}$, can be estimated by the sample standard deviation for each stratum, $s_{X_i}$, determined from the sample values. The 95% confidence interval is then:

$$\bar{X} \pm 1.96 \sqrt{\sum_i W_i^2 s_{X_i}^2\,\frac{1 - n_i/N_i}{n_i}} \tag{11}$$
Cluster Sampling
As described in Section 3.2 of the guidebook, with cluster sampling the population is divided into clusters (or groups) and clusters rather than individual members of the population are sampled. In single-stage cluster sampling, all individuals in the cluster are sampled, so a complete picture of the population within the sampled clusters is obtained. For example, in a survey of departing passengers, flights could be used as clusters; a sample of flights would then be chosen and all passengers on those flights would be surveyed. If two-stage cluster sampling is used, individuals within each cluster are also sampled.

Let $N$ be the number of clusters in the population
    $n$ be the number of clusters sampled
    $M_k$ be the number of individuals in the $k$th cluster in the population
    $m_k$ be the number of individuals in the $k$th cluster included in the sample
    $x_{kj}$ be the value of $X$ for the $j$th individual in the $k$th cluster.

The average number of individuals per cluster is the total population size divided by the number of clusters:

$$\bar{M} = \frac{1}{N} \sum_{k=1}^{N} M_k \tag{12}$$

The sample mean for the $k$th cluster is:

$$\bar{X}_k = \frac{1}{m_k} \sum_{j=1}^{m_k} x_{kj} \tag{13}$$

The population mean is estimated by:

$$\bar{X} = \frac{1}{n} \sum_{k=1}^{n} \frac{M_k}{\bar{M}}\,\bar{X}_k \tag{14}$$

When the numbers of individuals in each cluster are equal, $M_k = \bar{M}$, and the mean for the population reduces to simply the average of the cluster means, $\bar{X}_k$.
Where the clusters to be sampled are drawn randomly from the population of all clusters, and a sample of individuals is drawn randomly from each cluster, the variance of the sample mean, $\bar{X}$, includes two components of variation, the between and within cluster components, and for a categorical variable is estimated by:

$$\sigma_{\bar{X}}^2 = \underbrace{\frac{1 - n/N}{n}\,\sigma_c^2}_{\text{Between cluster component}} + \underbrace{\frac{n}{N} \sum_{k=1}^{n} \left(1 - \frac{m_k}{M_k}\right) \frac{p_k (1 - p_k)}{n^2 (m_k - 1)}}_{\text{Within cluster component}} \tag{15}$$

where $p$ is the proportion of the total population in the category of interest
      $p_k$ is the proportion of individuals in cluster $k$ in the category of interest
      $\sigma_c^2$ is the variance of the cluster means around the population mean and can be estimated by:

$$\sigma_c^2 = \frac{\sum_{k=1}^{n} (\bar{X}_k - \bar{X})^2}{n - 1}$$

If all individuals in each selected cluster are sampled, $m_k = M_k$, and the variance is given by the "between cluster component" term only.
Clusters could be selected using stratified sampling to reduce the variance between clusters
within a given stratum and thus improve the accuracy of the estimate. For example, where the
clusters are flights, flights could be stratified into groups such as domestic and international and
short- and long-haul.
Assume that the clusters are stratified into $n_s$ strata and in each stratum, $n_i$ clusters are sampled. The population mean is estimated by:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n_s} \sum_{k=1}^{n_i} \frac{M_{ik}}{\bar{M}}\,\bar{X}_{ik} \tag{16}$$

where $n_i$ is the number of clusters sampled in the $i$th stratum
      $M_{ik}$ is the number of individuals in the $k$th cluster in the $i$th stratum
      $\bar{X}_{ik}$ is the mean of individuals in the $k$th cluster in the $i$th stratum
      $n$ is the total number of clusters sampled over all strata.

With proportional stratified sampling of clusters, the between cluster variance component of Equation 15 becomes:

$$\operatorname{Var}(\bar{X}_c) = \frac{1 - n/N}{n} \sum_{i=1}^{n_s} W_i \sigma_{c_i}^2 \tag{17}$$

where $n$ is the number of clusters sampled (over all strata)
      $N$ is the number of clusters in the population (over all strata)
      $\sigma_{c_i}^2$ is the variance of cluster means for clusters in stratum $i$
      $W_i$ is the proportion of clusters sampled that are in stratum $i$ ($= n_i/n = N_i/N$ for proportional stratified sampling).

Assuming that the clusters within each stratum have the same size, $M_i$, the same sample size, $m_i$, and the same proportion of individuals with the category of interest, $p_i$, the within cluster variance component becomes:

$$\begin{aligned}
\operatorname{Var}(\bar{X}_c) &= \sum_{i=1}^{n_s} \sum_{k=1}^{n_i} \frac{n_i}{N_i} \left(1 - \frac{m_i}{M_i}\right) \frac{p_i (1 - p_i)}{n_i^2 (m_i - 1)} \\
&= \sum_{i=1}^{n_s} n_i\,\frac{n_i}{N_i} \left(1 - \frac{m_i}{M_i}\right) \frac{p_i (1 - p_i)}{n_i^2 (m_i - 1)} \\
&= \sum_{i=1}^{n_s} \frac{1}{N_i}\,(1 - f)\,\frac{p_i (1 - p_i)}{m_i - 1}
\end{aligned} \tag{18}$$

where $f$ is the fraction of passengers sampled and is assumed to be constant in all clusters.
The calculation of the accuracy of estimates, required sample sizes, and confidence intervals
is complex and the reader is referred to Levy and Lemeshow, Sampling of Populations, and
Cochran, Sampling Techniques, listed in the bibliography. Comprehensive statistical software
programs exist which would be useful for analyzing data from cluster sampling.
It should be noted, however, that cluster sampling is less efficient than random, sequential, and stratified sampling, and larger sample sizes will be required to obtain the same levels of accuracy.
Use of the expressions applicable for random sampling will underestimate the true standard
errors of estimated population characteristics and the associated confidence intervals. Preferably,
clusters should be chosen so that variation in the characteristics of interest between clusters is
small, but within clusters is large.
In airport surveys, cluster sampling is commonly used for sampling of flights in departing
passenger surveys. For many characteristics such as trip duration, airfares, trip purpose, time at
airport, spending in airport concessions, and of course destination and sector, passengers on the
same flight will be more likely to have similar values of these characteristics than passengers in
general. This homogeneity of characteristics within a flight significantly reduces the efficiency of
cluster sampling for analyzing these characteristics.
There are relatively few air party characteristics that have a similar distribution across differ-
ent flights. For characteristics such as household size, the variation of the characteristics across
passengers on one flight is likely to be fairly similar to that of all passengers. In this case, use of
cluster sampling should not greatly reduce the sampling efficiency.
It can be shown that the variance of the estimate of the mean of the characteristic of interest for cluster sampling can be expressed, approximately, as a function of the variance for random sampling:

$$\sigma_{\bar{X}_C}^2 = \sigma_{\bar{X}_R}^2 \left[1 + \rho\,(m_{av} - 1)\right] \tag{19}$$

where $\sigma_{\bar{X}_C}^2$ is the variance using cluster sampling
      $\sigma_{\bar{X}_R}^2$ is the variance using random sampling

      $\rho$ represents the population intra-class correlation
      $m_{av}$ is the mean number of cases sampled per cluster.

The variance using cluster sampling will be greater than using random sampling unless either $m_{av} = 1$ or $\rho \le 0$. $m_{av} = 1$ corresponds to the special case where each cluster consists of a single case and is equivalent to random sampling. The intra-class correlation, $\rho$, is a measure of homogeneity, and if individuals in a cluster are more homogeneous than the population as a whole, $\rho$ will be greater than zero.
The ratio of the variances, $\sigma_{\bar{X}_C}^2 / \sigma_{\bar{X}_R}^2$, is often referred to as the design effect, DE, and is given by:

$$DE = 1 + \rho\,(m_{av} - 1) \tag{20}$$

The effective sample size is given by: $m_{av}\,n / DE$.
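The design effect and effective sample size are simple to compute once a value for the intra-class correlation is assumed. In the sketch below, the function names and the inputs (40 sampled flights, 30 passengers sampled per flight, an intra-class correlation of 0.05) are purely illustrative assumptions.

```python
def design_effect(rho, m_av):
    """Design effect DE from Equation 20."""
    return 1 + rho * (m_av - 1)


def effective_sample_size(n_clusters, m_av, rho):
    """Effective sample size m_av * n / DE, where n is the number of clusters."""
    return m_av * n_clusters / design_effect(rho, m_av)


# Hypothetical survey: 40 flights, 30 passengers per flight, rho = 0.05.
de = design_effect(0.05, 30)
n_eff = effective_sample_size(40, 30, 0.05)
print(round(de, 2), round(n_eff))
```

Under these assumed values, the 1,200 completed surveys carry roughly the information of a random sample well under half that size, which illustrates how even a modest intra-class correlation erodes the efficiency of cluster sampling.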
Examples of Calculation of Sample Sizes
Small Population Size--Using Random or Sequential Sampling
for Categorical Variables
In this example, the sample size is required for a survey of a relatively small population, such
as an employee or tenant survey, using random sampling. The critical characteristics of the pop-
ulation being determined are categorical variables, e.g., percentage of employees accessing the
airport by private vehicle.
The sample sizes depend on the expected proportion in the category and the level of accuracy
desired. Sample sizes were determined using Equation 3 for a range of population sizes; for two
values of the expected proportion in the category: 50% and 20%; and for three levels of accuracy:
±5 percentage points, ±3 percentage points, and ±2 percentage points (the latter for the expected
proportion of 20% only). As discussed in Section 3.2 of the guidebook, an error of ±5 percent-
age points represents a percentage error in the category proportion of 10% for an expected pro-
portion of 50% and 25% for an expected proportion of 20%. As is evident in Table B-1, large
sample sizes are required if an accuracy of 10% is required for categories with low proportions
of the population.
Table B-1. Sample sizes required for categorical variable with expected proportions of 50% and 20% and varying levels of accuracy.

                   Sample Size, n, Required for:
                   Expected Proportion = 50%       Expected Proportion = 20%
Total Number in    Accuracy of    Accuracy of      Accuracy of    Accuracy of    Accuracy of
Population, N      ±5 pct. pts.   ±3 pct. pts.     ±5 pct. pts.   ±3 pct. pts.   ±2 pct. pts.
                   (±10% of       (±6% of          (±25% of       (±15% of       (±10% of
                   proportion)    proportion)      proportion)    proportion)    proportion)
50                 44             48               42             47             48
100                80             92               71             87             94
200                132            168              110            155            177
500                218            340              165            290            377
1,000              280            515              200            405            605
5,000              360            880              235            600            1,175

Note: Assumes random sampling without replacement.

Table B-2. Assumed number of flights and average number of originating passengers per flight by market sector and day of week.

                                      Sector of Flights
Day of Week                           Short-Haul   Long-Haul    International   Total
                                      Domestic     Domestic
Monday                                60           20           8               88
Tuesday                               60           20           8               88
Wednesday                             60           20           8               88
Thursday                              60           20           9               89
Friday                                60           20           12              92
Saturday                              50           16           12              78
Sunday                                55           20           12              87
Total                                 405          136          69              610
Avg. Originating Passengers/Flight    50           120          170             79.2
Passenger Survey--Using Random, Stratified, and Cluster Sampling
A survey of air passengers is to be undertaken to obtain information on airport access trips. A
critical question to be answered is (say): What is the percentage of passengers dropped off at the
curb outside departures check-in?
It was decided to determine the sample sizes for each of the different sampling types and choose the most cost-effective method. It is known that the percentage dropped off at the curb varies greatly by characteristics such as trip purpose, flight sector (e.g., short-haul domestic, long-haul domestic, international), day of the week, and time of day. The trip purpose distribution of passengers is not known at the sampling stage and so could not be used. In addition to random sampling, stratified sampling with passengers stratified by flight sector or day of the week, and cluster sampling with flights stratified by flight sector were examined.
The survey is planned to be conducted during a two-week period.
The flight schedule is obtained from the Official Airline Guide and the numbers of flights
per sector by day of the week and the estimated average number of originating passengers per
flight (estimated using average load factors and percentages of connecting passengers) are as
in Table B-2.
To determine the sample size required, it is necessary to have at least approximate estimates of
the mean and standard deviation of the variable of interest--in this case the percentage of pas-
sengers dropped off at the curb. From knowledge of passengers using the airport, the percentage
of passengers dropped off at the curb was estimated in Table B-3.
Table B-3. Estimated percentage of passengers dropped off at the curb.

                        Passengers Dropped Off at Curb
Sector of Flight        Mean, p     Standard deviation     Total Pass.
Short-Haul Domestic     40%         0.490                  20,250
Long-Haul Domestic      60%         0.490                  16,320
International           90%         0.300                  11,730
Overall                 58.9%       0.492                  48,300

The overall mean percentage of 58.9% is a weighted average of the means for each sector with
weights being the numbers of passengers in each sector.
Since the variable of interest is a categorical variable, the standard deviation (SD) is given by: $\sqrt{p(1 - p)}$, where $p$ is the proportion of the population in the category of interest (i.e., percentage dropped at curb).
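The overall row of Table B-3 can be reproduced from the sector rows. This is a minimal sketch; the dictionary layout and variable names are illustrative choices, while the proportions and passenger totals are those in the table.

```python
import math

# (estimated proportion dropped at curb, total originating passengers), Table B-3
sectors = {
    "Short-Haul Domestic": (0.40, 20_250),
    "Long-Haul Domestic": (0.60, 16_320),
    "International": (0.90, 11_730),
}

total_pax = sum(pax for _, pax in sectors.values())
# Passenger-weighted average of the sector means.
p_overall = sum(p * pax for p, pax in sectors.values()) / total_pax
# Standard deviation for a categorical (0/1) variable.
sd_overall = math.sqrt(p_overall * (1 - p_overall))

print(total_pax, round(p_overall, 3), round(sd_overall, 3))
```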
Random Sampling of Passengers
A random sample of originating passengers could be surveyed as they exit the security line. In this case, the sample size is determined for a given width of confidence interval using Equation 3, where

    Total population:               $N = 48,300$
    Mean proportion using curb:     $\bar{X} = p = 0.59$
    SD for individual passengers:   $\sigma_X = 0.492$
The sample size, n, was found using Equation 3 for three widths of the 95% confidence inter-
vals (C.I.)--±2%, ±3%, and ±4%--as shown in Table B-4.
A simple approximation, given in Section 3.4.1 of the guidebook (see footnote 32), could also have been used:

$$n = \frac{40,000\,p\,(1 - p)}{(100\,w)^2}$$

Using this equation, the estimated sample sizes for the ±2%, ±3%, and ±4% cases are: 2,411, 1,074, and 606. The approximation leads to slightly higher estimates of the required sample sizes.
If, for example, it was decided that the narrow confidence interval is appropriate, i.e., the mean
estimate should be accurate to within ±2%, a sample size of 2,218 is required. This corresponds
to a sampling fraction of 4.6% for a population of 48,300, and if using sequential sampling every
21st passenger passing through security should be surveyed.
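The random-sampling figures in this example can be reproduced with a short script. The function and variable names are hypothetical; the inputs are the population size and standard deviation stated above, and the sampling interval is taken as the population size divided by the sample size, rounded down.

```python
def sample_size_random(sigma_x, w, pop_size, z=1.96):
    """Sample size n from Equation 3 (random sampling without replacement)."""
    numerator = z**2 * sigma_x**2
    return numerator / (w**2 + numerator / pop_size)


N_PAX = 48_300   # total originating passengers
SIGMA_X = 0.492  # SD for individual passengers

# Sample sizes for the three confidence interval half-widths considered.
sizes = {w: round(sample_size_random(SIGMA_X, w, N_PAX)) for w in (0.02, 0.03, 0.04)}

sampling_fraction = sizes[0.02] / N_PAX
interval = N_PAX // sizes[0.02]  # survey every interval-th passenger

print(sizes, f"{sampling_fraction:.1%}", interval)
```

Rounding the interval down means slightly more passengers are approached than strictly required, which errs on the side of meeting the target sample size.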
Stratified Sampling of Passengers--Stratified by Sector
Consider the case where stratified sampling is used to select passengers to be surveyed and pas-
sengers are stratified by the sector of the flight. Assume that at this airport, passengers on the dif-
ferent sectors use different security screening checkpoints, thus allowing passengers on each
flight sector to be sampled separately.
We consider here the simple case where proportional stratified sampling is used. Thus the pro-
portion of the sample size in each flight sector is equal to the proportion of the total passengers
in each flight sector, Wi = ni/n = Ni/N.
Table B-4. Sample sizes for random sampling of passengers for 95% confidence interval widths ±2%, ±3%, and ±4%.

95% C.I. Mean ± w,    95% C.I. ± w,         Sample
w                     w as % of mean        n
2.00%                 3.40%                 2,218
3.00%                 5.10%                 1,012
4.00%                 6.79%                 574
Footnote 32: The denominator in the equation in Section 3.4.1 is $a^2$, where the width of the 95% confidence interval is ±a and a is expressed in percentage points. Since w above is not expressed in percentage points, w = a/100.

Table B-5. Calculation of standard deviation of sample mean for stratified sampling of passengers by market sector.

                        Enplaned Pass.   % of Total   Est. Avg. Proportion   SD               $W_i \sigma_{X_i}^2$
Sector of Flight        Total, $N_i$     $W_i$        at Curb, $p_i$         $\sigma_{X_i}$
Short-Haul Domestic     20,250           0.4193       0.40                   0.49             0.10062
Long-Haul Domestic      16,320           0.3379       0.60                   0.49             0.08109
International           11,730           0.2429       0.90                   0.30             0.02186
Total                   48,300           1.0000       0.59                                    0.20357
The sample size is determined using Equation 8. Table B-5 shows the calculation of the summation over the three flight sectors.
The standard deviation for each sector is found using the relationship applicable for categorical variables: $\sigma_{X_i} = \sqrt{p_i (1 - p_i)}$, where $p_i$ is the proportion of the population in the category of interest for flights in sector $i$ (i.e., percentage dropped at the curb).
Substituting 0.20357 from Table B-5 for $\sum_i (W_i \sigma_{X_i}^2)$ in Equation 8 for three C.I. widths of ±2%, ±3%, and ±4% gives the sample sizes, $n$, in Table B-6.
The sample sizes for each flight sector are then found, based on the proportion of passengers
in each sector, Wi, to be as shown in Table B-7.
Comparing the total sample size with that found with random sampling, we find that stratification by flight sector has reduced the required sample size for the ±2% case from 2,218 to 1,879--a reduction of 15%. Note that the size of the reduction is very dependent on the variation in the mean responses across the different strata.
Table B-6. Sample sizes for stratified sampling of passengers by sector of flight for 95% confidence interval widths ±2%, ±3%, and ±4%.

95% C.I. Mean ± w,    95% C.I. ± w,         Sample
w                     w as % of mean        n
2.00%                 3.40%                 1,879
3.00%                 5.09%                 854
4.00%                 6.79%                 484
Table B-7. Sample sizes by sector for stratified sampling of passengers by sector of flight for 95% confidence interval widths ±2%, ±3%, and ±4%.

                        Sample Size for C.I. Width
Sector of Flight        ± 2%       ± 3%       ± 4%
Short-Haul Domestic     788        358        203
Long-Haul Domestic      635        288        163
International           456        207        118
Total                   1,879      853        484
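The sector-stratified calculation can be traced end to end with the sketch below, which evaluates the weighted variance sum, Equation 8, and the proportional allocation. Variable names are illustrative; the proportions and passenger totals are those of Table B-5.

```python
Z = 1.96

# (estimated proportion dropped at curb p_i, originating passengers N_i), Table B-5
sectors = [(0.40, 20_250), (0.60, 16_320), (0.90, 11_730)]

N_PAX = sum(n_i for _, n_i in sectors)
weights = [n_i / N_PAX for _, n_i in sectors]

# Sum over strata of W_i * sigma_i^2, with sigma_i^2 = p_i * (1 - p_i).
weighted_var = sum(w_i * p_i * (1 - p_i) for w_i, (p_i, _) in zip(weights, sectors))


def total_sample_size(w):
    """Equation 8 for a confidence interval of half-width w."""
    numerator = Z**2 * weighted_var
    return numerator / (w**2 + numerator / N_PAX)


totals = {w: round(total_sample_size(w)) for w in (0.02, 0.03, 0.04)}
# Proportional allocation n_i = n * W_i for the +/-2% case.
by_sector = [round(totals[0.02] * w_i) for w_i in weights]

print(round(weighted_var, 5), totals, by_sector)
```

Because each stratum size is rounded separately, column totals can differ by one from the unrounded total, as in the ±3% column of Table B-7 (853 versus 854 in Table B-6).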

Table B-8. Calculation of standard deviation of sample mean for stratified sampling of passengers by day of week.

             Originating Passengers                                           Est. Avg.
             Short-Haul   Long-Haul                   Total     % of Total    Proportion at    SD               $W_i \sigma_{X_i}^2$
Day          Domestic     Domestic    International   $N_i$     $W_i$         Curb, $p_i$      $\sigma_{X_i}$
Monday       3,000        2,400       1,360           6,760     0.1400        0.572            0.495            0.03427
Tuesday      3,000        2,400       1,360           6,760     0.1400        0.572            0.495            0.03427
Wednesday    3,000        2,400       1,360           6,760     0.1400        0.572            0.495            0.03427
Thursday     3,000        2,400       1,530           6,930     0.1435        0.580            0.494            0.03496
Friday       3,000        2,400       2,040           7,440     0.1540        0.602            0.490            0.03692
Saturday     2,500        1,920       2,040           6,460     0.1337        0.617            0.486            0.03160
Sunday       2,750        2,400       2,040           7,190     0.1489        0.609            0.488            0.03546
Total        20,250       16,320      11,730          48,300    1.0000        0.589                             0.24175
Stratified Sampling of Passengers--Stratified by Day of Week
Now consider the case where the passengers are stratified by the day of the week. This form of
stratification is easy to implement during the conduct of the survey, and numbers of passengers
are known, at least approximately, at the sample design stage.
Again consider the simple case where proportional stratified sampling is used. Thus the pro-
portion of the sample size on each day of the week is equal to the proportion of the total passen-
gers in each day of the week.
It was assumed that the proportion of people using the curb on each weekday was entirely
explained by the sector of their flight. Thus, the average percentage of passengers using the curb
was estimated for each day by the weighted average of the percentages for each flight sector, with
weights equal to the numbers of passengers on that day to each sector.
The sample size is determined using Equation 8. Table B-8 shows the calculation of the sum-
mation over the days of the week.
The standard deviation for each day is found using the relationship applicable for categorical variables: $\sigma_{X_i} = \sqrt{p_i (1 - p_i)}$, where $p_i$ is the proportion of the population in the category of interest (i.e., percentage dropped at curb) on day $i$.
Substituting 0.24175 from Table B-8 for $\sum_i (W_i \sigma_{X_i}^2)$ in Equation 8 for three C.I. widths of ±2%, ±3%, and ±4% gives the sample sizes, $n$, in Table B-9.
The sample sizes for each day of the week are then found, based on the proportion of passen-
gers on each day of the week, Wi, to be as shown in Table B-10.
Table B-9. Sample sizes for stratified sampling of passengers by day of week for 95% confidence interval widths ±2%, ±3%, and ±4%.

95% C.I. Mean ± w,    95% C.I. ± w,         Sample
w                     w as % of mean        n
2.000%                3.40%                 2,215
3.000%                5.09%                 1,010
4.000%                6.79%                 574

Table B-10. Sample sizes for each day for stratified sampling of passengers by day of week for 95% confidence interval widths ±2%, ±3%, and ±4%.

                Sample Size for C.I. Width
Day of Week     ± 2%       ± 3%       ± 4%
Monday          310        141        80
Tuesday         310        142        80
Wednesday       310        142        80
Thursday        318        145        82
Friday          341        156        89
Saturday        296        135        77
Sunday          330        150        86
Total           2,215      1,011      574
Comparing the total sample size with that found with random sampling, we find that stratifi-
cation by day of the week has reduced the required sample size for the ±2% case from 2,218 to
2,215--a reduction of only 0.1%. Thus, in this case stratification makes almost no difference to
the required sample size. This is due to the low variability in the average percentage of passen-
gers at the curb, pi, over the various days of the week. Note that the size of the reduction is very
dependent on the variation in the mean responses across the different strata. This varies
depending on the variable of interest and in some cases could vary greatly over the days of the
week making stratification by day of the week worthwhile.
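To see why day-of-week stratification helps so little here, the calculation can be repeated from the flight schedule. The sketch below assumes, as the text does, that the curb proportion on each day is fully explained by the sector mix; variable names are illustrative, and the inputs are the flights per day, passengers per flight, and sector proportions given in the tables above.

```python
Z = 1.96
PAX_PER_FLIGHT = (50, 120, 170)  # short-haul, long-haul, international
P_CURB = (0.40, 0.60, 0.90)      # proportion dropped at curb, by sector

# Flights per day by sector (short-haul, long-haul, international).
FLIGHTS = {
    "Monday": (60, 20, 8), "Tuesday": (60, 20, 8), "Wednesday": (60, 20, 8),
    "Thursday": (60, 20, 9), "Friday": (60, 20, 12),
    "Saturday": (50, 16, 12), "Sunday": (55, 20, 12),
}

# Originating passengers per day by sector.
pax_by_day = {
    day: [f * m for f, m in zip(flights, PAX_PER_FLIGHT)]
    for day, flights in FLIGHTS.items()
}
N_PAX = sum(sum(pax) for pax in pax_by_day.values())

weighted_var = 0.0
for day, pax in pax_by_day.items():
    n_day = sum(pax)
    # Daily curb proportion: passenger-weighted average of the sector proportions.
    p_day = sum(x * p for x, p in zip(pax, P_CURB)) / n_day
    weighted_var += (n_day / N_PAX) * p_day * (1 - p_day)

numerator = Z**2 * weighted_var
n_total = round(numerator / (0.02**2 + numerator / N_PAX))  # Equation 8, +/-2% case
print(N_PAX, round(weighted_var, 5), n_total)
```

The weighted within-day variance (about 0.2417) is almost identical to the overall variance of 0.492² ≈ 0.2421, so very little variation is removed by stratifying on day of week.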
Cluster Sampling of Flights--Additional Assumptions
A very common form of sampling for passenger surveys is to select a sample of flights to sur-
vey and to sample either all, or a portion, of passengers on those flights. This is a form of cluster
sampling where each flight represents a cluster.
Using the same example as above, the pertinent characteristics required to estimate the sam-
ple size are given in Table B-11.
Table B-11. Assumed characteristics for flights in three market sectors for illustrative examples
of cluster sampling.

                                                          For Flight Sector, i                      Total
Quantity                          Symbol                  Short-Haul   Long-Haul   International    Symbol           Value
                                                          Domestic     Domestic
No. of Departing Flights          $N_i$                   405          136         69               $N$              610
Avg. Originating Pass./Flight     $M_i / N_i$             50           120         170              $M / N$          79.2
Total Originating Passengers      $M_i$                   20,250       16,320      11,730           $M$              48,300
Proportion of Flights in Sector   $W_i = N_i / N$         0.6639       0.2230      0.1131                            1.0000
% Dropped at Curb                 $\bar{X}_i = p_i$       40%          60%         90%              $\bar{X} = p$    58.9%
Difference from Overall Avg.      $\bar{X}_i - \bar{X}$   -19%         1%          31%
SD in % Between Flights           $\sigma_{ci}$           10%          10%         10%              $\sigma_c$       21.1%

Cluster sampling is very dependent on one parameter not relevant to the passenger sampling
considered above: the variation in the mean value for each flight of the characteristic of interest
(i.e., percentage of passengers dropped at the curb) over the range of flights, $\sigma_{ci}$. An estimate of
this variation, expressed in terms of the standard deviation, is given in Table B-11. It is estimated
from previous surveys and knowledge of passengers at the airport that the standard deviation is
10% around the mean value for each sector (see footnote 33). Thus, for short-haul flights the mean
value of the percentage using the curb for each flight would be expected to be between 20.4% and
59.6% for 95% of flights [= 40% ± 1.96 × 10%]. The standard deviation over all flights includes both
the variation between flights within each sector and the variation between sectors and is given by:
$$\sigma_c = \sqrt{\sum_i W_i \left[ \sigma_{ci}^2 + \left( \bar{X}_i - \bar{X} \right)^2 \right]}$$

where $W_i$ is the proportion of flights in sector i (= $N_i / N$).
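As a check on the formula above, the following Python sketch recomputes the overall standard deviation from the Table B-11 values. The function and variable names are illustrative only and do not come from the guidebook. The sketch assumes the overall mean is weighted by passengers, which is the weighting that reproduces the 58.9% shown in the table.

```python
import math

# Values from Table B-11 (short-haul domestic, long-haul domestic, international).
flights = [405, 136, 69]           # N_i, departing flights per sector
pass_per_flight = [50, 120, 170]   # average originating passengers per flight
p_curb = [0.40, 0.60, 0.90]        # mean proportion dropped at curb, per sector
sd_between = [0.10, 0.10, 0.10]    # sigma_ci, SD between flights within a sector


def overall_flight_sd(flights, pass_per_flight, p_curb, sd_between):
    """Return (overall mean, overall SD between flights).

    The weights W_i = N_i / N are flight shares. The overall mean is weighted
    by passengers (an assumption that matches the 58.9% in Table B-11).
    """
    n_total = sum(flights)
    passengers = [n * m for n, m in zip(flights, pass_per_flight)]
    mean = sum(q * p for q, p in zip(passengers, p_curb)) / sum(passengers)
    variance = sum(
        (n / n_total) * (sd ** 2 + (p - mean) ** 2)
        for n, p, sd in zip(flights, p_curb, sd_between)
    )
    return mean, math.sqrt(variance)


mean, sd = overall_flight_sd(flights, pass_per_flight, p_curb, sd_between)
print(f"mean = {mean:.3f}, sigma_c = {sd:.3f}")  # mean = 0.589, sigma_c = 0.211
```

Most of the 21.1% comes from the differences between sector means rather than from the 10% variation within each sector, which is why the choice of sampling scheme for flights matters so much in this example.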
Cluster Sampling with Random Sampling of Flights
If a random sample of flights is selected and all passengers on each of the selected flights are sur-
veyed, the sample size is determined for a given width of confidence interval using Equation 3, where
Total population (flights)      $N = 610$
Mean proportion using curb      $\bar{X} = p = 0.589$
SD for individual flight        $\sigma_X = \sigma_c = 0.211$
The number of flights to be sampled, n, was found using Equation 3 for three widths of the
95% confidence interval--±2%, ±3%, and ±4%--as shown in Table B-12.
Since all passengers on each flight are sampled, the number of passengers sampled is
the number of flights sampled multiplied by the average number of passengers per flight
($\bar{M} = M / N$).
Comparing the total passenger sample size with that found with random sampling of pas-
sengers, we find that cluster sampling by flight has increased the required sample size
greatly--for ±2% accuracy from 2,218 to 19,953. This is due to the high variation in the char-
acteristics of interest between flights. In other cases, the additional sample size with cluster-
ing may be much less. For example, if the mean percentage was 50% for each sector (instead
of 40%, 60%, and 90%), the sample size for ±2% accuracy would be 83 flights or 6,572 pas-
sengers. The increase in the sample size is very dependent on the variation in the mean
responses for a flight across the different flights and the above example may not be typical
in general.
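The flight and passenger counts quoted above can be reproduced with the short Python sketch below. Equation 3 is not restated in this section, so the sketch assumes it has the usual finite-population form, n = n0 / (1 + n0/N) with n0 = (1.96 σ / w)², rounded up to a whole flight. The function name `flights_required` is illustrative.

```python
import math

Z_95 = 1.96  # Normal deviate for a 95% confidence interval


def flights_required(sd, half_width, population, z=Z_95):
    """Number of flights to sample for a C.I. of +/- half_width.

    Assumed form of Equation 3: n0 = (z * sd / w)^2, followed by the
    finite-population correction n = n0 / (1 + n0 / N), rounded up.
    """
    n0 = (z * sd / half_width) ** 2
    return math.ceil(n0 / (1 + n0 / population))


N_FLIGHTS = 610
AVG_PASS_PER_FLIGHT = 48300 / 610   # M / N from Table B-11

# Overall SD between flights from the formula for sigma_c (about 0.211).
flight_shares = [405 / 610, 136 / 610, 69 / 610]
p_sector = [0.40, 0.60, 0.90]
p_all = 0.589
sd_c = math.sqrt(sum(share * (0.10 ** 2 + (p - p_all) ** 2)
                     for share, p in zip(flight_shares, p_sector)))

for w in (0.02, 0.03, 0.04):
    n = flights_required(sd_c, w, N_FLIGHTS)
    print(f"+/-{w:.0%}: {n} flights, {round(n * AVG_PASS_PER_FLIGHT):,} passengers")

# If every sector had the same mean, only the 10% within-sector SD remains.
n_equal = flights_required(0.10, 0.02, N_FLIGHTS)
print(n_equal, round(n_equal * AVG_PASS_PER_FLIGHT))
```

Under these assumptions the loop gives 252, 146, and 92 flights, and the equal-means case gives 83 flights, or about 6,572 passengers.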
Table B-12. Sample sizes for cluster sampling with random sampling of flights for 95% confidence
interval widths 2%, 3%, and 4%.

95% C.I. (mean ± w), w    ± w as % of mean    Sample n (flights)    Sample Pass.
2.00%                     3.40%               252                   19,953
3.00%                     5.10%               146                   11,560
4.00%                     6.79%                92                    7,285
33 Note that if there was no difference between sectors, so that the mean percentage of passengers dropped off at
the curb was 58.9% for all flights, the standard deviation in the percentage between flights, $\sigma_{ci}$, would equal the
standard deviation of the mean for each flight. Then $\sigma_{ci}^2 = p_i (1 - p_i) / (M_i / N_i)$, where $M_i / N_i$ is the average
number of passengers on each flight in sector i. Thus the values of $\sigma_{ci}$ for the short-haul, long-haul, and
international sectors would be 7.0%, 4.5%, and 3.8%, respectively, and $\sigma_c$ would be 6.2%.

Table B-13. Calculation of standard deviation of sample mean for cluster sampling with flights
stratified by sector and all passengers on selected flights surveyed.

Sector of Flight       Departing Flights                Est. Avg. %       SD,              $W_i \sigma_{ci}^2$
                       Total, $N_i$   $W_i = N_i / N$   at curb, $p_i$    $\sigma_{ci}$
Short-Haul Domestic    405            0.6639            0.40              0.10             0.00664
Long-Haul Domestic     136            0.2230            0.60              0.10             0.00223
International           69            0.1131            0.90              0.10             0.00113
Total                  610            1.0000            0.589                              0.01000
Cluster Sampling with Stratified Sampling of Flights and All Passengers
on Selected Flights Surveyed
Now consider the case where flights to be surveyed are determined using stratified sampling
and all passengers on each of the selected flights are surveyed. The flight sample size is deter-
mined for a given width of confidence interval using Equation 8, where the units sampled in each
stratum are clusters rather than individuals. Since flights are being sampled, rather than
passengers, the standard deviation $\sigma_{X_i}$ in Equation 8 is the standard deviation of the average
percentage of passengers using the curb for each flight, $\sigma_{ci}$, as shown in Table B-13.
Substituting 0.01000 from Table B-13 for $\sum_i (W_i \sigma_{ci}^2)$ in Equation 8, the number of flights to
be sampled, n, was found for three widths of the 95% confidence intervals--±2%, ±3%, and
±4%--as shown in Table B-14.
The numbers of flights in each sector and estimated number of passengers (based on average
numbers of passengers per flight in that sector) are as shown in Table B-15.
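The calculation described above can be sketched in Python as follows. Equation 8 is not restated in this section, so the sketch assumes that, with proportional allocation, it reduces to n0 = 1.96² Σ(W_i σ_ci²) / w² followed by the finite-population correction n = n0 / (1 + n0/N), rounded up. Sector counts are rounded to the nearest whole flight, which is also an assumption. The function names are illustrative.

```python
import math

SECTORS = ["Short-Haul Domestic", "Long-Haul Domestic", "International"]
FLIGHTS = [405, 136, 69]          # N_i from Table B-13
PASS_PER_FLIGHT = [50, 120, 170]  # average passengers per flight (Table B-11)
SD_BETWEEN = [0.10, 0.10, 0.10]   # sigma_ci


def stratified_flights(half_width, z=1.96):
    """Total flights for a stratified sample with proportional allocation.

    Assumed form of Equation 8: n0 = z^2 * sum(W_i * sigma_ci^2) / w^2,
    then n = n0 / (1 + n0 / N), rounded up to a whole flight.
    """
    total = sum(FLIGHTS)
    weighted_var = sum(n / total * sd ** 2 for n, sd in zip(FLIGHTS, SD_BETWEEN))
    n0 = z ** 2 * weighted_var / half_width ** 2
    return math.ceil(n0 / (1 + n0 / total))


def allocate(n_flights):
    """Split n_flights across sectors in proportion to N_i, rounding each
    sector to the nearest whole flight (so the total can shift by one)."""
    total = sum(FLIGHTS)
    return [round(n_flights * n / total) for n in FLIGHTS]


for w in (0.02, 0.03, 0.04):
    n = stratified_flights(w)
    by_sector = allocate(n)
    passengers = [k * m for k, m in zip(by_sector, PASS_PER_FLIGHT)]
    print(f"+/-{w:.0%}: n = {n}, flights by sector = {by_sector}, "
          f"passengers = {sum(passengers):,}")
```

Because each sector is rounded independently, the sector counts for the ±3% case sum to 41 rather than 40, which matches the note under Table B-15.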
Table B-14. Sample number of flights for cluster sampling with flights stratified by sector and all
passengers on selected flights surveyed for 95% confidence interval widths 2%, 3%, and 4%.

95% C.I. (mean ± w), w    ± w as % of mean    Sample n (flights)
2.00%                     3.40%               83
3.00%                     5.09%               40
4.00%                     6.79%               24
Table B-15. Sample sizes by sector for cluster sampling with flights stratified by sector and all
passengers on selected flights surveyed for 95% confidence interval widths 2%, 3%, and 4%.

                       C.I. Width ± 2%     C.I. Width ± 3%     C.I. Width ± 4%
Sector of Flight       Flights   Pass.     Flights   Pass.     Flights   Pass.
Short-Haul Domestic    55        2,750     27        1,350     16        800
Long-Haul Domestic     19        2,280      9        1,080      5        600
International           9        1,530      5          850      3        510
Total*                 83        6,560     41        3,280     24        1,910
* Total may be higher than previous table as number of flights must be an integer

The stratification of flights by sector results in a large reduction in the numbers of flights and
passengers to be surveyed. In this example, much of the variation in the variable of interest is
explained by the flight sector, which results in a large reduction in sample size compared to ran-
dom sampling of flights. By sampling the flights by sector, the likelihood of selecting a sample
with close to the actual proportions of passengers in each sector is much greater than when ran-
domly sampling flights. Again note that the results here reflect the assumptions regarding vari-
ation considered in this example and will vary in other situations.
Cluster Sampling with Stratified Sampling of Flights
and a Sample of Passengers on Selected Flights
Now consider the case where flights to be surveyed are determined using stratified sampling
and a sample of passengers on each of the selected flights is surveyed. Assume initially that 50%
of passengers on the selected flights are surveyed. The variance of the estimate is greater than
with 100% sampling of each flight as it includes both the variation between flights (as before)
and the variation due to sampling of passengers on individual flights. It is calculated from Equa-
tions 17 and 18 as follows:
$$\sigma_{\bar{X}}^2 = \frac{(1 - n/N)}{n} \sum_{i=1}^{n_s} \frac{N_i}{N}\,\sigma_{ci}^2 + \sum_{i=1}^{n_s} \frac{1}{N_i}\,(1 - f)\,\frac{p_i (1 - p_i)}{m_i - 1}$$

where $\sigma_{ci}$ is the standard deviation of the mean percentage using the curb across flights in
sector i
$n_s$ is the number of sectors
$N_i$ is the number of flights in sector i (N is total over all sectors)
$n_i$ is the number of flights sampled in sector i (n is total over all sectors)
$M_i$ is the average number of passengers on a flight in sector i
$m_i$ is the average number of passengers sampled on a flight in sector i ( = $f M_i$ )
$p_i$ is the probability of a passenger on a flight in sector i being dropped off at the curb
f is the proportion of passengers sampled on a flight ( = $m_i / M_i$, assumed the same for
all flights).
The flight sample size is determined for a given confidence interval $\bar{X} \pm w$ by solving the
following relationship for n:

$$w^2 = 1.96^2\,\sigma_{\bar{X}}^2$$

where $\sigma_{\bar{X}}^2$ is given by the equation above.
The summations over the sectors for calculating $\sigma_{\bar{X}}^2$ are determined for a given n value as
shown in Table B-16 (n = 117 used in table).
The number of flights to be sampled, n, was found by setting an approximate value initially
and determining the width, w, then adjusting the value of n until the appropriate value of w was
obtained. In the table, $\sigma_{\bar{X}}^2$ is evaluated for n = 117 and the resulting value of w is 0.0200 or 2.00%.
Sample sizes for three widths of the 95% confidence intervals--±2%, ±3%, and ±4%--were
found to be as shown in Table B-17.
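The following Python sketch evaluates the variance expression for a trial number of flights, which is the step that is repeated while adjusting n. It uses the sector values from Table B-16 and follows the variance formula as written, with 1/N_i in the within-flight term. The function names are illustrative and not taken from the guidebook.

```python
import math

FLIGHTS = [405, 136, 69]          # N_i
PASS_PER_FLIGHT = [50, 120, 170]  # M_i, average passengers per flight
P_CURB = [0.40, 0.60, 0.90]       # p_i
SD_BETWEEN = [0.10, 0.10, 0.10]   # sigma_ci


def variance_terms(n, f):
    """Return per-sector (between, within) variance contributions for a
    sample of n flights with a fraction f of passengers surveyed on each."""
    total = sum(FLIGHTS)
    factor = (1 - n / total) / n
    rows = []
    for n_i, m_cap, p, sd in zip(FLIGHTS, PASS_PER_FLIGHT, P_CURB, SD_BETWEEN):
        m = f * m_cap                                   # m_i = f * M_i
        between = factor * (n_i / total) * sd ** 2
        within = (1 / n_i) * (1 - f) * p * (1 - p) / (m - 1)
        rows.append((between, within))
    return rows


def half_width(n, f, z=1.96):
    """95% C.I. half-width w = z * sigma for the estimate of the mean."""
    variance = sum(b + wi for b, wi in variance_terms(n, f))
    return z * math.sqrt(variance)


# Per-sector columns for n = 117 and a 50% passenger sample.
for between, within in variance_terms(117, 0.5):
    print(f"{between:.7f}  {within:.7f}  {between + within:.7f}")

# Half-widths for three trial values of n.
for n in (117, 47, 26):
    print(n, f"{half_width(n, 0.5):.4f}")
```

For n = 117 the sketch gives w of about 0.0200, and one flight fewer than 47 or 26 pushes w just above 3% or 4% respectively. The trial-and-adjust procedure is therefore easy to automate by looping over n.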
Table B-16. Calculation of standard deviation of sample mean for cluster sampling with
flights stratified by sector and a 50% sample of passengers on selected flights
($\sigma_{\bar{X}}^2$ calculated for n = 117).

Column key:
  $N_i$            departing flights, total
  $W_i$            = $N_i / N$
  $p_i$            estimated average % at curb
  $M_i$            passengers on each flight
  f, $m_i$         percentage and number of passengers sampled on each flight
  $\sigma_{ci}$    SD between flights
  Between          between-cluster term, $[(1 - n/N) / n]\,(N_i / N)\,\sigma_{ci}^2$
  Within           within-cluster term, $(1 / N_i)(1 - f)\,p_i (1 - p_i) / (m_i - 1)$
  Total            between + within

Sector of Flight    $N_i$   $W_i$     $p_i$   $M_i$   f     $m_i$   $\sigma_{ci}$   Between      Within       Total
Short-Haul Dom.     405     0.6639    0.40     50     50%   25      0.100           0.0000459    0.0000123    0.0000582
Long-Haul Dom.      136     0.2230    0.60    120     50%   60      0.100           0.0000154    0.0000150    0.0000304
International        69     0.1131    0.90    170     50%   85      0.100           0.0000078    0.0000078    0.0000156
Total               610     1.0000    0.59                                          $\sigma_{\bar{X}}^2$ =                0.0001041
                                                                                    $w = 1.96\,\sigma_{\bar{X}}$ =        0.0200

Table B-17. Sample number of flights for cluster sampling with flights stratified by sector and a
50% sample of passengers on selected flights surveyed for 95% confidence interval widths 2%,
3%, and 4%.

95% C.I. (mean ± w), w    ± w as % of mean    Sample n (flights)
2.00%                     3.40%               117
3.00%                     5.09%                47
4.00%                     6.79%                26
The numbers of flights in each sector and estimated number of passengers (based on average
numbers of passengers per flight in that sector) are as shown in Table B-18.
The surveying of only a 50% sample of passengers on each flight resulted in an increase in the
number of flights to be surveyed from 83 to 117 for the ±2% accuracy case. However, since only
50% of passengers on these flights are to be surveyed, the total number of passengers decreased
from 6,560 to 4,615. For surveys conducted in the departure lounge, it is almost impossible to sur-
vey all passengers on a flight due to reasons given in Chapter 5 of the guidebook. In practice it may
be possible to obtain complete responses from 50% of passengers, in which case the number of
flights to be surveyed based on the 50% passenger sample should be used.
It is evident that the total number of passengers that need to be surveyed can be reduced by
reducing the percentage of passengers sampled on each flight, but the number of flights surveyed
increases. Several other cases were examined using this example:
· If 75% of passengers on each of the selected flights were to be surveyed, a sample of 92 flights
and 5,580 passengers would be required.
· If 30% of passengers on each of the selected flights were to be surveyed, a sample of 268 flights
and 6,360 passengers would be required.
The optimal balance for a particular survey will depend on the variation in responses between
and within flights, and on the relative costs of surveying passengers and flights, which vary from
survey to survey.
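The dependence of the number of flights on the passenger sampling fraction can be explored with the sketch below. It rearranges the variance formula to solve for n directly instead of adjusting n by trial. The rearrangement and the function name are illustrative assumptions, and n is left unrounded because the rounding rule used for the tables is not stated.

```python
FLIGHTS = [405, 136, 69]          # N_i
PASS_PER_FLIGHT = [50, 120, 170]  # M_i, average passengers per flight
P_CURB = [0.40, 0.60, 0.90]       # p_i
SD_BETWEEN = [0.10, 0.10, 0.10]   # sigma_ci


def flights_needed(half_width, f, z=1.96):
    """Unrounded number of flights n giving a C.I. of +/- half_width when a
    fraction f of passengers is surveyed on each selected flight.

    Rearranges (w / z)^2 = (1/n - 1/N) * B + W(f), where B is the
    between-flight sum and W(f) the within-flight sum of the variance formula.
    """
    total = sum(FLIGHTS)
    between = sum(n_i / total * sd ** 2 for n_i, sd in zip(FLIGHTS, SD_BETWEEN))
    within = sum((1 / n_i) * (1 - f) * p * (1 - p) / (f * m - 1)
                 for n_i, m, p in zip(FLIGHTS, PASS_PER_FLIGHT, P_CURB))
    target = (half_width / z) ** 2
    if target < within:
        # Even surveying every flight would leave too much within-flight variance.
        raise ValueError("target width cannot be met at this sampling fraction")
    return 1 / ((target - within) / between + 1 / total)


for f in (0.30, 0.50, 0.75):
    print(f"f = {f:.0%}: n = {flights_needed(0.02, f):.1f} flights")
```

For ±2% this gives roughly 267.9, 117.0, and 91.8 flights at sampling fractions of 30%, 50%, and 75%, consistent with the 268, 117, and 92 flights quoted above.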
Another important consideration with interview surveys in gate lounges, discussed in Chapter 5
of the guidebook, is the limitation on the number of interviews that each interviewer can complete
in the time window between when passengers start to arrive in the gate lounge and the start of flight
boarding. As a practical matter, this limits the number of passengers who can be surveyed on a given
flight.
Again note that the results here reflect the assumptions regarding variation considered in
this example and will vary in other situations.
Table B-18. Sample sizes by sector for cluster sampling with flights stratified by sector and a 50%
sample of passengers on selected flights surveyed for 95% confidence interval widths 2%, 3%,
and 4%.

                       C.I. Width ± 2%     C.I. Width ± 3%     C.I. Width ± 4%
Sector of Flight       Flights   Pass.     Flights   Pass.     Flights   Pass.
Short-Haul Domestic     78       1,950     31          775     17          425
Long-Haul Domestic      26       1,560     11          660      6          360
International           13       1,105      5          425      3          255
Total                  117       4,615     47        1,860     26        1,040
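The sector breakdown in Table B-18 can be approximated with the sketch below. The guidebook does not state how fractional flights were rounded. The sketch assumes a largest-remainder allocation, which keeps the sector counts summing to the total number of flights. The function names are illustrative.

```python
import math

FLIGHTS = [405, 136, 69]          # N_i
PASS_PER_FLIGHT = [50, 120, 170]  # average passengers per flight


def allocate_flights(n_flights):
    """Largest-remainder allocation of n_flights in proportion to N_i
    (an assumed rule; it keeps the sector counts summing to n_flights)."""
    total = sum(FLIGHTS)
    quotas = [n_flights * n_i / total for n_i in FLIGHTS]
    counts = [math.floor(q) for q in quotas]
    leftover = n_flights - sum(counts)
    # Give the remaining flights to the sectors with the largest fractions.
    order = sorted(range(len(quotas)), key=lambda i: quotas[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def passengers_surveyed(counts, f):
    """Passengers surveyed per sector when a fraction f is sampled per flight."""
    return [round(k * f * m) for k, m in zip(counts, PASS_PER_FLIGHT)]


for n in (117, 47, 26):
    counts = allocate_flights(n)
    surveyed = passengers_surveyed(counts, 0.5)
    print(n, counts, surveyed, sum(surveyed))
```

Under this assumed rule the three cases give 78/26/13, 31/11/5, and 17/6/3 flights by sector, with 4,615, 1,860, and 1,040 passengers in total.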