Guidebook for Conducting Airport User Surveys

APPENDIX B

Sample Sizes, Sample Estimates, and Confidence Intervals

This appendix outlines how to determine the sample size so that the confidence interval of a sample estimate will be within a specified range and, when analyzing the results, how to estimate values and confidence intervals for characteristics of the population. Methods for determining sample sizes and confidence intervals are provided for the four types of probability sampling: random, sequential, stratified, and cluster sampling. The description is only a brief summary of the methods as applicable to airport surveys, and the reader is encouraged to refer to the statistical texts listed in the bibliography for a more complete description. Examples are then provided of applying these methods to determine sample sizes for small populations, such as employee or tenant surveys, and for passenger surveys using each of the possible sampling strategies.

Throughout this section, it is assumed that the sample size will be large enough that the sample means will be approximately Normally distributed, per the Central Limit Theorem. Generally, a sample size of at least 30 is considered to provide a reasonably good approximation to the Normal distribution. Most airport user surveys will have a much larger sample size than this.

Random Sampling

Suppose we are interested in a characteristic of the population, which we denote by $X$. For example, $X$ could be the variable gender, where:

$$X = \begin{cases} 0 & \text{if passenger is male} \\ 1 & \text{if passenger is female} \end{cases}$$

Let the mean value of this characteristic be $\mu$. In the example, $\mu$ is the proportion of the passengers that are female.

Using random sampling, the sample mean is an unbiased estimator for the population mean, $\mu$. If the sample size is $n$, the sample mean is:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{1}$$

where
  $\bar{X}$ is the sample mean
  $x_i$ is the value of $X$ for the $i$th individual in the sample, $i = 1, \ldots, n$.

The 95% confidence interval for $\mu$ (the range that contains the true value of $\mu$ with a probability of 0.95) is given by:

$$\bar{X} \pm 1.96\,\sigma_{\bar{X}}$$
where $\sigma_{\bar{X}}$ is the standard deviation of the sample mean. For random sampling without replacement, it can be shown that:

$$\sigma_{\bar{X}} = \sigma_X \sqrt{\frac{1 - n/N}{n}}$$

where
  $\sigma_X$ is the standard deviation of the variable $X$
  $N$ is the number of individuals in the population.

Hence the 95% confidence interval for $\mu$ is given by:

$$\bar{X} \pm 1.96\,\sigma_X \sqrt{\frac{1 - n/N}{n}} \tag{2}$$

For very large populations, $(1 - n/N)$ is approximately one, and this reduces to:

$$\bar{X} \pm 1.96\,\frac{\sigma_X}{\sqrt{n}}$$

The constant 1.96 is applicable for a 95% confidence interval. For a 99% confidence interval, the constant should be 2.58, and for a 90% confidence interval, use 1.65.

If the sample size is to be chosen so that the sample mean is within $w$ of the population mean, $\mu$, with a probability of 0.95, then the 95% confidence interval is given by:

$$\bar{X} \pm w \quad \text{(note that the width of the confidence interval is } 2w\text{)}$$

Thus,

$$w = 1.96\,\sigma_X \sqrt{\frac{1 - n/N}{n}}$$

This can be solved to give the sample size, $n$:

$$n = \frac{1.96^2\,\sigma_X^2}{w^2 + 1.96^2\,\sigma_X^2 / N} \tag{3}$$

In the survey planning stage, when trying to determine the sample size, the standard deviation of the variable $X$, $\sigma_X$, will be unknown and must be estimated based on data from previous surveys, other similar airports, or knowledge of the airport.

For categorical variables, the standard deviation is related to the expected proportion of individuals in a category:

$$\sigma_X = \sqrt{p(1-p)}, \qquad w = 1.96\,\sqrt{\frac{p(1-p)\,(1 - n/N)}{n}}$$

where $p$ is the proportion of the population in the category of interest.

For the example above, where the categorical variable is the respondent's gender, assume that we are interested in estimating the proportion of female passengers. Thus, $p$ is the proportion of female passengers. If we initially estimate this proportion to be 0.5, then for a sample size of $n = 400$ and a population size, $N$, of 50,000, the half-width of the 95% confidence interval, $w$, is 0.049 (that is, the interval is ±0.049). The sample size can be chosen to obtain the required accuracy using Equation 3.

If $X$ represents the mode of transport to the airport, and the category taxi or limousine has an estimated proportion of 0.20, then the half-width of the 95% confidence interval, $w$, is 0.039 for a sample size of $n = 400$ and a population size of $N = 50{,}000$.

For a non-categorical variable, such as time in the terminal before departure or expenditure at the concessions, the standard deviation is determined from the variation of the variable. Again, this
will be unknown at the planning stage of the survey. If an estimate of the standard deviation, $\sigma_X$, can be obtained from past surveys, other similar airports, or knowledge of the airport, the required sample size for a given confidence interval width can be determined in a similar way using Equation 2.

Once the survey has been completed, the standard deviation of $X$, $\sigma_X$, can be estimated from the sample values of $X$ for use in determining the accuracy of the estimated population mean and confidence intervals. The sample standard deviation is the square root of the sample variance, $s_X^2$, which is given by:

$$s_X^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$$

where $\bar{X}$ is the sample average.

Sequential Sampling

Sequential sampling is equivalent to random sampling if the order of the sample with respect to the characteristics of interest is essentially random. In this case, sample sizes and estimates of standard deviation and confidence intervals are determined in the same way as outlined above for random sampling.

Sequential sampling can result in a more representative sample, and thus narrower confidence intervals, by ensuring a more even spread of sampled individuals over the population (provided the cases where serious biases can occur are avoided [31]). Calculation of the confidence intervals and required sample sizes is difficult and depends on the relationships between the ordering variable and the characteristics of the population of interest. In airport surveys, it can usually be assumed that the confidence interval is no wider than that which would occur with random sampling, and the expressions given for random sampling above can be used.

[31] Serious biases can occur if a characteristic of interest occurs in a cyclic order in the population list and the length of each cycle corresponds to the sampling fraction, but this would be rare in airport surveys.

Stratified Sampling

With stratified sampling, the population is divided into groups, referred to as strata, and each stratum is sampled separately using random or sequential sampling. Assume there are $n_s$ strata and the proportion of the population in the $i$th stratum is $W_i$. The sample mean for the $i$th stratum is:

$$\bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij} \tag{4}$$

and the sample mean for the population is estimated by:

$$\bar{X} = \sum_{i=1}^{n_s} W_i \bar{X}_i \tag{5}$$

where
  $\bar{X}_i$ is the sample mean of stratum $i$
  $x_{ij}$ is the value of characteristic $X$ for individual $j$ in stratum $i$
  $n_i$ is the number of individuals sampled in stratum $i$
  $W_i$ is the proportion of the population in stratum $i$, $N_i/N$.

The standard deviation of the sample mean, $\bar{X}$, is given by:

$$\sigma_{\bar{X}} = \sqrt{\sum_i W_i^2\,\sigma_{\bar{X}_i}^2}$$
where $\sigma_{\bar{X}_i}$ is the standard deviation of the mean for members of stratum $i$, $\bar{X}_i$, and is given by:

$$\sigma_{\bar{X}_i} = \sqrt{\sigma_{X_i}^2\,\frac{1 - n_i/N_i}{n_i}}$$

where
  $\sigma_{X_i}$ is the standard deviation of $X$ for members of stratum $i$
  $n_i$ and $N_i$ are the sample and population sizes for stratum $i$, respectively.

Hence the standard deviation of the sample mean can be expressed as:

$$\sigma_{\bar{X}} = \sqrt{\sum_i W_i^2\,\sigma_{X_i}^2\,\frac{1 - n_i/N_i}{n_i}}$$

The accuracy of the sample estimate is improved if the variation within each stratum (measured by $\sigma_{\bar{X}_i}$) is less than the variation over the whole population.

As above, the half-width, $w$, of the 95% confidence interval (total width $2w$) within which the sample mean lies relative to the population mean, $\mu$, is given by:

$$w = 1.96\,\sigma_{\bar{X}} \tag{6}$$

With proportional stratified sampling, the sampling fraction is the same for each stratum, so the proportion of the sample drawn from each stratum equals the proportion of the population in that stratum, $W_i$. Thus,

$$\frac{n_i}{N_i} = \frac{n}{N} \quad \text{and} \quad W_i = \frac{n_i}{n} = \frac{N_i}{N}$$

where $n$ and $N$ are the total sample and population sizes, respectively.

In this case, the sample mean given by Equation 5 reduces to that for a random sample, Equation 1, and

$$\sigma_{\bar{X}} = \sqrt{\frac{1 - n/N}{n}\,\sum_i \left(W_i\,\sigma_{X_i}^2\right)} \tag{7}$$

To determine the required sample size, estimates of $\sigma_{X_i}$ are needed in the survey planning stage. These must be estimated from results of previous surveys, other similar airports, or knowledge of the airport. Once approximate estimates of $\sigma_{X_i}$ have been obtained, the total sample size, $n$, for a confidence interval of width $2w$ is then found using the relationship (from Equations 6 and 7):

$$w = 1.96\,\sqrt{\frac{1 - n/N}{n}\,\sum_i \left(W_i\,\sigma_{X_i}^2\right)}$$

which can be solved to give the sample size, $n$, as:

$$n = \frac{1.96^2 \sum_i \left(W_i\,\sigma_{X_i}^2\right)}{w^2 + 1.96^2 \sum_i \left(W_i\,\sigma_{X_i}^2\right) / N} \tag{8}$$

If separate estimates for each stratum cannot be obtained in the planning stage, an approximate estimate of the standard deviation of $X$ over the total population could be used to provide a conservative estimate of the sample size, $n$.

For proportional stratified samples, the sample size for each stratum, $n_i$, is then found by:

$$n_i = n\,W_i$$
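The relationships in Equations 2, 3, and 8 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `half_width` and `sample_size` are assumptions introduced here, and the two-stratum population at the end uses hypothetical values that do not come from the guidebook. The first part reuses the worked example given for random sampling (p = 0.5, n = 400, N = 50,000).

```python
import math

Z95 = 1.96  # constant for a 95% confidence interval


def half_width(sigma, n, N, z=Z95):
    """Half-width w of the interval in Equation 2 (sampling without replacement)."""
    return z * sigma * math.sqrt((1.0 - n / N) / n)


def sample_size(variance, w, N, z=Z95):
    """Sample size n from Equation 3 (variance = sigma_X^2) or
    Equation 8 (variance = sum of W_i * sigma_Xi^2)."""
    return z**2 * variance / (w**2 + z**2 * variance / N)


# Worked example from the text: p = 0.5, n = 400, N = 50,000.
p = 0.5
w_example = half_width(math.sqrt(p * (1 - p)), 400, 50_000)
print(f"half-width: {w_example:.3f}")

# Round trip: asking for that half-width recovers the original sample size.
n_back = sample_size(p * (1 - p), w_example, 50_000)
print(f"sample size: {n_back:.0f}")

# Hypothetical two-stratum population (illustrative values, not from the guidebook).
N_i = [6_000, 4_000]          # stratum population sizes
sigma_i = [0.40, 0.50]        # planning estimates of sigma_Xi
N = sum(N_i)
W_i = [Ni / N for Ni in N_i]  # population proportions W_i
pooled_var = sum(W * s**2 for W, s in zip(W_i, sigma_i))
n_total = sample_size(pooled_var, 0.03, N)
n_i = [n_total * W for W in W_i]  # proportional allocation, n_i = n * W_i
print([round(x) for x in n_i])
```

Because Equations 3 and 8 have the same form, a single function serves both; only the variance term passed in differs.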
For determining confidence intervals of the final estimates using the survey data, the standard deviation of $X$ in each stratum, $\sigma_{X_i}$, can be estimated by the sample standard deviation, $s_{X_i}$, determined from the sample values in each stratum. The 95% confidence interval is then:

$$\bar{X} \pm 1.96\,\sqrt{\frac{1 - n/N}{n}\,\sum_i \left(W_i\,s_{X_i}^2\right)} \tag{9}$$

With non-proportional stratified sampling, the sampling fractions differ for each stratum. The sample mean for each stratum and the overall sample mean can be found using Equations 4 and 5, respectively. The sample sizes, $n_i$, are chosen so that a confidence interval of width $2w$ is given by the relationship:

$$w = 1.96\,\sqrt{\sum_i W_i^2\,\sigma_{X_i}^2\,\frac{1 - n_i/N_i}{n_i}} \tag{10}$$

In this case, many different combinations of $n_i$ could be chosen to produce a confidence interval width of $2w$. The choice of $n_i$ is dependent on the reason for choosing to use non-proportional stratified sampling.

To achieve a similar level of accuracy in the results for each stratum, it will be necessary to use non-proportional stratified sampling, with the sample size in each stratum inversely proportional to the variance of the characteristic within that stratum. Thus, $n_i = k / \sigma_{X_i}^2$, where $k$ is a constant. Substituting this into Equation 10 and solving for $k$ gives:

$$k = \frac{\sum_i \left(W_i^2\,\sigma_{X_i}^4\right)}{\dfrac{w^2}{1.96^2} + \sum_i \left(\dfrac{W_i^2\,\sigma_{X_i}^2}{N_i}\right)}$$

The sample size for each stratum is then found using the relationship:

$$n_i = k / \sigma_{X_i}^2$$

Because the actual variance in the characteristic in the survey responses for each stratum, $\sigma_{X_i}^2$, will not be known until the survey has been performed, it will be necessary to make an initial assumption of the differences in the variance across the strata in order to determine the proportion of the survey responses to assign to each stratum. These assumptions can be based on the results of prior surveys or of surveys performed at similar airports.

As before, confidence intervals for the final estimates can be determined using the survey data. The standard deviation of $X$ in each stratum, $\sigma_{X_i}$, can be estimated by the sample standard deviation for each stratum, $s_{X_i}$, determined from the sample values. The 95% confidence interval is then:

$$\bar{X} \pm 1.96\,\sqrt{\sum_i W_i^2\,s_{X_i}^2\,\frac{1 - n_i/N_i}{n_i}} \tag{11}$$

Cluster Sampling

As described in Section 3.2 of the guidebook, with cluster sampling the population is divided into clusters (or groups), and clusters rather than individual members of the population are sampled. In single-stage cluster sampling, all individuals in the cluster are sampled, so a complete picture of the population within the sampled clusters is obtained. For example, in a survey of departing passengers, flights could be used as clusters; a sample of flights would then be chosen and all passengers on those flights would be surveyed. If two-stage cluster sampling is used, individuals within each cluster are also sampled.
Let
  $N$ be the number of clusters in the population
  $n$ be the number of clusters sampled
  $M_k$ be the number of individuals in the $k$th cluster in the population
  $m_k$ be the number of individuals in the $k$th cluster included in the sample
  $x_{kj}$ be the value of $X$ for the $j$th individual in the $k$th cluster.

The average number of individuals per cluster is the total population size divided by the number of clusters:

$$\bar{M} = \frac{1}{N}\sum_{k=1}^{N} M_k \tag{12}$$

The sample mean for the $k$th cluster is:

$$\bar{X}_k = \frac{1}{m_k}\sum_{j=1}^{m_k} x_{kj} \tag{13}$$

The population mean is estimated by:

$$\bar{X} = \frac{1}{n}\sum_{k=1}^{n} \frac{M_k}{\bar{M}}\,\bar{X}_k \tag{14}$$

When the numbers of individuals in each cluster are equal, $M_k = \bar{M}$, and the mean for the population reduces to simply the average of the cluster means, $\bar{X}_k$.

Where the clusters to be sampled are drawn randomly from the population of all clusters, and a sample of individuals is drawn randomly from each cluster, the variance of the sample mean, $\bar{X}$, includes two components of variation, the between and within cluster components, and for a categorical variable is estimated by:

$$\sigma_{\bar{X}}^2 = \underbrace{\frac{1 - n/N}{n}\,\sigma_c^2}_{\text{Between cluster component}} + \underbrace{\frac{n}{N}\sum_{k=1}^{n}\left[\frac{(1 - m_k/M_k)\,p_k(1 - p_k)}{n^2\,(m_k - 1)}\right]}_{\text{Within cluster component}} \tag{15}$$

where
  $p$ is the proportion of the total population in the category of interest
  $p_k$ is the proportion of individuals in cluster $k$ in the category of interest
  $\sigma_c^2$ is the variance of the cluster means around the population mean and can be estimated by:

$$\sigma_c^2 = \frac{\sum_{k=1}^{n} (\bar{X}_k - \bar{X})^2}{n - 1}$$

If all individuals in each selected cluster are sampled, $m_k = M_k$, and the variance is given by the "between cluster component" term only.

Clusters could be selected using stratified sampling to reduce the variance between clusters within a given stratum and thus improve the accuracy of the estimate. For example, where the clusters are flights, flights could be stratified into groups such as domestic and international and short- and long-haul. Assume that the clusters are stratified into $n_s$ strata and in each stratum, $n_i$ clusters are sampled. The population mean is estimated by:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n_s}\sum_{k=1}^{n_i} \frac{M_{ik}}{\bar{M}}\,\bar{X}_{ik} \tag{16}$$

where
  $n_i$ is the number of clusters sampled in the $i$th stratum
  $M_{ik}$ is the number of individuals in the $k$th cluster in the $i$th stratum
  $\bar{X}_{ik}$ is the mean of individuals in the $k$th cluster in the $i$th stratum
  $n$ is the total number of clusters sampled over all strata.
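As an illustration of Equations 14 and 15 for the single-stage case (all individuals in each sampled cluster are surveyed, so only the between cluster component remains), the following Python sketch may help. The function name `cluster_estimate` and the four sampled flights are hypothetical and are introduced here for illustration only.

```python
import math


def cluster_estimate(M_k, xbar_k, n_clusters_pop, M_bar, z=1.96):
    """Single-stage cluster estimate of the population mean.

    M_k            sizes of the sampled clusters
    xbar_k         mean of X in each sampled cluster
    n_clusters_pop N, the number of clusters in the population
    M_bar          average cluster size in the population (Equation 12)
    """
    n = len(M_k)
    # Equation 14: cluster means weighted by relative cluster size.
    xbar = sum(M / M_bar * x for M, x in zip(M_k, xbar_k)) / n
    # Estimated variance of the cluster means around the estimated mean.
    sigma_c2 = sum((x - xbar) ** 2 for x in xbar_k) / (n - 1)
    # Between cluster component of Equation 15 (within component is zero
    # because every individual in each sampled cluster is surveyed).
    std_err = math.sqrt((1 - n / n_clusters_pop) / n * sigma_c2)
    return xbar, std_err, z * std_err


# Hypothetical data: 4 flights sampled out of 40, 100 passengers each.
xbar, std_err, w = cluster_estimate(
    M_k=[100, 100, 100, 100],
    xbar_k=[0.4, 0.5, 0.6, 0.5],
    n_clusters_pop=40,
    M_bar=100,
)
print(f"estimate = {xbar:.3f}, standard error = {std_err:.4f}, 95% C.I. = +/- {w:.3f}")
```

With equal cluster sizes the estimate is simply the average of the cluster means, as noted for Equation 14.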
With proportional stratified sampling of clusters, the between cluster variance component of Equation 15 becomes:

$$\left[\mathrm{Var}_c(\bar{X})\right]_{\text{between}} = \frac{1 - n/N}{n}\,\sum_{i=1}^{n_s} W_i\,\sigma_{ci}^2 \tag{17}$$

where
  $n$ is the number of clusters sampled (over all strata)
  $N$ is the number of clusters in the population (over all strata)
  $\sigma_{ci}^2$ is the variance of cluster means for clusters in stratum $i$
  $W_i$ is the proportion of clusters sampled that are in stratum $i$ ($= n_i/n = N_i/N$ for proportional stratified sampling).

Assuming that the clusters within each stratum have the same size, $M_i$, the same sample size, $m_i$, and the same proportion of individuals in the category of interest, $p_i$, the within cluster variance component becomes:

$$\begin{aligned}
\left[\mathrm{Var}_c(\bar{X})\right]_{\text{within}} &= \frac{n}{N}\sum_{i=1}^{n_s}\sum_{k=1}^{n_i}\left[\frac{(1 - m_i/M_i)\,p_i(1 - p_i)}{n^2\,(m_i - 1)}\right] \\
&= \frac{n}{N}\sum_{i=1}^{n_s}\left[\frac{n_i\,(1 - m_i/M_i)\,p_i(1 - p_i)}{n^2\,(m_i - 1)}\right] \\
&= \frac{1}{N}\,(1 - f)\sum_{i=1}^{n_s}\frac{W_i\,p_i(1 - p_i)}{m_i - 1}
\end{aligned} \tag{18}$$

where $f$ is the fraction of passengers sampled in each cluster ($m_i/M_i$) and is assumed to be constant in all clusters.

The calculation of the accuracy of estimates, required sample sizes, and confidence intervals is complex, and the reader is referred to Levy and Lemeshow, Sampling of Populations, and Cochran, Sampling Techniques, listed in the bibliography. Comprehensive statistical software programs exist which would be useful for analyzing data from cluster sampling.

It should be noted, however, that cluster sampling is less efficient than random, sequential, and stratified sampling, and larger sample sizes will be required to obtain the same levels of accuracy. Use of the expressions applicable for random sampling will underestimate the true standard errors of estimated population characteristics and the associated confidence intervals. Preferably, clusters should be chosen so that variation in the characteristics of interest between clusters is small, but within clusters is large.

In airport surveys, cluster sampling is commonly used for sampling of flights in departing passenger surveys. For many characteristics such as trip duration, airfares, trip purpose, time at airport, spending in airport concessions, and of course destination and sector, passengers on the same flight will be more likely to have similar values of these characteristics than passengers in general. This homogeneity of characteristics within a flight significantly reduces the efficiency of cluster sampling for analyzing these characteristics.

There are relatively few air party characteristics that have a similar distribution across different flights. For characteristics such as household size, the variation of the characteristic across passengers on one flight is likely to be fairly similar to that of all passengers. In this case, use of cluster sampling should not greatly reduce the sampling efficiency.

It can be shown that the variance of the estimate of the mean of the characteristic of interest for cluster sampling can be expressed, approximately, as a function of the variance for random sampling:

$$\sigma_{\bar{X}C}^2 = \sigma_{\bar{X}R}^2\,\left[1 + \rho\,(m_{av} - 1)\right] \tag{19}$$

where
  $\sigma_{\bar{X}C}^2$ is the variance using cluster sampling
  $\sigma_{\bar{X}R}^2$ is the variance using random sampling
  $\rho$ represents the population intra-class correlation
  $m_{av}$ is the mean number of cases sampled per cluster.

The variance using cluster sampling will be greater than using random sampling unless either $m_{av} = 1$ or $\rho \le 0$. The case $m_{av} = 1$ corresponds to the special case where each cluster consists of a single case and is equivalent to random sampling. The intra-class correlation, $\rho$, is a measure of homogeneity, and if individuals in a cluster are more homogeneous than the population as a whole, $\rho$ will be greater than zero.

The ratio of the variances, $\sigma_{\bar{X}C}^2 / \sigma_{\bar{X}R}^2$, is often referred to as the design effect, DE, and is given by:

$$DE = 1 + \rho\,(m_{av} - 1) \tag{20}$$

The effective sample size is given by: $m_{av}\,n / DE$.

Examples of Calculation of Sample Sizes

Small Population Size: Using Random or Sequential Sampling for Categorical Variables

In this example, the sample size is required for a survey of a relatively small population, such as an employee or tenant survey, using random sampling. The critical characteristics of the population being determined are categorical variables, e.g., percentage of employees accessing the airport by private vehicle. The sample sizes depend on the expected proportion in the category and the level of accuracy desired.

Sample sizes were determined using Equation 3 for a range of population sizes; for two values of the expected proportion in the category: 50% and 20%; and for three levels of accuracy: ±5 percentage points, ±3 percentage points, and ±2 percentage points (the latter for the expected proportion of 20% only). As discussed in Section 3.2 of the guidebook, an error of ±5 percentage points represents a percentage error in the category proportion of 10% for an expected proportion of 50% and 25% for an expected proportion of 20%. As is evident in Table B-1, large sample sizes are required if an accuracy of 10% is required for categories with low proportions of the population.

Table B-1. Sample sizes required for categorical variable with expected proportions of 50% and 20% and varying levels of accuracy.

| Total Number in Population, N | Expected proportion 50%, accuracy ±5 percentage points (±10% of proportion) | Expected proportion 50%, accuracy ±3 percentage points (±6% of proportion) | Expected proportion 20%, accuracy ±5 percentage points (±25% of proportion) | Expected proportion 20%, accuracy ±3 percentage points (±15% of proportion) | Expected proportion 20%, accuracy ±2 percentage points (±10% of proportion) |
|---|---|---|---|---|---|
| 50 | 44 | 48 | 42 | 47 | 48 |
| 100 | 80 | 92 | 71 | 87 | 94 |
| 200 | 132 | 168 | 110 | 155 | 177 |
| 500 | 218 | 340 | 165 | 290 | 377 |
| 1,000 | 280 | 515 | 200 | 405 | 605 |
| 5,000 | 360 | 880 | 235 | 600 | 1,175 |

Note: Entries are the sample size, n, required. Assumes random sampling without replacement.
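The entries of Table B-1 follow from Equation 3 with the standard deviation of a categorical variable, sqrt(p(1 - p)). A minimal Python sketch is given below; the function name `sample_size` is an assumption for illustration. It prints unrounded values, and the published table appears to round some entries (the larger ones seemingly to the nearest 5), so small differences from the table are to be expected.

```python
Z95 = 1.96  # constant for a 95% confidence interval


def sample_size(p, w, N, z=Z95):
    """Equation 3 for a categorical variable with expected proportion p."""
    variance = p * (1 - p)
    return z**2 * variance / (w**2 + z**2 * variance / N)


# (expected proportion, accuracy in proportion units) for the five table columns
CASES = [(0.5, 0.05), (0.5, 0.03), (0.2, 0.05), (0.2, 0.03), (0.2, 0.02)]

for N in (50, 100, 200, 500, 1_000, 5_000):
    row = [sample_size(p, w, N) for p, w in CASES]
    print(f"{N:>6,}", " ".join(f"{value:8.1f}" for value in row))
```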
Passenger Survey: Using Random, Stratified, and Cluster Sampling

A survey of air passengers is to be undertaken to obtain information on airport access trips. A critical question to be answered is (say): What is the percentage of passengers dropped off at the curb outside departures check-in?

It was decided to determine the sample sizes for each of the different sampling types and choose the most cost-effective method. It is known that the percentage dropped off at the curb varies greatly by characteristics such as trip purpose, flight sector (e.g., short-haul domestic, long-haul domestic, international), day of the week, time of day, etc. The trip purpose distribution of passengers is not known at the sampling stage and so could not be used. In addition to random sampling, stratified sampling with passengers stratified by flight sector or day of the week, and cluster sampling with flights stratified by flight sector, were examined.

The survey is planned to be conducted during a two-week period. The flight schedule is obtained from the Official Airline Guide, and the numbers of flights per sector by day of the week and the estimated average number of originating passengers per flight (estimated using average load factors and percentages of connecting passengers) are as in Table B-2.

Table B-2. Assumed number of flights and average number of originating passengers per flight by market sector and day of week.

| Day of Week | Short-Haul Domestic | Long-Haul Domestic | International | Total |
|---|---|---|---|---|
| Monday | 60 | 20 | 8 | 88 |
| Tuesday | 60 | 20 | 8 | 88 |
| Wednesday | 60 | 20 | 8 | 88 |
| Thursday | 60 | 20 | 9 | 89 |
| Friday | 60 | 20 | 12 | 92 |
| Saturday | 50 | 16 | 12 | 78 |
| Sunday | 55 | 20 | 12 | 87 |
| Total | 405 | 136 | 69 | 610 |
| Avg. Originating Passengers/Flight | 50 | 120 | 170 | 79.2 |

To determine the sample size required, it is necessary to have at least approximate estimates of the mean and standard deviation of the variable of interest, in this case the percentage of passengers dropped off at the curb. From knowledge of passengers using the airport, the percentage of passengers dropped off at the curb was estimated as shown in Table B-3.

Table B-3. Estimated percentage of passengers dropped off at the curb.

| Sector of Flight | Passengers Dropped Off at Curb: Mean, p | Passengers Dropped Off at Curb: Standard deviation | Total Pass. |
|---|---|---|---|
| Short-Haul Domestic | 40% | 0.490 | 20,250 |
| Long-Haul Domestic | 60% | 0.490 | 16,320 |
| International | 90% | 0.300 | 11,730 |
| Overall | 58.9% | 0.492 | 48,300 |
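The overall row of Table B-3 can be reproduced from the flight counts and passengers per flight in Table B-2 and the sector percentages in Table B-3. The Python sketch below is illustrative only; the variable names are assumptions made here.

```python
import math

# Weekly flights and average originating passengers per flight (Table B-2).
flights = {"Short-Haul Domestic": 405, "Long-Haul Domestic": 136, "International": 69}
pax_per_flight = {"Short-Haul Domestic": 50, "Long-Haul Domestic": 120, "International": 170}
# Assumed proportion dropped off at the curb by sector (Table B-3).
p_curb = {"Short-Haul Domestic": 0.40, "Long-Haul Domestic": 0.60, "International": 0.90}

# Total originating passengers by sector and overall.
pax = {sector: flights[sector] * pax_per_flight[sector] for sector in flights}
total_pax = sum(pax.values())

# Overall mean: sector means weighted by the number of passengers in each sector.
p_overall = sum(pax[sector] * p_curb[sector] for sector in pax) / total_pax

# Standard deviation of a categorical (0/1) variable: sqrt(p * (1 - p)).
sd_by_sector = {sector: math.sqrt(p * (1 - p)) for sector, p in p_curb.items()}
sd_overall = math.sqrt(p_overall * (1 - p_overall))

for sector in pax:
    print(f"{sector:<20} {pax[sector]:>7,} {p_curb[sector]:.0%} {sd_by_sector[sector]:.3f}")
print(f"{'Overall':<20} {total_pax:>7,} {p_overall:.1%} {sd_overall:.3f}")
```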
The overall mean percentage of 58.9% is a weighted average of the means for each sector, with weights being the numbers of passengers in each sector.

Since the variable of interest is a categorical variable, the standard deviation (SD) is given by $\sqrt{p(1-p)}$, where $p$ is the proportion of the population in the category of interest (i.e., percentage dropped at curb).

Random Sampling of Passengers

A random sample of originating passengers could be surveyed as they exit the security line. In this case, the sample size is determined for a given width of confidence interval using Equation 3, where

  Total population: $N = 48{,}300$
  Mean proportion using curb: $\bar{X} = p = 0.59$
  SD for individual passengers: $\sigma_X = 0.492$

The sample size, $n$, was found using Equation 3 for three widths of the 95% confidence interval (C.I.), ±2%, ±3%, and ±4%, as shown in Table B-4.

Table B-4. Sample sizes for random sampling of passengers for 95% confidence interval widths 2%, 3%, and 4%.

| 95% C.I. (Mean ± w): ± w | w as % of mean | Sample n |
|---|---|---|
| 2.00% | 3.40% | 2,218 |
| 3.00% | 5.10% | 1,012 |
| 4.00% | 6.79% | 574 |

A simple approximation, given in Section 3.4.1 of the guidebook [32], could also have been used:

$$n = \frac{40{,}000\,p(1-p)}{(100\,w)^2}$$

Using this equation, the estimated sample sizes for the ±2%, ±3%, and ±4% cases are 2,411, 1,074, and 606. The approximation leads to slightly higher estimates of the required sample sizes.

If, for example, it was decided that the narrowest confidence interval is appropriate, i.e., the mean estimate should be accurate to within ±2%, a sample size of 2,218 is required. This corresponds to a sampling fraction of 4.6% for a population of 48,300, and if using sequential sampling every 21st passenger passing through security should be surveyed.

[32] The denominator in the equation in Section 3.4.1 is $a^2$, where the width of the 95% confidence interval is ±a and a is expressed in percentage points. Since w above is not expressed in percentage points, w = a/100.

Stratified Sampling of Passengers: Stratified by Sector

Consider the case where stratified sampling is used to select passengers to be surveyed and passengers are stratified by the sector of the flight. Assume that at this airport, passengers on the different sectors use different security screening checkpoints, thus allowing passengers on each flight sector to be sampled separately.

We consider here the simple case where proportional stratified sampling is used. Thus the proportion of the sample size in each flight sector is equal to the proportion of the total passengers in each flight sector, $W_i = n_i/n = N_i/N$.
The sample size is determined using Equation 8. Table B-5 shows the calculation of the summation over the three flight sectors.

The standard deviation for each sector is found using the relationship applicable for categorical variables: $\sqrt{p_i(1-p_i)}$, where $p_i$ is the proportion of the population in the category of interest for flights in sector $i$ (i.e., percentage dropped at the curb).

Table B-5. Calculation of standard deviation of sample mean for stratified sampling of passengers by market sector.

| Sector of Flight | Enplaned Pass. Total, $N_i$ | % of Total, $W_i$ | Est. Avg. Proportion at Curb, $p$ | SD, $\sigma_{X_i}$ | $W_i\,\sigma_{X_i}^2$ |
|---|---|---|---|---|---|
| Short-Haul Domestic | 20,250 | 0.4193 | 0.40 | 0.49 | 0.10062 |
| Long-Haul Domestic | 16,320 | 0.3379 | 0.60 | 0.49 | 0.08109 |
| International | 11,730 | 0.2429 | 0.90 | 0.30 | 0.02186 |
| Total | 48,300 | 1.0000 | 0.59 | | 0.20357 |

Substituting 0.20357 from Table B-5 for $\sum_i (W_i\,\sigma_{X_i}^2)$ in Equation 8 for three C.I. widths of ±2%, ±3%, and ±4% gives the sample sizes, $n$, in Table B-6.

Table B-6. Sample sizes for stratified sampling of passengers by sector of flight for 95% confidence interval widths 2%, 3%, and 4%.

| 95% C.I. (Mean ± w): ± w | w as % of mean | Sample n |
|---|---|---|
| 2.00% | 3.40% | 1,879 |
| 3.00% | 5.09% | 854 |
| 4.00% | 6.79% | 484 |

The sample sizes for each flight sector are then found, based on the proportion of passengers in each sector, $W_i$, to be as shown in Table B-7.

Table B-7. Sample sizes by sector for stratified sampling of passengers by sector of flight for 95% confidence interval widths 2%, 3%, and 4%.

| Sector of Flight | Sample size for C.I. width ± 2% | Sample size for C.I. width ± 3% | Sample size for C.I. width ± 4% |
|---|---|---|---|
| Short-Haul Domestic | 788 | 358 | 203 |
| Long-Haul Domestic | 635 | 288 | 163 |
| International | 456 | 207 | 118 |
| Total | 1,879 | 853 | 484 |

Comparing the total sample size with that found with random sampling, we find that stratification by flight segment has reduced the required sample size for the ± 2% case from 2,218 to 1,879, a reduction of 15%. Note that the size of the reduction is very dependent on the variation in the mean responses across the different strata.
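The stratified-by-sector calculation can be sketched in Python as follows, using Equation 8 and the proportional allocation n_i = n W_i. The names used are illustrative assumptions; the inputs are the passenger totals and curb proportions from Table B-5.

```python
Z95 = 1.96  # constant for a 95% confidence interval

N_i = [20_250, 16_320, 11_730]   # passengers by sector (Table B-5)
p_i = [0.40, 0.60, 0.90]         # estimated proportion at curb by sector
N = sum(N_i)
W_i = [Ni / N for Ni in N_i]     # proportion of passengers in each sector

# Summation in Equation 8: sum of W_i * sigma_Xi^2, with sigma_Xi^2 = p_i (1 - p_i).
pooled_var = sum(W * p * (1 - p) for W, p in zip(W_i, p_i))


def sample_size(variance, w, N, z=Z95):
    """Equation 8 with variance = sum of W_i * sigma_Xi^2."""
    return z**2 * variance / (w**2 + z**2 * variance / N)


n_total = sample_size(pooled_var, 0.02, N)
allocation = [round(n_total * W) for W in W_i]   # n_i = n * W_i

print(f"sum of W_i * sigma_Xi^2 = {pooled_var:.5f}")
print(f"total sample size for +/-2%: {round(n_total):,}")
print(f"by sector: {allocation}")
```

Rounding each stratum separately can leave the stratum totals one unit off the rounded overall total, which may explain the 854 versus 853 difference between Tables B-6 and B-7.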
Stratified Sampling of Passengers: Stratified by Day of Week

Now consider the case where the passengers are stratified by the day of the week. This form of stratification is easy to implement during the conduct of the survey, and numbers of passengers are known, at least approximately, at the sample design stage.

Again consider the simple case where proportional stratified sampling is used. Thus the proportion of the sample size on each day of the week is equal to the proportion of the total passengers in each day of the week.

It was assumed that the proportion of people using the curb on each weekday was entirely explained by the sector of their flight. Thus, the average percentage of passengers using the curb was estimated for each day by the weighted average of the percentages for each flight sector, with weights equal to the numbers of passengers on that day to each sector.

The sample size is determined using Equation 8. Table B-8 shows the calculation of the summation over the days of the week.

The standard deviation for each day is found using the relationship applicable for categorical variables: $\sqrt{p_i(1-p_i)}$, where $p_i$ is the proportion of the population in the category of interest (i.e., percentage dropped at curb) on day $i$.

Table B-8. Calculation of standard deviation of sample mean for stratified sampling of passengers by day of week.

| Day | Originating Passengers: Short-Haul Domestic | Originating Passengers: Long-Haul Domestic | Originating Passengers: International | Total, $N_i$ | % of Total, $W_i$ | Est. Avg. Proportion at Curb, $p_i$ | SD, $\sigma_{X_i}$ | $W_i\,\sigma_{X_i}^2$ |
|---|---|---|---|---|---|---|---|---|
| Monday | 3,000 | 2,400 | 1,360 | 6,760 | 0.1400 | 0.572 | 0.495 | 0.03427 |
| Tuesday | 3,000 | 2,400 | 1,360 | 6,760 | 0.1400 | 0.572 | 0.495 | 0.03427 |
| Wednesday | 3,000 | 2,400 | 1,360 | 6,760 | 0.1400 | 0.572 | 0.495 | 0.03427 |
| Thursday | 3,000 | 2,400 | 1,530 | 6,930 | 0.1435 | 0.580 | 0.494 | 0.03496 |
| Friday | 3,000 | 2,400 | 2,040 | 7,440 | 0.1540 | 0.602 | 0.490 | 0.03692 |
| Saturday | 2,500 | 1,920 | 2,040 | 6,460 | 0.1337 | 0.617 | 0.486 | 0.03160 |
| Sunday | 2,750 | 2,400 | 2,040 | 7,190 | 0.1489 | 0.609 | 0.488 | 0.03546 |
| Total | 20,250 | 16,320 | 11,730 | 48,300 | 1.0000 | 0.589 | | 0.24175 |

Substituting 0.24175 from Table B-8 for $\sum_i (W_i\,\sigma_{X_i}^2)$ in Equation 8 for three C.I. widths of ±2%, ±3%, and ±4% gives the sample sizes, $n$, in Table B-9.

Table B-9. Sample sizes for stratified sampling of passengers by day of week for 95% confidence interval widths 2%, 3%, and 4%.

| 95% C.I. (Mean ± w): ± w | w as % of mean | Sample n |
|---|---|---|
| 2.000% | 3.40% | 2,215 |
| 3.000% | 5.09% | 1,010 |
| 4.000% | 6.79% | 574 |

The sample sizes for each day of the week are then found, based on the proportion of passengers on each day of the week, $W_i$, to be as shown in Table B-10.
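The day-of-week calculation in Table B-8 can be rebuilt from the flight counts in Table B-2, the passengers per flight, and the sector curb proportions. The following Python sketch is illustrative; the constant and variable names are assumptions introduced here.

```python
Z95 = 1.96  # constant for a 95% confidence interval

PAX_PER_FLIGHT = (50, 120, 170)   # short-haul, long-haul, international (Table B-2)
P_CURB = (0.40, 0.60, 0.90)       # proportion at curb by sector (Table B-3)
FLIGHTS_BY_DAY = {                # flights by sector for each day (Table B-2)
    "Monday": (60, 20, 8),
    "Tuesday": (60, 20, 8),
    "Wednesday": (60, 20, 8),
    "Thursday": (60, 20, 9),
    "Friday": (60, 20, 12),
    "Saturday": (50, 16, 12),
    "Sunday": (55, 20, 12),
}


def sample_size(variance, w, N, z=Z95):
    """Equation 8 with variance = sum of W_i * sigma_Xi^2."""
    return z**2 * variance / (w**2 + z**2 * variance / N)


# Originating passengers by sector for each day.
pax_by_day = {
    day: [f * m for f, m in zip(flights, PAX_PER_FLIGHT)]
    for day, flights in FLIGHTS_BY_DAY.items()
}
N = sum(sum(pax) for pax in pax_by_day.values())

pooled_var = 0.0
for day, pax in pax_by_day.items():
    N_d = sum(pax)
    # Daily curb proportion: sector proportions weighted by that day's passengers.
    p_d = sum(x * p for x, p in zip(pax, P_CURB)) / N_d
    term = (N_d / N) * p_d * (1 - p_d)   # W_i * sigma_Xi^2
    pooled_var += term
    print(f"{day:<10} {N_d:>6,} {N_d / N:.4f} {p_d:.3f} {term:.5f}")

n_total = sample_size(pooled_var, 0.02, N)
print(f"sum = {pooled_var:.5f}, sample size for +/-2% = {round(n_total):,}")
```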
Table B-10. Sample sizes for each day for stratified sampling of passengers by day of week for 95% confidence interval widths 2%, 3%, and 4%.

| Day of Week | Sample size for C.I. width ± 2% | Sample size for C.I. width ± 3% | Sample size for C.I. width ± 4% |
|---|---|---|---|
| Monday | 310 | 141 | 80 |
| Tuesday | 310 | 142 | 80 |
| Wednesday | 310 | 142 | 80 |
| Thursday | 318 | 145 | 82 |
| Friday | 341 | 156 | 89 |
| Saturday | 296 | 135 | 77 |
| Sunday | 330 | 150 | 86 |
| Total | 2,215 | 1,011 | 574 |

Comparing the total sample size with that found with random sampling, we find that stratification by day of the week has reduced the required sample size for the ±2% case from 2,218 to 2,215, a reduction of only 0.1%. Thus, in this case stratification makes almost no difference to the required sample size. This is due to the low variability in the average percentage of passengers at the curb, $p_i$, over the various days of the week.

Note that the size of the reduction is very dependent on the variation in the mean responses across the different strata. This varies depending on the variable of interest and in some cases could vary greatly over the days of the week, making stratification by day of the week worthwhile.

Cluster Sampling of Flights: Additional Assumptions

A very common form of sampling for passenger surveys is to select a sample of flights to survey and to sample either all, or a portion, of passengers on those flights. This is a form of cluster sampling where each flight represents a cluster.

Using the same example as above, the pertinent characteristics required to estimate the sample size are given in Table B-11.

Table B-11. Assumed characteristics for flights in three market sectors for illustrative examples of cluster sampling.

| Quantity | Symbol (for flight sector i) | Short-Haul Domestic | Long-Haul Domestic | International | Symbol (total) | Value (total) |
|---|---|---|---|---|---|---|
| No. of Departing Flights | $N_i$ | 405 | 136 | 69 | $N$ | 610 |
| Avg. Originating Pass./Flight | $M_i/N_i$ | 50 | 120 | 170 | $M/N$ | 79.2 |
| Total Originating Passengers | $M_i$ | 20,250 | 16,320 | 11,730 | $M$ | 48,300 |
| Proportion of Flights in Sector | $W_i = N_i/N$ | 0.6639 | 0.2230 | 0.1131 | | 1.0000 |
| % Dropped at Curb | $\bar{X}_i = p_i$ | 40% | 60% | 90% | $\bar{X} = p$ | 58.9% |
| Difference from Overall Avg. | $\bar{X}_i - \bar{X}$ | -19% | 1% | 31% | | |
| SD in % Between Flights | $\sigma_{ci}$ | 10% | 10% | 10% | $\sigma_c$ | 21.1% |

Cluster sampling is very dependent on one parameter not relevant to the passenger sampling considered above: the variation in the mean value for each flight of the characteristic of interest (i.e., percentage of passengers dropped at the curb) over the range of flights, $\sigma_{ci}$. An estimate of
this variation, expressed in terms of the standard deviation, is given in Table B-11. It is estimated from previous surveys and knowledge of passengers at the airport that the standard deviation is 10% around the mean value for each sector (see footnote 33). Thus, for short-haul flights the mean value of the percentage using the curb for each flight would be expected to be between 20.4% and 59.6% for 95% of flights [= 40% ± 1.96 × 10%]. The standard deviation over all flights includes both the variation between flights within each sector and the variation between sectors and is given by:

$$\sigma_c = \sqrt{\sum_i W_i \left[ \sigma_{ci}^2 + \left( \bar{X}_i - \bar{X} \right)^2 \right]}$$

where W_i is the proportion of flights in sector i (= N_i / N).

Cluster Sampling with Random Sampling of Flights

If a random sample of flights is selected and all passengers on each of the selected flights are surveyed, the sample size is determined for a given width of confidence interval using Equation 3, where

Total population (flights):    N = 610
Mean proportion using curb:    X̄ = p̄ = 0.589
SD for individual flight:      σ_X = σ_c = 0.211

The number of flights to be sampled, n, was found using Equation 3 for three widths of the 95% confidence interval (±2%, ±3%, and ±4%), as shown in Table B-12. Since all passengers on each flight are sampled, the number of passengers sampled is the number of flights sampled multiplied by the average number of passengers per flight (M̄ = M / N).

Table B-12. Sample sizes for cluster sampling with random sampling of flights for 95% confidence interval widths 2%, 3%, and 4%.

      95% C.I. Mean ± w
95% C.I. ± w   w as % of mean   Sample n (flights)   Sample Pass.
   2.00%           3.40%               252              19,953
   3.00%           5.09%               146              11,560
   4.00%           6.79%                92               7,285

Comparing the total passenger sample size with that found with random sampling of passengers, we find that cluster sampling by flight has greatly increased the required sample size: for ±2% accuracy, from 2,218 to 19,953. This is due to the high variation in the characteristic of interest between flights. In other cases, the additional sample size with clustering may be much less. For example, if the mean percentage was 50% for each sector (instead of 40%, 60%, and 90%), the sample size for ±2% accuracy would be 83 flights or 6,572 passengers. The increase in the sample size is very dependent on the variation in the mean responses across the different flights, and the above example may not be typical.

33 Note that if there was no difference between sectors, so that the mean percentage of passengers dropped off at the curb was 58.9% for all flights, the standard deviation in the percentage between flights, σ_ci, would equal the standard deviation of the mean for each flight. Then σ_ci² = p_i (1 − p_i) / (M_i / N_i), where M_i / N_i is the average number of passengers on each flight in sector i. Thus the values of σ_ci for the short-haul, long-haul, and international sectors would be 7.0%, 4.5%, and 3.8%, respectively, and σ_c would be 6.2%.
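The overall standard deviation σ_c and the flight sample sizes in Table B-12 can be reproduced with a short Python sketch. The names below are hypothetical, the inputs are the assumed values of Table B-11, and the sample-size function assumes that Equation 3 has the usual finite-population-corrected form n = n0 / (1 + n0/N) with n0 = (1.96 σ_c / w)², rounded up to a whole flight. The overall mean is taken as the passenger-weighted mean, which is what yields 58.9%.

```python
import math

# Assumed sector characteristics from Table B-11
# (short-haul domestic, long-haul domestic, international).
FLIGHTS = (405, 136, 69)           # N_i: departing flights per sector
PAX_PER_FLIGHT = (50, 120, 170)    # average originating passengers per flight
CURB_SHARE = (0.40, 0.60, 0.90)    # p_i: mean proportion dropped at the curb
SD_BETWEEN = (0.10, 0.10, 0.10)    # sigma_ci: SD between flights within a sector


def overall_sd(flights, pax_per_flight, curb_share, sd_between):
    """sigma_c = sqrt(sum_i W_i * [sigma_ci^2 + (p_i - p_bar)^2]), W_i = N_i / N."""
    total_flights = sum(flights)
    passengers = [n * m for n, m in zip(flights, pax_per_flight)]
    # Passenger-weighted overall mean (58.9% in the example).
    p_bar = sum(m * p for m, p in zip(passengers, curb_share)) / sum(passengers)
    variance = sum(
        (n / total_flights) * (sd ** 2 + (p - p_bar) ** 2)
        for n, p, sd in zip(flights, curb_share, sd_between)
    )
    return math.sqrt(variance)


def flights_to_sample(sd, total_flights, half_width, z=1.96):
    """Flights needed, ASSUMING Equation 3 is n = n0 / (1 + n0/N), rounded up."""
    n0 = (z * sd / half_width) ** 2
    return math.ceil(n0 / (1.0 + n0 / total_flights))


TOTAL_FLIGHTS = sum(FLIGHTS)
TOTAL_PAX = sum(n * m for n, m in zip(FLIGHTS, PAX_PER_FLIGHT))
sigma_c = overall_sd(FLIGHTS, PAX_PER_FLIGHT, CURB_SHARE, SD_BETWEEN)

RESULTS = []
for w in (0.02, 0.03, 0.04):
    n = flights_to_sample(sigma_c, TOTAL_FLIGHTS, w)
    # All passengers on each sampled flight are surveyed.
    RESULTS.append((w, n, round(n * TOTAL_PAX / TOTAL_FLIGHTS)))

print(f"sigma_c = {sigma_c:.3f}")
for w, n, pax in RESULTS:
    print(f"+/-{w:.0%}: {n} flights, about {pax} passengers")
```

Setting every entry of `CURB_SHARE` to 0.50 removes the between-sector term and gives the smaller sample size quoted in the text for that case.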
Cluster Sampling with Stratified Sampling of Flights and All Passengers on Selected Flights Surveyed

Now consider the case where flights to be surveyed are determined using stratified sampling and all passengers on each of the selected flights are surveyed. The flight sample size is determined for a given width of confidence interval using Equation 8, where the units sampled in each stratum are clusters rather than individuals. Since flights are being sampled, rather than passengers, the standard deviation σ_Xi in Equation 8 is the standard deviation of the average percentage of passengers using the curb for each flight, σ_ci, as shown in Table B-13.

Table B-13. Calculation of standard deviation of sample mean for cluster sampling with flights stratified by sector and all passengers on selected flights surveyed.

                       Departing Flights
Sector of Flight       Total N_i   W_i = N_i/N   Est. Avg. % at Curb, p_i   SD σ_ci   W_i σ_ci²
Short-Haul Domestic        405        0.6639              0.40                0.10     0.00664
Long-Haul Domestic         136        0.2230              0.60                0.10     0.00223
International               69        0.1131              0.90                0.10     0.00113
Total                      610        1.0000              0.589                        0.01000

Substituting 0.01000 from Table B-13 for Σ_i (W_i σ_ci²) in Equation 8, the number of flights to be sampled, n, was found for three widths of the 95% confidence interval (±2%, ±3%, and ±4%), as shown in Table B-14. The numbers of flights in each sector and the estimated numbers of passengers (based on the average number of passengers per flight in that sector) are shown in Table B-15.

Table B-14. Sample number of flights for cluster sampling with flights stratified by sector and all passengers on selected flights surveyed for 95% confidence interval widths 2%, 3%, and 4%.

      95% C.I. Mean ± w
95% C.I. ± w   w as % of mean   Sample n (flights)
   2.00%           3.40%               83
   3.00%           5.09%               40
   4.00%           6.79%               24

Table B-15. Sample sizes by sector for cluster sampling with flights stratified by sector and all passengers on selected flights surveyed for 95% confidence interval widths 2%, 3%, and 4%.

                       C.I. Width ±2%     C.I. Width ±3%     C.I. Width ±4%
Sector of Flight       Flights   Pass.    Flights   Pass.    Flights   Pass.
Short-Haul Domestic       55     2,750       27     1,350       16       800
Long-Haul Domestic        19     2,280        9     1,080        5       600
International              9     1,530        5       850        3       510
Total*                    83     6,560       41     3,280       24     1,910

* Total may be higher than in the previous table because the number of flights must be an integer.
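The figures in Tables B-14 and B-15 can be checked with the following Python sketch. It is illustrative only: the names are hypothetical, Equation 8 is assumed to have the proportional-allocation form n = n0 / (1 + n0/N) with n0 = 1.96² Σ W_i σ_ci² / w² (rounded up), and flights are allocated to sectors by rounding n × N_i / N to the nearest integer, which is an assumption about how the sector counts were obtained.

```python
import math

# Assumed values from Tables B-11 and B-13.
FLIGHTS = {"Short-Haul Domestic": 405, "Long-Haul Domestic": 136, "International": 69}
PAX_PER_FLIGHT = {"Short-Haul Domestic": 50, "Long-Haul Domestic": 120, "International": 170}
SD_BETWEEN = 0.10  # sigma_ci, taken as equal for all sectors


def stratified_flights(half_width, z=1.96):
    """Flights to sample, ASSUMING Equation 8 is n = n0 / (1 + n0/N), rounded up."""
    total = sum(FLIGHTS.values())
    sum_w_var = sum((n_i / total) * SD_BETWEEN ** 2 for n_i in FLIGHTS.values())
    n0 = z ** 2 * sum_w_var / half_width ** 2
    return math.ceil(n0 / (1.0 + n0 / total))


def allocate(n):
    """Allocate n flights to sectors in proportion to N_i, rounded to nearest."""
    total = sum(FLIGHTS.values())
    return {sector: round(n * n_i / total) for sector, n_i in FLIGHTS.items()}


def passengers(allocation):
    """Passengers surveyed when every passenger on a sampled flight is surveyed."""
    return {sector: k * PAX_PER_FLIGHT[sector] for sector, k in allocation.items()}


for w in (0.02, 0.03, 0.04):
    n = stratified_flights(w)
    alloc = allocate(n)
    pax = passengers(alloc)
    print(f"+/-{w:.0%}: n={n}, flights={alloc}, total passengers={sum(pax.values())}")
```

Because each sector count is rounded separately, the allocated total can differ from n by a flight, which is the effect described in the note to Table B-15.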
The stratification of flights by sector results in a large reduction in the numbers of flights and passengers to be surveyed. In this example, much of the variation in the variable of interest is explained by the flight sector, which results in a large reduction in sample size compared to random sampling of flights. By sampling the flights by sector, the likelihood of selecting a sample with close to the actual proportions of passengers in each sector is much greater than when randomly sampling flights. Again note that the results here reflect the assumptions regarding variation considered in this example and will vary in other situations.

Cluster Sampling with Stratified Sampling of Flights and a Sample of Passengers on Selected Flights

Now consider the case where flights to be surveyed are determined using stratified sampling and a sample of passengers on each of the selected flights is surveyed. Assume initially that 50% of passengers on the selected flights are surveyed. The variance of the estimate is greater than with 100% sampling of each flight, as it includes both the variation between flights (as before) and the variation due to sampling of passengers on individual flights. It is calculated from Equations 17 and 18 as follows:

$$\sigma_{\bar{X}}^2 = \frac{1 - n/N}{n} \sum_{i=1}^{n_s} \frac{N_i}{N}\,\sigma_{ci}^2 \;+\; \sum_{i=1}^{n_s} \frac{1}{N_i}\,(1 - f)\,\frac{p_i\,(1 - p_i)}{m_i - 1}$$

where
σ_ci is the standard deviation of the mean percentage using the curb across flights in sector i
N_i is the number of flights in sector i (N is the total over all sectors)
n_i is the number of flights sampled in sector i (n is the total over all sectors)
M_i is the average number of passengers on a flight in sector i
m_i is the average number of passengers sampled on a flight in sector i (= f M_i)
p_i is the probability of a passenger on a flight in sector i being dropped off at the curb
f is the proportion of passengers sampled on a flight (= m_i / M_i, assumed the same for all flights)
and the sums run over the n_s sectors.

The flight sample size is determined for a given confidence interval X̄ ± w by solving the following relationship for n:

$$w = 1.96\,\sqrt{\sigma_{\bar{X}}^2}$$

where σ_X̄² is given by the equation above. The summations over the sectors for calculating σ_X̄² are determined for a given n value as shown in Table B-16 (n = 117 used in the table). The number of flights to be sampled, n, was found by setting an approximate value initially and determining the width, w, then adjusting the value of n until the appropriate value of w was obtained. In the table, σ_X̄² is evaluated for n = 117 and the resulting value of w is 0.0200, or 2.00%.

Table B-16. Calculation of standard deviation of sample mean for cluster sampling with flights stratified by sector and a 50% sample of passengers on selected flights. Calculation of σ_X̄² for n = 117.

                   Departing Flights               Pass. on   Sampled on     SD
                   Total,   W_i =    Est. Avg. %   Each       Each Flight    Between   Between     Within      Between
Sector of Flight   N_i      N_i/N    at Curb, p_i  Flight,    % f    # m_i   Flights,  Cluster     Cluster     + Within
                                                   M_i                       σ_ci                              Total
Short-Haul Dom.     405     0.6639      0.40          50      50%     25     0.100     0.0000459   0.0000123   0.0000582
Long-Haul Dom.      136     0.2230      0.60         120      50%     60     0.100     0.0000154   0.0000150   0.0000304
International        69     0.1131      0.90         170      50%     85     0.100     0.0000078   0.0000078   0.0000156
Total               610     1.0000      0.59                                                                   0.0001041

Between Cluster = [(1 − n/N) / n] (N_i / N) σ_ci²; Within Cluster = (1 / N_i) (1 − f) p_i (1 − p_i) / (m_i − 1).
σ_X̄² = 0.0001041; w = 1.96 σ_X̄ = 0.0200.

Sample sizes for three widths of the 95% confidence interval (±2%, ±3%, and ±4%) were found to be as shown in Table B-17.
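The trial-and-adjust search for n described above can be sketched in Python. The code is illustrative and its names are hypothetical; it evaluates the between-cluster and within-cluster terms exactly as they are labeled in the column headings of Table B-16, using the assumed sector values of the example, and leaves the choice of n to the reader, as in the text.

```python
import math

# (N_i flights, p_i at curb, M_i passengers per flight, sigma_ci) per sector,
# using the assumed values shown in Table B-16.
SECTORS = [
    (405, 0.40, 50, 0.10),   # short-haul domestic
    (136, 0.60, 120, 0.10),  # long-haul domestic
    (69, 0.90, 170, 0.10),   # international
]


def variance_of_mean(n, f, sectors=SECTORS):
    """Between-cluster plus within-cluster variance for n flights, sampling fraction f."""
    total_flights = sum(s[0] for s in sectors)
    between = (1.0 - n / total_flights) / n * sum(
        (n_i / total_flights) * sd ** 2 for n_i, _, _, sd in sectors
    )
    within = sum(
        (1.0 / n_i) * (1.0 - f) * p * (1.0 - p) / (f * m_i - 1.0)  # m_i = f * M_i
        for n_i, p, m_i, _ in sectors
    )
    return between + within


def half_width(n, f, z=1.96):
    """Half-width w of the 95% confidence interval for a trial number of flights n."""
    return z * math.sqrt(variance_of_mean(n, f))


# Try a few values of n around the one used in Table B-16 (f = 50%).
for trial_n in (110, 116, 117, 118, 125):
    print(f"n={trial_n}: variance={variance_of_mean(trial_n, 0.5):.7f}, "
          f"w={half_width(trial_n, 0.5):.4f}")
```

Changing the second argument of `half_width` shows how the required number of flights moves as the fraction of passengers surveyed on each flight is raised or lowered.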
The numbers of flights in each sector and the estimated numbers of passengers (based on the average number of passengers per flight in that sector) are shown in Table B-18.

Surveying only a 50% sample of passengers on each flight increased the number of flights to be surveyed from 83 to 117 for the ±2% accuracy case. However, since only 50% of passengers on these flights are to be surveyed, the total number of passengers decreased from 6,560 to 4,615. For surveys conducted in the departure lounge, it is almost impossible to survey all passengers on a flight, for the reasons given in Chapter 5 of the guidebook. In practice it may be possible to obtain complete responses from 50% of passengers, in which case the number of flights to be surveyed based on the 50% passenger sample should be used.

It is evident that the total number of passengers that need to be surveyed can be reduced by reducing the percentage of passengers sampled on each flight, but the number of flights surveyed increases. Several other cases were examined using this example:
• If 75% of passengers on each of the selected flights were to be surveyed, a sample of 92 flights and 5,580 passengers would be required.
• If 30% of passengers on each of the selected flights were to be surveyed, a sample of 268 flights and 6,360 passengers would be required.

The optimal balance for a particular survey will depend on the variation in responses between and within flights, and on the relative costs of surveying passengers and flights, which vary from survey to survey. Another important consideration with interview surveys in gate lounges, discussed in Chapter 5 of the guidebook, is the limitation on the number of interviews that each interviewer can complete in the time window between when passengers start to arrive in the gate lounge and the start of flight boarding.
As a practical matter, this limits the number of passengers who can be surveyed on a given flight. Again note that the results here reflect the assumptions regarding variation considered in this example and will vary in other situations.

Table B-17. Sample number of flights for cluster sampling with flights stratified by sector and a 50% sample of passengers on selected flights surveyed for 95% confidence interval widths 2%, 3%, and 4%.

      95% C.I. Mean ± w
95% C.I. ± w   w as % of mean   Sample n (flights)
   2.00%           3.40%              117
   3.00%           5.09%               47
   4.00%           6.79%               26

Table B-18. Sample sizes by sector for cluster sampling with flights stratified by sector and a 50% sample of passengers on selected flights surveyed for 95% confidence interval widths 2%, 3%, and 4%.

                       C.I. Width ±2%     C.I. Width ±3%     C.I. Width ±4%
Sector of Flight       Flights   Pass.    Flights   Pass.    Flights   Pass.
Short-Haul Domestic       78     1,950       31       775       17       425
Long-Haul Domestic        26     1,560       11       660        6       360
International             13     1,105        5       425        3       255
Total                    117     4,615       47     1,860       26     1,040
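The sector breakdown in Table B-18 can be approximated with the Python sketch below. The rounding rule used to split n flights among sectors is not stated in the text, so a largest-remainder rule is assumed here because it keeps the sector counts summing exactly to n; the function names are hypothetical and the sector values are the assumed values of the example.

```python
import math

# Assumed sector values (short-haul domestic, long-haul domestic, international).
FLIGHTS = (405, 136, 69)          # N_i: departing flights per sector
PAX_PER_FLIGHT = (50, 120, 170)   # M_i: average passengers per flight


def allocate_flights(n, flights=FLIGHTS):
    """Split n flights across sectors in proportion to N_i.

    ASSUMPTION: a largest-remainder rule, so the allocation sums exactly to n.
    """
    total = sum(flights)
    exact = [n * n_i / total for n_i in flights]
    alloc = [math.floor(x) for x in exact]
    shortfall = n - sum(alloc)
    # Hand the remaining flights to the sectors with the largest fractional parts.
    order = sorted(range(len(flights)), key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in order[:shortfall]:
        alloc[i] += 1
    return alloc


def passengers_surveyed(alloc, f, pax_per_flight=PAX_PER_FLIGHT):
    """Passengers surveyed per sector when a fraction f is sampled on each flight."""
    return [round(k * f * m) for k, m in zip(alloc, pax_per_flight)]


for n in (117, 47, 26):
    alloc = allocate_flights(n)
    pax = passengers_surveyed(alloc, 0.5)
    print(f"n={n}: flights={alloc}, passengers={pax}, total={sum(pax)}")
```

The same two functions can be used to compare the passenger workload of other sampling fractions once the corresponding number of flights has been determined.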