

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




APPENDIX B

Sample Sizes, Sample Estimates, and Confidence Intervals

This appendix outlines how to determine the sample size so that the confidence interval of a sample estimate will be within a specified range and, when analyzing the results, how to estimate values and confidence intervals for characteristics of the population. The determination of sample sizes and confidence intervals is described for the four types of probability sampling: random, sequential, stratified, and cluster sampling. The description is only a brief summary of the methods as applicable to airport surveys, and the reader is encouraged to refer to the statistical texts listed in the bibliography for a more complete description. Examples are then provided for applying these methods to determine sample sizes for small populations, such as employee or tenant surveys, and for passenger surveys using each of the possible sampling strategies.

Throughout this section, it is assumed that the sample size will be large enough that the sample means will be approximately Normally distributed, per the Central Limit Theorem. Generally a sample size of at least 30 is considered to provide a reasonably good approximation to the Normal distribution. Most airport user surveys will have a much larger sample size than this.

Random Sampling

Suppose we are interested in a characteristic of the population, which we denote by $X$. For example, $X$ could be the variable gender, where:

$$X = \begin{cases} 0 & \text{if passenger is male} \\ 1 & \text{if passenger is female} \end{cases}$$

Let the mean value of this characteristic be $\mu$. In the example, $\mu$ is the proportion of the passengers that are female. Using random sampling, the sample mean is an unbiased estimator for the population mean, $\mu$. If the sample size is $n$, the sample mean is:

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad (1)$$

where
$\bar{X}$ is the sample mean
$x_i$ is the value of $X$ for the ith individual in the sample, $i = 1, \ldots, n$.

The 95% confidence interval for $\mu$ (the range that contains the true value of $\mu$ with a probability of 0.95) is given by:

$$\bar{X} \pm 1.96\,\sigma_{\bar{X}}$$

Guidebook for Conducting Airport User Surveys

where $\sigma_{\bar{X}}$ is the standard deviation of the sample mean. For random sampling without replacement, it can be shown that:

$$\sigma_{\bar{X}} = \sigma_X \sqrt{\frac{1 - n/N}{n}}$$

where
$\sigma_X$ is the standard deviation of the variable $X$
$N$ is the number of individuals in the population.

Hence the 95% confidence interval for $\mu$ is given by:

$$\bar{X} \pm 1.96\,\sigma_X \sqrt{\frac{1 - n/N}{n}} \qquad (2)$$

For very large populations, $(1 - n/N)$ is approximately one, and this reduces to:

$$\bar{X} \pm 1.96\,\frac{\sigma_X}{\sqrt{n}}$$

The constant 1.96 is applicable for a 95% confidence interval. For a 99% confidence interval, the constant should be 2.58, and for a 90% confidence interval, use 1.65.

If the sample size is to be chosen so that the sample mean is within $w$ of the population mean, $\mu$, with a probability of 0.95, then the 95% confidence interval is given by:

$$\bar{X} \pm w$$

(note that the full width of the confidence interval is $2w$). Thus,

$$w = 1.96\,\sigma_X \sqrt{\frac{1 - n/N}{n}}$$

This can be solved to give the sample size, $n$:

$$n = \frac{1.96^2\,\sigma_X^2}{w^2 + 1.96^2\,\sigma_X^2 / N} \qquad (3)$$

In the survey planning stage, when trying to determine the sample size, the standard deviation of the variable $X$, $\sigma_X$, will be unknown and must be estimated based on data from previous surveys, other similar airports, or knowledge of the airport.

For categorical variables, the standard deviation is related to the expected proportion of individuals in a category:

$$\sigma_X = \sqrt{p(1 - p)}$$

$$w = 1.96 \sqrt{\frac{p(1 - p)(1 - n/N)}{n}}$$

where $p$ is the proportion of the population in the category of interest.

For the example above, where the categorical variable is the respondent's gender, assume that we are interested in estimating the proportion of female passengers. Thus, $p$ is the proportion of female passengers. If we initially estimate this proportion to be 0.5, then for a sample size of $n = 400$ and a population size, $N$, of 50,000, the half-width of the 95% confidence interval, $w$, is 0.049. The sample size can be chosen to obtain the required accuracy using Equation 3.
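The gender example can be checked numerically. The following Python sketch evaluates the half-width $w$ from Equation 2 and the sample size from Equation 3; the function names are illustrative choices rather than anything defined in the guidebook, and `z = 1.96` assumes a 95% confidence level.

```python
import math


def sd_proportion(p):
    """Standard deviation of a 0/1 categorical variable with proportion p."""
    return math.sqrt(p * (1 - p))


def half_width(sigma, n, N, z=1.96):
    """Half-width w of the confidence interval (the +/- term in Equation 2)
    for random sampling without replacement."""
    return z * sigma * math.sqrt((1 - n / N) / n)


def sample_size(sigma, w, N, z=1.96):
    """Sample size n from Equation 3, left unrounded."""
    return (z**2 * sigma**2) / (w**2 + z**2 * sigma**2 / N)


# Example from the text: p = 0.5, n = 400, N = 50,000.
w = half_width(sd_proportion(0.5), n=400, N=50_000)
print(f"w = {w:.3f}")

# Feeding w back into Equation 3 should recover the sample size of 400.
print(f"n = {sample_size(sd_proportion(0.5), w, N=50_000):.1f}")
```

Because Equation 3 is just Equation 2 solved for $n$, the second call acts as a consistency check on both functions.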
If $X$ represents the mode of transport to the airport, and the category taxi or limousine has an estimated proportion of 0.20, then the half-width of the 95% confidence interval, $w$, is 0.039 for a sample size of $n = 400$ and a population size of $N = 50{,}000$.

For a non-categorical variable, such as time in the terminal before departure or expenditure at the concessions, the standard deviation is determined from the variation of the variable. Again, this will be unknown at the planning stage of the survey. If an estimate of the standard deviation, $\sigma_X$, can be obtained from past surveys, other similar airports, or knowledge of the airport, the required sample size for a given confidence interval width can be determined in a similar way using Equation 2.

Once the survey has been completed, the standard deviation of $X$, $\sigma_X$, can be estimated from the sample values of $X$ for use in determining the accuracy of the estimated population mean and confidence intervals. The sample standard deviation is the square root of the sample variance, $s_X^2$, which is given by:

$$s_X^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$$

where $\bar{X}$ is the sample average.

Sequential Sampling

Sequential sampling is equivalent to random sampling if the order of the sample with respect to the characteristics of interest is essentially random. In this case, sample sizes and estimates of standard deviation and confidence intervals are determined in the same way as outlined above for random sampling.

Sequential sampling can result in a more representative sample, and thus narrower confidence intervals, by ensuring a more even spread of sampled individuals over the population (provided the cases where serious biases can occur are avoided; see footnote 31). Calculation of the confidence intervals and required sample sizes is difficult and depends on the relationships between the ordering variable and the characteristics of the population of interest. In airport surveys, it can usually be assumed that the confidence interval is no wider than that which would occur with random sampling, and the expressions given for random sampling above can be used.

Stratified Sampling

With stratified sampling, the population is divided into groups, referred to as strata, and each stratum is sampled separately using random or sequential sampling. Assume there are $n_s$ strata and the proportion of the population in the ith stratum is $W_i$.
Footnote 31: Serious biases can occur if a characteristic of interest occurs in a cyclic order in the population list and the length of each cycle corresponds to the sampling fraction, but this would be rare in airport surveys.

The sample mean for the ith stratum is:

$$\bar{X}_i = \frac{\sum_{j=1}^{n_i} x_{ij}}{n_i} \qquad (4)$$

and the sample mean for the population is estimated by:

$$\bar{X} = \sum_{i=1}^{n_s} W_i \bar{X}_i \qquad (5)$$

where
$\bar{X}_i$ is the sample mean of stratum i
$x_{ij}$ is the value of characteristic $X$ for individual j in stratum i
$n_i$ is the number of individuals sampled in stratum i
$W_i$ is the proportion of the population in stratum i, $N_i/N$.

The standard deviation of the sample mean, $\bar{X}$, is given by:

$$\sigma_{\bar{X}} = \sqrt{\sum_i W_i^2\,\sigma_{\bar{X}_i}^2}$$

where $\sigma_{\bar{X}_i}$ is the standard deviation of the mean for members of stratum i, $\bar{X}_i$, and is given by:

$$\sigma_{\bar{X}_i} = \sigma_{X_i} \sqrt{\frac{1 - n_i/N_i}{n_i}}$$

where
$\sigma_{X_i}$ is the standard deviation of $X$ for members of stratum i
$n_i$ and $N_i$ are the sample and population sizes for stratum i, respectively.

Hence the standard deviation of the sample mean can be expressed as:

$$\sigma_{\bar{X}} = \sqrt{\sum_i W_i^2\,\sigma_{X_i}^2\,\frac{1 - n_i/N_i}{n_i}}$$

The accuracy of the sample estimate is improved if the variation within each stratum (measured by $\sigma_{X_i}$) is less than the variation over the whole population.

As above, the 95% confidence interval of width $2w$, such that the sample mean is within $w$ of the population mean, $\mu$, is given by:

$$w = 1.96\,\sigma_{\bar{X}} \qquad (6)$$

With proportional stratified sampling, the sampling fraction is the same for each stratum, so the share of the sample drawn from each stratum equals the proportion of the population in that stratum, $W_i$. Thus,

$$\frac{n_i}{N_i} = \frac{n}{N} \quad \text{and} \quad W_i = \frac{n_i}{n} = \frac{N_i}{N}$$

where $n$ and $N$ are the total sample and population sizes, respectively.

In this case, the sample mean given by Equation 5 reduces to that for a random sample, Equation 1, and

$$\sigma_{\bar{X}} = \sqrt{\frac{1 - n/N}{n} \sum_i W_i\,\sigma_{X_i}^2} \qquad (7)$$

To determine the required sample size, estimates of $\sigma_{X_i}$ are needed in the survey planning stage. These must be estimated from results of previous surveys, other similar airports, or knowledge of the airport. Once approximate estimates of $\sigma_{X_i}$ have been obtained, the total sample size, $n$, for a confidence interval of width $2w$ is then found using the relationship (from Equations 6 and 7):

$$w = 1.96 \sqrt{\frac{1 - n/N}{n} \sum_i W_i\,\sigma_{X_i}^2}$$

which can be solved to give the sample size, $n$, as:

$$n = \frac{1.96^2 \sum_i W_i\,\sigma_{X_i}^2}{w^2 + 1.96^2 \sum_i W_i\,\sigma_{X_i}^2 / N} \qquad (8)$$

If separate estimates for each stratum cannot be obtained in the planning stage, an approximate estimate of the standard deviation of $X$ over the total population could be used to provide a conservative estimate of the sample size, $n$.

For proportional stratified samples, the sample size for each stratum, $n_i$, is then found by:

$$n_i = n\,W_i$$
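The algebra that leads from Equations 6 and 7 to Equation 8 is short, and it may help to see it written out. The shorthand $S$ for the weighted within-stratum variance is introduced here only for convenience and is not a symbol used in the guidebook.

```latex
% Shorthand (an assumption of this sketch): S = \sum_i W_i \sigma_{X_i}^2
\begin{align*}
w &= 1.96 \sqrt{\frac{1 - n/N}{n}\, S}
  && \text{(Equations 6 and 7)} \\
w^2 &= 1.96^2\, S \left( \frac{1}{n} - \frac{1}{N} \right)
  && \text{(square both sides)} \\
\frac{1}{n} &= \frac{w^2}{1.96^2\, S} + \frac{1}{N}
  && \text{(isolate } 1/n \text{)} \\
n &= \frac{1.96^2\, S}{w^2 + 1.96^2\, S / N}
  && \text{(Equation 8)}
\end{align*}
```

The same steps with $S$ replaced by $\sigma_X^2$ give Equation 3 for random sampling, which is why the two expressions have the same form.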

For determining confidence intervals of the final estimates using the survey data, the standard deviation of $X$ in each stratum, $\sigma_{X_i}$, can be estimated by the sample standard deviation, $s_{X_i}$, determined from the sample values in each stratum. The 95% confidence interval is then:

$$\bar{X} \pm 1.96 \sqrt{\frac{1 - n/N}{n} \sum_i W_i\,s_{X_i}^2} \qquad (9)$$

With non-proportional stratified sampling, the sampling fractions differ for each stratum. The sample mean for each stratum and the overall sample mean can be found using Equations 4 and 5, respectively. The sample sizes, $n_i$, are chosen so that a confidence interval of width $2w$ is given by the relationship:

$$w = 1.96 \sqrt{\sum_i W_i^2\,\sigma_{X_i}^2\,\frac{1 - n_i/N_i}{n_i}} \qquad (10)$$

In this case, many different combinations of $n_i$ could be chosen to produce a confidence interval width of $2w$. The choice of $n_i$ is dependent on the reason for choosing to use non-proportional stratified sampling.

To achieve a similar level of accuracy in the results for each stratum, it will be necessary to use non-proportional stratified sampling, with the sample size in each stratum inversely proportional to the variance of the characteristic within that stratum. Thus, $n_i = k / \sigma_{X_i}^2$, where $k$ is a constant. Substituting this into Equation 10 and solving for $k$ gives:

$$k = \frac{\sum_i W_i^2 \left(\sigma_{X_i}^2\right)^2}{\dfrac{w^2}{1.96^2} + \sum_i \dfrac{W_i^2\,\sigma_{X_i}^2}{N_i}}$$

The sample size for each stratum is then found using the relationship:

$$n_i = k / \sigma_{X_i}^2$$

Because the actual variance in the characteristic in the survey responses for each stratum, $\sigma_{X_i}^2$, will not be known until the survey has been performed, it will be necessary to make an initial assumption of the differences in the variance across the strata in order to determine the proportion of the survey responses to assign to each stratum. These assumptions can be based on the results of prior surveys or of surveys performed at similar airports.

As before, confidence intervals for the final estimates can be determined using the survey data. The standard deviation of $X$ in each stratum, $\sigma_{X_i}$, can be estimated by the sample standard deviation for each stratum, $s_{X_i}$, determined from the sample values. The 95% confidence interval is then:

$$\bar{X} \pm 1.96 \sqrt{\sum_i W_i^2\,s_{X_i}^2\,\frac{1 - n_i/N_i}{n_i}} \qquad (11)$$

Cluster Sampling

As described in Section 3.2 of the guidebook, with cluster sampling the population is divided into clusters (or groups), and clusters rather than individual members of the population are sampled. In single-stage cluster sampling, all individuals in the cluster are sampled, so a complete picture of the population within the sampled clusters is obtained. For example, in a survey of departing passengers, flights could be used as clusters; a sample of flights would then be chosen and all passengers on those flights would be surveyed. If two-stage cluster sampling is used, individuals within each cluster are also sampled.

Let
$N$ be the number of clusters in the population
$n$ be the number of clusters sampled
$M_k$ be the number of individuals in the kth cluster in the population
$m_k$ be the number of individuals in the kth cluster included in the sample
$x_{kj}$ be the value of $X$ for the jth individual in the kth cluster.

The average number of individuals per cluster is the total population size divided by the number of clusters:

$$\bar{M} = \frac{\sum_{k=1}^{N} M_k}{N} \qquad (12)$$

The sample mean for the kth cluster is:

$$\bar{X}_k = \frac{\sum_{j=1}^{m_k} x_{kj}}{m_k} \qquad (13)$$

The population mean is estimated by:

$$\bar{X} = \frac{\sum_{k=1}^{n} (M_k/\bar{M})\,\bar{X}_k}{n} \qquad (14)$$

When the numbers of individuals in each cluster are equal, $M_k = \bar{M}$, and the mean for the population reduces to simply the average of the cluster means, $\bar{X}_k$.

Where the clusters to be sampled are drawn randomly from the population of all clusters, and a sample of individuals is drawn randomly from each cluster, the variance of the sample mean, $\bar{X}$, includes two components of variation, the between-cluster and within-cluster components, and for a categorical variable is estimated by:

$$\sigma_{\bar{X}}^2 = \underbrace{\frac{1 - n/N}{n}\,\sigma_c^2}_{\text{between-cluster component}} + \underbrace{\frac{n}{N} \sum_{k=1}^{n} \frac{(1 - m_k/M_k)\,p_k(1 - p_k)}{n^2\,(m_k - 1)}}_{\text{within-cluster component}} \qquad (15)$$

where
$p$ is the proportion of the total population in the category of interest
$p_k$ is the proportion of individuals in cluster k in the category of interest
$\sigma_c^2$ is the variance of the cluster means around the population mean and can be estimated by:

$$\sigma_c^2 = \frac{\sum_{k=1}^{n} (\bar{X}_k - \bar{X})^2}{n - 1}$$

If all individuals in each selected cluster are sampled, $m_k = M_k$, and the variance is given by the between-cluster component only.

Clusters could be selected using stratified sampling to reduce the variance between clusters within a given stratum and thus improve the accuracy of the estimate. For example, where the clusters are flights, flights could be stratified into groups such as domestic and international, and short- and long-haul. Assume that the clusters are stratified into $n_s$ strata and in each stratum, $n_i$ clusters are sampled. The population mean is estimated by:

$$\bar{X} = \frac{\sum_{i=1}^{n_s} \sum_{k=1}^{n_i} (M_{ik}/\bar{M})\,\bar{X}_{ik}}{n} \qquad (16)$$

where
$n_i$ is the number of clusters sampled in the ith stratum
$M_{ik}$ is the number of individuals in the kth cluster in the ith stratum
$\bar{X}_{ik}$ is the mean of individuals in the kth cluster in the ith stratum
$n$ is the total number of clusters sampled over all strata.

With proportional stratified sampling of clusters, the between-cluster variance component of Equation 15 becomes:

$$\mathrm{Var}(\bar{X}_c) = \frac{1 - n/N}{n} \sum_{i=1}^{n_s} W_i\,\sigma_{c_i}^2 \qquad (17)$$

where
$n$ is the number of clusters sampled (over all strata)
$N$ is the number of clusters in the population (over all strata)
$\sigma_{c_i}^2$ is the variance of cluster means for clusters in stratum i
$W_i$ is the proportion of clusters sampled that are in stratum i ($= n_i/n = N_i/N$ for proportional stratified sampling).

Assuming that the clusters within each stratum have the same size, $M_i$, the same sample size, $m_i$, and the same proportion of individuals in the category of interest, $p_i$, the within-cluster variance component becomes:

$$\begin{aligned}
\mathrm{Var}(\bar{X}_c) &= \sum_{i=1}^{n_s} \sum_{k=1}^{n_i} \frac{(n_i/N_i)(1 - m_i/M_i)\,p_i(1 - p_i)}{n_i^2\,(m_i - 1)} \\
&= \sum_{i=1}^{n_s} n_i\,\frac{(n_i/N_i)(1 - m_i/M_i)\,p_i(1 - p_i)}{n_i^2\,(m_i - 1)} \\
&= \sum_{i=1}^{n_s} \frac{(1/N_i)(1 - f)\,p_i(1 - p_i)}{m_i - 1}
\end{aligned} \qquad (18)$$

where $f$ is the fraction of passengers sampled and is assumed to be constant in all clusters.

The calculation of the accuracy of estimates, required sample sizes, and confidence intervals is complex, and the reader is referred to Levy and Lemeshow, Sampling of Populations, and Cochran, Sampling Techniques, listed in the bibliography. Comprehensive statistical software programs exist that would be useful for analyzing data from cluster sampling.

It should be noted, however, that cluster sampling is less efficient than random, sequential, and stratified sampling, and larger sample sizes will be required to obtain the same levels of accuracy. Use of the expressions applicable for random sampling will underestimate the true standard errors of estimated population characteristics and the associated confidence intervals. Preferably, clusters should be chosen so that variation in the characteristics of interest between clusters is small, but within clusters is large.
In airport surveys, cluster sampling is commonly used for sampling of flights in departing passenger surveys. For many characteristics such as trip duration, airfares, trip purpose, time at airport, spending in airport concessions, and of course destination and sector, passengers on the same flight will be more likely to have similar values of these characteristics than passengers in general. This homogeneity of characteristics within a flight significantly reduces the efficiency of cluster sampling for analyzing these characteristics.

There are relatively few air party characteristics that have a similar distribution across different flights. For characteristics such as household size, the variation of the characteristic across passengers on one flight is likely to be fairly similar to that of all passengers. In this case, use of cluster sampling should not greatly reduce the sampling efficiency.

It can be shown that the variance of the estimate of the mean of the characteristic of interest for cluster sampling can be expressed, approximately, as a function of the variance for random sampling:

$$\sigma_{X_C}^2 = \sigma_{X_R}^2 \left[ 1 + \rho\,(m_{av} - 1) \right] \qquad (19)$$

where
$\sigma_{X_C}^2$ is the variance using cluster sampling
$\sigma_{X_R}^2$ is the variance using random sampling
$\rho$ represents the population intra-class correlation
$m_{av}$ is the mean number of cases sampled per cluster.

The variance using cluster sampling will be greater than that using random sampling unless either $m_{av} = 1$ or $\rho \le 0$. The case $m_{av} = 1$ corresponds to the special case where each cluster consists of a single case and is equivalent to random sampling. The intra-class correlation, $\rho$, is a measure of homogeneity; if individuals in a cluster are more homogeneous than the population as a whole, $\rho$ will be greater than zero.

The ratio of the variances, $\sigma_{X_C}^2 / \sigma_{X_R}^2$, is often referred to as the design effect, DE, and is given by:

$$DE = 1 + \rho\,(m_{av} - 1) \qquad (20)$$

The effective sample size is given by $m_{av}\,n / DE$.

Examples of Calculation of Sample Sizes

Small Population Size: Using Random or Sequential Sampling for Categorical Variables

In this example, the sample size is required for a survey of a relatively small population, such as an employee or tenant survey, using random sampling. The critical characteristics of the population being determined are categorical variables, e.g., the percentage of employees accessing the airport by private vehicle. The sample sizes depend on the expected proportion in the category and the level of accuracy desired.

Sample sizes were determined using Equation 3 for a range of population sizes; for two values of the expected proportion in the category, 50% and 20%; and for three levels of accuracy, 5 percentage points, 3 percentage points, and 2 percentage points (the latter for the expected proportion of 20% only). As discussed in Section 3.2 of the guidebook, an error of 5 percentage points represents a percentage error in the category proportion of 10% for an expected proportion of 50% and 25% for an expected proportion of 20%. As is evident in Table B-1, large sample sizes are required if an accuracy of 10% of the proportion is required for categories with low proportions of the population.

Table B-1. Sample sizes required for categorical variable with expected proportions of 50% and 20% and varying levels of accuracy.

Sample size, n, required by expected proportion in category and accuracy:

| Total Number in Population, N | 50%: Accuracy of 5 Percentage Points (Equivalent to 10% of Proportion) | 50%: Accuracy of 3 Percentage Points (Equivalent to 6% of Proportion) | 20%: Accuracy of 5 Percentage Points (Equivalent to 25% of Proportion) | 20%: Accuracy of 3 Percentage Points (Equivalent to 15% of Proportion) | 20%: Accuracy of 2 Percentage Points (Equivalent to 10% of Proportion) |
|---|---|---|---|---|---|
| 50 | 44 | 48 | 42 | 47 | 48 |
| 100 | 80 | 92 | 71 | 87 | 94 |
| 200 | 132 | 168 | 110 | 155 | 177 |
| 500 | 218 | 340 | 165 | 290 | 377 |
| 1,000 | 280 | 515 | 200 | 405 | 605 |
| 5,000 | 360 | 880 | 235 | 600 | 1,175 |

Note: Assumes random sampling without replacement.
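The entries in Table B-1 can be approximated directly from Equation 3. The sketch below is one way to do so in Python; the function name and the layout of the loop are illustrative assumptions. It prints unrounded values, and the published table entries appear to have been rounded (in some cells to the nearest 5 or 10), so differences of a few units are to be expected.

```python
def sample_size(p, w, N, z=1.96):
    """Equation 3 for a categorical variable, using sigma^2 = p(1 - p).

    p: expected proportion in the category
    w: required accuracy as a proportion (0.05 = 5 percentage points)
    N: population size
    """
    variance = p * (1 - p)
    return (z**2 * variance) / (w**2 + z**2 * variance / N)


populations = [50, 100, 200, 500, 1_000, 5_000]
# (expected proportion, accuracy) for each column of Table B-1, left to right
columns = [(0.5, 0.05), (0.5, 0.03), (0.2, 0.05), (0.2, 0.03), (0.2, 0.02)]

for N in populations:
    row = [sample_size(p, w, N) for p, w in columns]
    print(f"{N:>6,}  " + "  ".join(f"{n:8.1f}" for n in row))
```

Note how quickly the finite population term matters: for $N = 50$ nearly the whole population must be sampled regardless of the accuracy required.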

Table B-2. Assumed number of flights and average number of originating passengers per flight by market sector and day of week.

| Day of Week | Short-Haul Domestic | Long-Haul Domestic | International | Total |
|---|---|---|---|---|
| Monday | 60 | 20 | 8 | 88 |
| Tuesday | 60 | 20 | 8 | 88 |
| Wednesday | 60 | 20 | 8 | 88 |
| Thursday | 60 | 20 | 9 | 89 |
| Friday | 60 | 20 | 12 | 92 |
| Saturday | 50 | 16 | 12 | 78 |
| Sunday | 55 | 20 | 12 | 87 |
| Total | 405 | 136 | 69 | 610 |
| Avg. Originating Passengers/Flight | 50 | 120 | 170 | 79.2 |

Passenger Survey: Using Random, Stratified, and Cluster Sampling

A survey of air passengers is to be undertaken to obtain information on airport access trips. A critical question to be answered is (say): What is the percentage of passengers dropped off at the curb outside departures check-in? It was decided to determine the sample sizes for each of the different sampling types and choose the most cost-effective method.

It is known that the percentage dropped off at the curb varies greatly by characteristics such as trip purpose, flight sector (e.g., short-haul domestic, long-haul domestic, international), day of the week, and time of day. The trip purpose distribution of passengers is not known at the sampling stage and so could not be used. In addition to random sampling, stratified sampling with passengers stratified by flight sector or day of the week, and cluster sampling with flights stratified by flight sector, were examined.

The survey is planned to be conducted during a two-week period. The flight schedule is obtained from the Official Airline Guide, and the numbers of flights per sector by day of the week and the estimated average number of originating passengers per flight (estimated using average load factors and percentages of connecting passengers) are as in Table B-2.

To determine the sample size required, it is necessary to have at least approximate estimates of the mean and standard deviation of the variable of interest, in this case the percentage of passengers dropped off at the curb. From knowledge of passengers using the airport, the percentage of passengers dropped off at the curb was estimated as shown in Table B-3.

Table B-3. Estimated percentage of passengers dropped off at the curb.

| Sector of Flight | Passengers Dropped Off at Curb: Mean, p | Passengers Dropped Off at Curb: Standard Deviation | Total Pass. |
|---|---|---|---|
| Short-Haul Domestic | 40% | 0.490 | 20,250 |
| Long-Haul Domestic | 60% | 0.490 | 16,320 |
| International | 90% | 0.300 | 11,730 |
| Overall | 58.9% | 0.492 | 48,300 |

The overall mean percentage of 58.9% is a weighted average of the means for each sector, with weights being the numbers of passengers in each sector. Since the variable of interest is a categorical variable, the standard deviation (SD) is given by $\sqrt{p(1 - p)}$, where $p$ is the proportion of the population in the category of interest (i.e., percentage dropped at curb).

Random Sampling of Passengers

A random sample of originating passengers could be surveyed as they exit the security line. In this case, the sample size is determined for a given width of confidence interval using Equation 3, where

Total population: $N = 48{,}300$
Mean proportion using curb: $\bar{X} = p = 0.59$
SD for individual passenger: $\sigma_X = 0.492$

The sample size, $n$, was found using Equation 3 for three values of the 95% confidence interval (C.I.) half-width, $w$, of 2%, 3%, and 4%, as shown in Table B-4.

A simple approximation, given in Section 3.4.1 of the guidebook (see footnote 32), could also have been used:

$$n = \frac{40{,}000\,p(1 - p)}{(100\,w)^2}$$

Using this equation, the estimated sample sizes for the 2%, 3%, and 4% cases are 2,411, 1,074, and 606. The approximation leads to slightly higher estimates of the required sample sizes.

If, for example, it was decided that the narrow confidence interval is appropriate, i.e., the mean estimate should be accurate to within 2%, a sample size of 2,218 is required. This corresponds to a sampling fraction of 4.6% for a population of 48,300, and if using sequential sampling, every 21st passenger passing through security should be surveyed.

Stratified Sampling of Passengers: Stratified by Sector

Consider the case where stratified sampling is used to select passengers to be surveyed and passengers are stratified by the sector of the flight. Assume that at this airport, passengers on the different sectors use different security screening checkpoints, thus allowing passengers on each flight sector to be sampled separately.
We consider here the simple case where proportional stratified sampling is used. Thus the proportion of the sample size in each flight sector is equal to the proportion of the total passengers in each flight sector, $W_i = n_i/n = N_i/N$.

Table B-4. Sample sizes for random sampling of passengers for 95% confidence interval widths 2%, 3%, and 4%.

| 95% C.I. (Mean ± w): w | 95% C.I.: w as % of mean | Sample n |
|---|---|---|
| 2.00% | 3.40% | 2,218 |
| 3.00% | 5.10% | 1,012 |
| 4.00% | 6.79% | 574 |

Footnote 32: The denominator in the equation in Section 3.4.1 is $a^2$, where the width of the 95% confidence interval is $\pm a$ and $a$ is expressed in percentage points. Since $w$ above is not expressed in percentage points, $w = a/100$.
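The Table B-4 sample sizes for random sampling of passengers follow from Equation 3 with the inputs listed for that example. The short Python sketch below reproduces them, together with the sampling fraction and the sequential sampling interval quoted for the 2% case. Variable and function names are illustrative, and rounding to the nearest whole passenger is an assumption, since the guidebook does not state its rounding rule.

```python
N = 48_300      # total originating passengers (Table B-3)
SIGMA = 0.492   # overall standard deviation (Table B-3)
MEAN_P = 0.589  # overall proportion dropped off at the curb (Table B-3)


def sample_size(sigma, w, N, z=1.96):
    """Equation 3, left unrounded."""
    return (z**2 * sigma**2) / (w**2 + z**2 * sigma**2 / N)


for w in (0.02, 0.03, 0.04):
    n = sample_size(SIGMA, w, N)
    print(f"w = {w:.2%}   w as % of mean = {w / MEAN_P:.2%}   n = {round(n):,}")

# Sampling fraction and sequential sampling interval for the 2% case.
n_2pct = round(sample_size(SIGMA, 0.02, N))
print(f"sampling fraction = {n_2pct / N:.1%}")
print(f"survey every {N // n_2pct}th passenger")  # interval rounded down
```

Rounding the interval down (rather than to the nearest integer) errs on the side of collecting slightly more than the required number of responses.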

Table B-5. Calculation of standard deviation of sample mean for stratified sampling of passengers by market sector.

| Sector of Flight | Enplaned Pass. Total, $N_i$ | % of Total, $W_i$ | Est. Avg. Proportion at Curb, $p$ | SD, $\sigma_{X_i}$ | $W_i\,\sigma_{X_i}^2$ |
|---|---|---|---|---|---|
| Short-Haul Domestic | 20,250 | 0.4193 | 0.40 | 0.49 | 0.10062 |
| Long-Haul Domestic | 16,320 | 0.3379 | 0.60 | 0.49 | 0.08109 |
| International | 11,730 | 0.2429 | 0.90 | 0.30 | 0.02186 |
| Total | 48,300 | 1.0000 | 0.59 | | 0.20357 |

The sample size is determined using Equation 8. Table B-5 shows the calculation of the summation over the three flight sectors.

The standard deviation for each sector is found using the relationship applicable for categorical variables, $\sqrt{p_i(1 - p_i)}$, where $p_i$ is the proportion of the population in the category of interest for flights in sector i (i.e., percentage dropped at the curb).

Substituting 0.20357 from Table B-5 for $\sum_i W_i\,\sigma_{X_i}^2$ in Equation 8 for three values of the C.I. half-width, $w$, of 2%, 3%, and 4% gives the sample sizes, $n$, in Table B-6.

The sample sizes for each flight sector are then found, based on the proportion of passengers in each sector, $W_i$, to be as shown in Table B-7.

Comparing the total sample size with that found with random sampling, we find that stratification by flight sector has reduced the required sample size for the 2% case from 2,218 to 1,879, a reduction of 15%. Note that the size of the reduction is very dependent on the variation in the mean responses across the different strata.

Table B-6. Sample sizes for stratified sampling of passengers by sector of flight for 95% confidence interval widths 2%, 3%, and 4%.

| 95% C.I. (Mean ± w): w | 95% C.I.: w as % of mean | Sample n |
|---|---|---|
| 2.00% | 3.40% | 1,879 |
| 3.00% | 5.09% | 854 |
| 4.00% | 6.79% | 484 |

Table B-7. Sample sizes by sector for stratified sampling of passengers by sector of flight for 95% confidence interval widths 2%, 3%, and 4%.

| Sector of Flight | Sample Size for C.I. Width 2% | Sample Size for C.I. Width 3% | Sample Size for C.I. Width 4% |
|---|---|---|---|
| Short-Haul Domestic | 788 | 358 | 203 |
| Long-Haul Domestic | 635 | 288 | 163 |
| International | 456 | 207 | 118 |
| Total | 1,879 | 853 | 484 |
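The calculation behind Tables B-5 to B-7 can be sketched in a few lines of Python. The code below computes the weighted variance $\sum_i W_i\,\sigma_{X_i}^2$, applies Equation 8, and allocates the total proportionally using $n_i = n\,W_i$. The data structure and function names are illustrative assumptions; the inputs are the sector totals and proportions from Table B-5.

```python
# sector: (passengers N_i, estimated proportion dropped at curb p_i)
STRATA = {
    "Short-Haul Domestic": (20_250, 0.40),
    "Long-Haul Domestic": (16_320, 0.60),
    "International": (11_730, 0.90),
}


def weighted_variance(strata):
    """Sum over strata of W_i * sigma_i^2, with sigma_i^2 = p_i (1 - p_i)."""
    N = sum(N_i for N_i, _ in strata.values())
    return sum((N_i / N) * p * (1 - p) for N_i, p in strata.values())


def stratified_sample_size(strata, w, z=1.96):
    """Equation 8 for proportional stratified sampling, left unrounded."""
    N = sum(N_i for N_i, _ in strata.values())
    s = weighted_variance(strata)
    return (z**2 * s) / (w**2 + z**2 * s / N)


def allocate(strata, n):
    """Proportional allocation n_i = n * W_i for each stratum."""
    N = sum(N_i for N_i, _ in strata.values())
    return {name: n * N_i / N for name, (N_i, _) in strata.items()}


print(f"sum of W_i * sigma_i^2 = {weighted_variance(STRATA):.5f}")
for w in (0.02, 0.03, 0.04):
    n = stratified_sample_size(STRATA, w)
    shares = ", ".join(f"{k}: {v:.0f}" for k, v in allocate(STRATA, n).items())
    print(f"w = {w:.0%}: n = {n:.0f} ({shares})")
```

Because each stratum's allocation is rounded separately, the per-sector figures may sum to one more or one less than the rounded total, which is consistent with the 854 versus 853 difference between Tables B-6 and B-7.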

Stratified Sampling of Passengers: Stratified by Day of Week

Now consider the case where the passengers are stratified by the day of the week. This form of stratification is easy to implement during the conduct of the survey, and numbers of passengers are known, at least approximately, at the sample design stage.

Again consider the simple case where proportional stratified sampling is used. Thus the proportion of the sample size on each day of the week is equal to the proportion of the total passengers on each day of the week.

It was assumed that the proportion of people using the curb on each weekday was entirely explained by the sector of their flight. Thus, the average percentage of passengers using the curb was estimated for each day by the weighted average of the percentages for each flight sector, with weights equal to the numbers of passengers on that day in each sector.

The sample size is determined using Equation 8. Table B-8 shows the calculation of the summation over the days of the week. The standard deviation for each day is found using the relationship applicable for categorical variables, $\sqrt{p_i(1 - p_i)}$, where $p_i$ is the proportion of the population in the category of interest (i.e., percentage dropped at curb) on day i.

Table B-8. Calculation of standard deviation of sample mean for stratified sampling of passengers by day of week.

| Day | Originating Passengers: Short-Haul Domestic | Originating Passengers: Long-Haul Domestic | Originating Passengers: International | Total, $N_i$ | % of Total, $W_i$ | Est. Avg. Proportion at Curb, $p_i$ | SD, $\sigma_{X_i}$ | $W_i\,\sigma_{X_i}^2$ |
|---|---|---|---|---|---|---|---|---|
| Monday | 3,000 | 2,400 | 1,360 | 6,760 | 0.1400 | 0.572 | 0.495 | 0.03427 |
| Tuesday | 3,000 | 2,400 | 1,360 | 6,760 | 0.1400 | 0.572 | 0.495 | 0.03427 |
| Wednesday | 3,000 | 2,400 | 1,360 | 6,760 | 0.1400 | 0.572 | 0.495 | 0.03427 |
| Thursday | 3,000 | 2,400 | 1,530 | 6,930 | 0.1435 | 0.580 | 0.494 | 0.03496 |
| Friday | 3,000 | 2,400 | 2,040 | 7,440 | 0.1540 | 0.602 | 0.490 | 0.03692 |
| Saturday | 2,500 | 1,920 | 2,040 | 6,460 | 0.1337 | 0.617 | 0.486 | 0.03160 |
| Sunday | 2,750 | 2,400 | 2,040 | 7,190 | 0.1489 | 0.609 | 0.488 | 0.03546 |
| Total | 20,250 | 16,320 | 11,730 | 48,300 | 1.0000 | 0.589 | | 0.24175 |

Substituting 0.24175 from Table B-8 for $\sum_i W_i\,\sigma_{X_i}^2$ in Equation 8 for three values of the C.I. half-width, $w$, of 2%, 3%, and 4% gives the sample sizes, $n$, in Table B-9.

The sample sizes for each day of the week are then found, based on the proportion of passengers on each day of the week, $W_i$, to be as shown in Table B-10.

Table B-9. Sample sizes for stratified sampling of passengers by day of week for 95% confidence interval widths 2%, 3%, and 4%.

| 95% C.I. (Mean ± w): w | 95% C.I.: w as % of mean | Sample n |
|---|---|---|
| 2.000% | 3.40% | 2,215 |
| 3.000% | 5.09% | 1,010 |
| 4.000% | 6.79% | 574 |
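The day-of-week strata in Table B-8 can be derived from the flight counts and passengers per flight in Table B-2 together with the sector proportions in Table B-3. The Python sketch below rebuilds the daily totals, the daily proportions $p_i$, and the weighted variance, then applies Equation 8. All names are illustrative, and the assumption that the curb proportion depends only on flight sector is the one stated in the text.

```python
# Flights per day as (short-haul, long-haul, international), from Table B-2.
FLIGHTS = {
    "Monday": (60, 20, 8),
    "Tuesday": (60, 20, 8),
    "Wednesday": (60, 20, 8),
    "Thursday": (60, 20, 9),
    "Friday": (60, 20, 12),
    "Saturday": (50, 16, 12),
    "Sunday": (55, 20, 12),
}
PAX_PER_FLIGHT = (50, 120, 170)  # average originating passengers per flight
P_CURB = (0.40, 0.60, 0.90)      # proportion dropped at curb, by sector


def day_stratum(flight_counts):
    """Return (N_i, p_i) for one day: passengers and curb proportion."""
    pax = [f * m for f, m in zip(flight_counts, PAX_PER_FLIGHT)]
    N_i = sum(pax)
    p_i = sum(x * p for x, p in zip(pax, P_CURB)) / N_i
    return N_i, p_i


strata = {day: day_stratum(counts) for day, counts in FLIGHTS.items()}
N = sum(N_i for N_i, _ in strata.values())
S = sum((N_i / N) * p_i * (1 - p_i) for N_i, p_i in strata.values())


def stratified_sample_size(s, w, N, z=1.96):
    """Equation 8, with s = sum of W_i * sigma_i^2, left unrounded."""
    return (z**2 * s) / (w**2 + z**2 * s / N)


print(f"N = {N:,}, sum of W_i * sigma_i^2 = {S:.5f}")
for w in (0.02, 0.03, 0.04):
    print(f"w = {w:.0%}: n = {stratified_sample_size(S, w, N):.0f}")
```

The weighted variance here is almost the same as the overall variance $0.492^2 \approx 0.242$ used for random sampling, which is why this stratification yields sample sizes very close to those in Table B-4.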

Table B-10. Sample sizes for each day for stratified sampling of passengers by day of week for 95% confidence interval widths 2%, 3%, and 4%.

                   Sample Size for C.I. Width
  Day of Week        2%        3%        4%
  Monday            310       141        80
  Tuesday           310       142        80
  Wednesday         310       142        80
  Thursday          318       145        82
  Friday            341       156        89
  Saturday          296       135        77
  Sunday            330       150        86
  Total           2,215     1,011       574

Comparing the total sample size with that found with random sampling, we find that stratification by day of the week has reduced the required sample size for the 2% case from 2,218 to 2,215, a reduction of only 0.1%. Thus, in this case stratification makes almost no difference to the required sample size. This is due to the low variability in the average percentage of passengers at the curb, $p_i$, over the various days of the week. Note that the size of the reduction is very dependent on the variation in the mean responses across the different strata. This varies depending on the variable of interest and in some cases could vary greatly over the days of the week, making stratification by day of the week worthwhile.

Cluster Sampling of Flights: Additional Assumptions

A very common form of sampling for passenger surveys is to select a sample of flights to survey and to sample either all, or a portion, of passengers on those flights. This is a form of cluster sampling where each flight represents a cluster.

Using the same example as above, the pertinent characteristics required to estimate the sample size are given in Table B-11.

Cluster sampling is very dependent on one parameter not relevant to the passenger sampling considered above: the variation in the mean value for each flight of the characteristic of interest (i.e., percentage of passengers dropped at the curb) over the range of flights, $\sigma_{c_i}$. An estimate of this variation, expressed in terms of the standard deviation, is given in Table B-11. It is estimated from previous surveys and knowledge of passengers at the airport that the standard deviation is 10% around the mean value for each sector (see footnote 33). Thus, for short-haul flights the mean value of the percentage using the curb for each flight would be expected to be between 20.4% and 59.6% for 95% of flights [= 40% ± 1.96 × 10%].

Table B-11. Assumed characteristics for flights in three market sectors for illustrative examples of cluster sampling.

                                                      For Flight Sector, i                        Total
  Quantity                          Symbol            Short-Haul   Long-Haul   International     Symbol       Value
                                                      Domestic     Domestic
  No. of Departing Flights          N_i               405          136         69                N            610
  Avg. Originating Pass./Flight     M_i / N_i         50           120         170               M / N        79.2
  Total Originating Passengers      M_i               20,250       16,320      11,730            M            48,300
  Proportion of Flights in Sector   W_i = N_i / N     0.6639       0.2230      0.1131                         1.0000
  % Dropped at Curb                 X_i = p_i         40%          60%         90%               X = p        58.9%
  Difference from Overall Avg.      X_i - X           -19%         1%          31%
  SD in % Between Flights           sigma_ci          10%          10%         10%               sigma_c      21.1%

The standard deviation over all flights includes both the variation between flights within each sector and the variation between sectors and is given by:

$$\sigma_c = \sqrt{\sum_i W_i \left\{ \sigma_{c_i}^2 + (\bar{X}_i - \bar{X})^2 \right\}}$$

where $W_i$ is the proportion of flights in sector i (= $N_i / N$).

Cluster Sampling with Random Sampling of Flights

If a random sample of flights is selected and all passengers on each of the selected flights are surveyed, the sample size is determined for a given width of confidence interval using Equation 3, where

  Total population (flights):     $N = 610$
  Mean proportion using curb:     $\bar{X} = p = 0.589$
  SD for individual flight:       $\sigma_X = \sigma_c = 0.211$

The number of flights to be sampled, n, was found using Equation 3 for three widths of the 95% confidence interval (2%, 3%, and 4%), as shown in Table B-12. Since all passengers on each flight are sampled, the number of passengers sampled is the number of flights sampled multiplied by the average number of passengers per flight ($\bar{M} = M / N$).

Comparing the total passenger sample size with that found with random sampling of passengers, we find that cluster sampling by flight has increased the required sample size greatly: for 2% accuracy, from 2,218 to 19,953. This is due to the high variation in the characteristics of interest between flights. In other cases, the additional sample size with clustering may be much less. For example, if the mean percentage was 50% for each sector (instead of 40%, 60%, and 90%), the sample size for 2% accuracy would be 83 flights or 6,572 passengers.
The increase in sample size depends strongly on how much the mean response varies from flight to flight, so the above example may not be typical.

Table B-12. Sample sizes for cluster sampling with random sampling of flights for 95% confidence interval widths 2%, 3%, and 4%.

  95% C.I.       95% C.I.             Sample         Sample
  Mean ± w       ± w as % of mean     n (flights)    Pass.
  2.00%          3.40%                252            19,953
  3.00%          5.10%                146            11,560
  4.00%          6.79%                 92             7,285

33. Note that if there was no difference between sectors, so that the mean percentage of passengers dropped off at the curb was 58.9% for all flights, the standard deviation in the percentage between flights, $\sigma_{c_i}$, would equal the standard deviation of the mean for each flight. Then $\sigma_{c_i}^2 = p_i (1 - p_i) / (M_i / N_i)$, where $M_i / N_i$ is the average number of passengers on each flight in sector i. Thus the values of $\sigma_{c_i}$ for the short-haul, long-haul, and international sectors would be 7.0%, 4.5%, and 3.8%, respectively, and $\sigma_c$ would be 6.2%.
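The overall standard deviation and the flight counts in Table B-12 can be checked with a short calculation. The following is a minimal sketch, not the guidebook's own procedure: Equation 3 is not reproduced in this part of the appendix, so the function `flights_for_width` assumes the usual finite-population form $n = z^2\sigma^2 / (w^2 + z^2\sigma^2 / N)$, rounded up, which happens to reproduce the tabulated values. The names `SECTORS`, `overall_sd_between_flights`, and `flights_for_width` are illustrative only.

```python
import math

Z95 = 1.96  # normal quantile for a 95% confidence interval

# Inputs taken from Table B-11
SECTORS = {
    "short_haul_domestic": {"flights": 405, "avg_pax": 50, "p_curb": 0.40, "sd_between": 0.10},
    "long_haul_domestic": {"flights": 136, "avg_pax": 120, "p_curb": 0.60, "sd_between": 0.10},
    "international": {"flights": 69, "avg_pax": 170, "p_curb": 0.90, "sd_between": 0.10},
}


def overall_sd_between_flights(sectors):
    """SD over all flights: within-sector variation plus variation between sectors."""
    n_flights = sum(s["flights"] for s in sectors.values())
    total_pax = sum(s["flights"] * s["avg_pax"] for s in sectors.values())
    # The overall mean (58.9% in Table B-11) is weighted by passengers, not flights.
    p_bar = sum(s["flights"] * s["avg_pax"] * s["p_curb"] for s in sectors.values()) / total_pax
    variance = sum(
        (s["flights"] / n_flights) * (s["sd_between"] ** 2 + (s["p_curb"] - p_bar) ** 2)
        for s in sectors.values()
    )
    return math.sqrt(variance)


def flights_for_width(w, sd, n_flights):
    """ASSUMED form of Equation 3: sample size with finite-population correction."""
    z2s2 = (Z95 * sd) ** 2
    return math.ceil(z2s2 / (w ** 2 + z2s2 / n_flights))


sd_c = overall_sd_between_flights(SECTORS)
avg_pax_per_flight = 48300 / 610  # M / N from Table B-11
for w in (0.02, 0.03, 0.04):
    n = flights_for_width(w, sd_c, 610)
    print(f"w = {w:.0%}: {n} flights, about {round(n * avg_pax_per_flight):,} passengers")
```

Under these assumptions the sketch gives a standard deviation of about 0.211 and 252, 146, and 92 flights for the three widths, in line with Table B-12.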

Table B-13. Calculation of standard deviation of sample mean for cluster sampling with flights stratified by sector and all passengers on selected flights surveyed.

                          Departing Flights          Est. Avg. %     SD
  Sector of Flight        Total       W_i =          at curb,        sigma_ci     W_i sigma_ci^2
                          N_i         N_i / N        p_i
  Short-Haul Domestic     405         0.6639         0.40            0.10         0.00664
  Long-Haul Domestic      136         0.2230         0.60            0.10         0.00223
  International            69         0.1131         0.90            0.10         0.00113
  Total                   610         1.0000         0.589                        0.01000

Cluster Sampling with Stratified Sampling of Flights and All Passengers on Selected Flights Surveyed

Now consider the case where flights to be surveyed are determined using stratified sampling and all passengers on each of the selected flights are surveyed. The flight sample size is determined for a given width of confidence interval using Equation 8, where the units sampled in each stratum are clusters rather than individuals. Since flights are being sampled, rather than passengers, the standard deviation $\sigma_{X_i}$ in Equation 8 is the standard deviation of the average percentage of passengers using the curb for each flight, $\sigma_{c_i}$, as shown in Table B-13.

Substituting 0.01000 from Table B-13 for $\sum_i W_i \sigma_{c_i}^2$ in Equation 8, the number of flights to be sampled, n, was found for three widths of the 95% confidence interval (2%, 3%, and 4%), as shown in Table B-14. The numbers of flights in each sector and estimated number of passengers (based on average numbers of passengers per flight in that sector) are as shown in Table B-15.

Table B-14. Sample number of flights for cluster sampling with flights stratified by sector and all passengers on selected flights surveyed for 95% confidence interval widths 2%, 3%, and 4%.

  95% C.I.       95% C.I.             Sample
  Mean ± w       ± w as % of mean     n (flights)
  2.00%          3.40%                83
  3.00%          5.09%                40
  4.00%          6.79%                24

Table B-15. Sample sizes by sector for cluster sampling with flights stratified by sector and all passengers on selected flights surveyed for 95% confidence interval widths 2%, 3%, and 4%.

                          C.I. Width 2%        C.I. Width 3%        C.I. Width 4%
  Sector of Flight        Flights    Pass.     Flights    Pass.     Flights    Pass.
  Short-Haul Domestic     55         2,750     27         1,350     16           800
  Long-Haul Domestic      19         2,280      9         1,080      5           600
  International            9         1,530      5           850      3           510
  Total*                  83         6,560     41         3,280     24         1,910

* Total may be higher than previous table as number of flights must be an integer.
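The flight counts in Tables B-14 and B-15 can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions rather than the guidebook's own method: Equation 8 is not shown in this part of the appendix, so `stratified_flights` assumes the form $n = z^2 S / (w^2 + z^2 S / N)$ with $S = \sum_i W_i \sigma_{c_i}^2$, rounded up, and `allocate` assumes each sector's share is rounded to the nearest whole flight. Both function names are hypothetical.

```python
import math

Z95 = 1.96  # normal quantile for a 95% confidence interval

# (sector, departing flights N_i, avg passengers per flight, SD between flights)
# taken from Tables B-11 and B-13
SECTORS = [
    ("Short-Haul Domestic", 405, 50, 0.10),
    ("Long-Haul Domestic", 136, 120, 0.10),
    ("International", 69, 170, 0.10),
]


def stratified_flights(w, sectors):
    """ASSUMED form of Equation 8 applied to flights (clusters) as sampling units."""
    total = sum(n_i for _, n_i, _, _ in sectors)
    weighted_var = sum((n_i / total) * sd ** 2 for _, n_i, _, sd in sectors)
    z2v = Z95 ** 2 * weighted_var
    return math.ceil(z2v / (w ** 2 + z2v / total))


def allocate(n, sectors):
    """Split n flights across sectors in proportion to N_i, rounding to whole flights."""
    total = sum(n_i for _, n_i, _, _ in sectors)
    rows = []
    for name, n_i, avg_pax, _ in sectors:
        flights = round(n * n_i / total)
        rows.append((name, flights, flights * avg_pax))  # all passengers surveyed
    return rows


for w in (0.02, 0.03, 0.04):
    n = stratified_flights(w, SECTORS)
    rows = allocate(n, SECTORS)
    print(f"w = {w:.0%}: n = {n}")
    for name, flights, pax in rows:
        print(f"  {name}: {flights} flights, {pax:,} passengers")
```

Because each sector is rounded separately, the allocated flights can sum to one more than n, which is the effect the footnote to Table B-15 describes for the 3% case (40 flights required, 41 allocated).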

The stratification of flights by sector results in a large reduction in the numbers of flights and passengers to be surveyed. In this example, much of the variation in the variable of interest is explained by the flight sector, which results in a large reduction in sample size compared to random sampling of flights. By sampling the flights by sector, the likelihood of selecting a sample with close to the actual proportions of passengers in each sector is much greater than when randomly sampling flights. Again note that the results here reflect the assumptions regarding variation considered in this example and will vary in other situations.

Cluster Sampling with Stratified Sampling of Flights and a Sample of Passengers on Selected Flights

Now consider the case where flights to be surveyed are determined using stratified sampling and a sample of passengers on each of the selected flights is surveyed. Assume initially that 50% of passengers on the selected flights are surveyed. The variance of the estimate is greater than with 100% sampling of each flight as it includes both the variation between flights (as before) and the variation due to sampling of passengers on individual flights. It is calculated from Equations 17 and 18 as follows:

$$\sigma_{\bar{X}}^2 = \frac{(1 - n/N)}{n} \sum_{i=1}^{n_s} \frac{N_i}{N} \sigma_{c_i}^2 + \sum_{i=1}^{n_s} \frac{1}{N_i} (1 - f) \frac{p_i (1 - p_i)}{m_i - 1}$$

where
  $\sigma_{c_i}$ is the standard deviation of the mean percentage using the curb across flights in sector i
  $N_i$ is the number of flights in sector i (N is total over all sectors)
  $n_i$ is the number of flights sampled in sector i (n is total over all sectors)
  $n_s$ is the number of sectors over which the sums are taken
  $M_i$ is the average number of passengers on a flight in sector i
  $m_i$ is the average number of passengers sampled on a flight in sector i (= $f M_i$)
  $p_i$ is the probability of a passenger on a flight in sector i being dropped off at the curb
  $f$ is the proportion of passengers sampled on a flight (= $m_i / M_i$, assumed the same for all flights).
The flight sample size is determined for a given confidence interval $\bar{X} \pm w$ by solving the following relationship for n:

$$w^2 = 1.96^2 \, \sigma_{\bar{X}}^2$$

where $\sigma_{\bar{X}}^2$ is given by the equation above.

The summations over the sectors for calculating $\sigma_{\bar{X}}^2$ are determined for a given n value as shown in Table B-16 (n = 117 used in table). The number of flights to be sampled, n, was found by setting an approximate value initially and determining the width, w, then adjusting the value of n until the appropriate value of w was obtained. In the table, $\sigma_{\bar{X}}^2$ is evaluated for n = 117 and the resulting value of w is 0.0200 or 2.00%.

Sample sizes for three widths of the 95% confidence interval (2%, 3%, and 4%) were found to be as shown in Table B-17.

Table B-16. Calculation of standard deviation of sample mean for cluster sampling with flights stratified by sector and a 50% sample of passengers on selected flights. Values of $\sigma_{\bar{X}}^2$ calculated for n = 117.

  Sector of Flight     N_i    W_i      p_i    M_i    f     m_i    sigma_ci    Between      Within       Total
  Short-Haul Dom.      405    0.6639   0.40    50    50%   25     0.100       0.0000459    0.0000123    0.0000582
  Long-Haul Dom.       136    0.2230   0.60   120    50%   60     0.100       0.0000154    0.0000150    0.0000304
  International         69    0.1131   0.90   170    50%   85     0.100       0.0000078    0.0000078    0.0000156
  Total                610    1.0000   0.59                                   sigma_X^2 =               0.0001041
                                                                              w = 1.96 sigma_X =        0.0200

  Column definitions:
  N_i        Departing flights, total
  W_i        N_i / N
  p_i        Est. avg. % at curb
  M_i        Passengers on each flight
  f, m_i     Sampled on each flight (% and number)
  sigma_ci   SD between flights
  Between    Between cluster: (1 - n/N) / n × (N_i / N) × sigma_ci^2
  Within     Within cluster: (1 / N_i) × (1 - f) × p_i (1 - p_i) / (m_i - 1)
  Total      Between + Within
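The rows of Table B-16 follow directly from the variance expression, and a short sketch makes the between-cluster and within-cluster terms explicit. The code below evaluates both terms for each sector at n = 117 and f = 50%, using inputs from the table; the function name `variance_components` and the tuple layout are illustrative choices, not part of the guidebook.

```python
import math

# (sector, N_i, M_i, p_i, sigma_ci) taken from Table B-16
SECTORS = [
    ("Short-Haul Dom.", 405, 50, 0.40, 0.10),
    ("Long-Haul Dom.", 136, 120, 0.60, 0.10),
    ("International", 69, 170, 0.90, 0.10),
]


def variance_components(n, f, sectors):
    """Return per-sector (name, between, within) terms of the variance of the sample mean."""
    total_flights = sum(n_i for _, n_i, _, _, _ in sectors)
    rows = []
    for name, n_i, m_cap, p_i, sd_ci in sectors:
        m_i = f * m_cap  # average number of passengers sampled per flight
        between = (1 - n / total_flights) / n * (n_i / total_flights) * sd_ci ** 2
        within = (1 / n_i) * (1 - f) * p_i * (1 - p_i) / (m_i - 1)
        rows.append((name, between, within))
    return rows


rows = variance_components(n=117, f=0.5, sectors=SECTORS)
variance = sum(between + within for _, between, within in rows)
half_width = 1.96 * math.sqrt(variance)

for name, between, within in rows:
    print(f"{name:18s} between={between:.7f} within={within:.7f} total={between + within:.7f}")
print(f"variance = {variance:.7f}, w = {half_width:.4f}")
```

Note that only the between-cluster term shrinks as more flights are sampled; in this expression the within-cluster term depends on the sampling fraction f but not on n.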

Table B-17. Sample number of flights for cluster sampling with flights stratified by sector and a 50% sample of passengers on selected flights surveyed for 95% confidence interval widths 2%, 3%, and 4%.

  95% C.I.       95% C.I.             Sample
  Mean ± w       ± w as % of mean     n (flights)
  2.00%          3.40%                117
  3.00%          5.09%                 47
  4.00%          6.79%                 26

The numbers of flights in each sector and estimated number of passengers (based on average numbers of passengers per flight in that sector) are as shown in Table B-18.

The surveying of only a 50% sample of passengers on each flight resulted in an increase in the number of flights to be surveyed from 83 to 117 for the 2% accuracy case. However, since only 50% of passengers on these flights are to be surveyed, the total number of passengers decreased from 6,560 to 4,615. For surveys conducted in the departure lounge, it is almost impossible to survey all passengers on a flight due to reasons given in Chapter 5 of the guidebook. In practice it may be possible to obtain complete responses from 50% of passengers, in which case the number of flights to be surveyed based on the 50% passenger sample should be used.

It is evident that the total number of passengers that need to be surveyed can be reduced by reducing the percentage of passengers sampled on each flight, but the number of flights surveyed increases. Several other cases were examined using this example:

  If 75% of passengers on each of the selected flights were to be surveyed, a sample of 92 flights and 5,580 passengers would be required.
  If 30% of passengers on each of the selected flights were to be surveyed, a sample of 268 flights and 6,360 passengers would be required.

The optimal balance for a particular survey will depend on the variation in responses between and within flights, and on the relative costs of surveying passengers and flights, which vary from survey to survey.
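The trade-off between the sampling fraction and the number of flights can be explored by repeating the trial-and-adjust search described for Table B-16 in code. The sketch below is one possible way to do that, assuming the variance expression given for this case and the sector inputs of Table B-16; `flights_needed` is a hypothetical helper that returns the smallest n whose half-width does not exceed the target w.

```python
import math

# (N_i, M_i, p_i, sigma_ci) per sector, from Table B-16
SECTORS = [
    (405, 50, 0.40, 0.10),
    (136, 120, 0.60, 0.10),
    (69, 170, 0.90, 0.10),
]
TOTAL_FLIGHTS = sum(n_i for n_i, _, _, _ in SECTORS)


def half_width(n, f):
    """95% half-width w = 1.96 * sqrt(variance) for n flights and sampling fraction f."""
    variance = 0.0
    for n_i, m_cap, p_i, sd_ci in SECTORS:
        m_i = f * m_cap
        variance += (1 - n / TOTAL_FLIGHTS) / n * (n_i / TOTAL_FLIGHTS) * sd_ci ** 2
        variance += (1 / n_i) * (1 - f) * p_i * (1 - p_i) / (m_i - 1)
    return 1.96 * math.sqrt(variance)


def flights_needed(w, f):
    """Smallest number of flights whose half-width is at most w (None if unattainable)."""
    for n in range(1, TOTAL_FLIGHTS + 1):
        if half_width(n, f) <= w:
            return n
    return None


for f in (0.75, 0.50, 0.30):
    print(f"f = {f:.0%}: {flights_needed(0.02, f)} flights for a 2% half-width")
```

With these inputs the search returns 92 flights at 75% and 268 flights at 30%, matching the cases quoted above, and 47 and 26 flights for the 3% and 4% widths at 50%. For the 2% width at 50% the strict criterion returns 118 rather than the tabulated 117, because n = 117 gives w of about 0.020002, which the table reports as 2.00% after rounding.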
Another important consideration with interview surveys in gate lounges, discussed in Chapter 5 of the guidebook, is the limitation on the number of interviews that each interviewer can complete in the time window between when passengers start to arrive in the gate lounge and the start of flight boarding. As a practical matter, this limits the number of passengers who can be surveyed on a given flight.

Again note that the results here reflect the assumptions regarding variation considered in this example and will vary in other situations.

Table B-18. Sample sizes by sector for cluster sampling with flights stratified by sector and a 50% sample of passengers on selected flights surveyed for 95% confidence interval widths 2%, 3%, and 4%.

                          C.I. Width 2%        C.I. Width 3%        C.I. Width 4%
  Sector of Flight        Flights    Pass.     Flights    Pass.     Flights    Pass.
  Short-Haul Domestic      78        1,950     31           775     17           425
  Long-Haul Domestic       26        1,560     11           660      6           360
  International            13        1,105      5           425      3           255
  Total                   117        4,615     47         1,860     26         1,040