CHAPTER 4

This chapter presents quantitative results from econometric models of annual airport passenger enplanements. The analysis compares the results from model specifications that incorporate a specific type of disaggregated socioeconomic data into a set of baseline models of airport passenger demand. These baseline models rely on the sorts of aggregate socioeconomic variables customarily used in airport analyses. The section also presents results from alternative models of air passenger activity that reflect the differences between disaggregated socioeconomic data elements and the aggregate socioeconomic data that these models typically use.

The comparative model analysis uses a case study approach. The case study airports and airport systems are introduced, along with the equation specifications for modeling air passenger activity that are being compared. With this information about the modeling framework in place, we then present the modeling results. For each case study airport or airport system, we report both model goodness-of-fit results, which address how well econometric model estimates fit the observed data on airport passenger enplanements, and the forecasting performance of the models.

A detailed analysis of annual O&D enplanements for the Baltimore-Washington regional airport system is then presented. This analysis explores the effectiveness of more complex models of air passenger demand. The estimated models make use of dummy variables to isolate the influence of unusual events that may have influenced air travel demand and also explore the use of an alternative disaggregated measure for household income from that used in the case study model specifications.
This more complex approach was applied to the total regional enplanements for the three-airport Baltimore-Washington system because the analysis of aggregate regional demand is not affected by issues that might influence the modeling of enplanements at a single airport faced with competition from neighboring airports.

Case Study Airport Selection

To examine the potential contribution of disaggregated socioeconomic data to the fit and forecasting accuracy of regression models of airport annual enplanements, a case study analysis of seven individual U.S. airports and one metropolitan airport system was conducted. The case study airports included a range of airport sizes and service types. The roster of case study airports was selected to provide a sample of airports representative of the variety of circumstances and service settings that exist for commercial service airports in the United States. Table 26 shows these seven case study airports and the single regional system, together with the airport characteristics that motivated their selection as part of the case study group.

Case Studies in Modeling Airport Passenger Enplanements Using Disaggregated Socioeconomic Data
Using Disaggregated Socioeconomic Data in Air Passenger Demand Studies

Specifying Simple Models for Airport Enplanements

Regression models of air passenger demand or enplanements typically use historical data on regional economics or demographics as independent variables that underlie the demand for air travel that drives airport passenger enplanements from year to year. Examples of the types of analyses and organizations that make use of models like these are discussed in Chapter 2. Once estimated, these econometric models can be used with forecasts of the independent variables to create forecasts of future airport passenger enplanements.

The accuracy of such forecasts is affected by uncertainty in at least two ways. First, there may be uncertainty about the continued validity of the estimated model that relates changes in the independent variables to annual airport enplanements. Such uncertainty can reflect changes in factors that affect airline decisions to provide service at an airport, or changes in relevant variables that were not included in the model. Second, the accuracy of the forecasts for the independent variables themselves is uncertain. Forecasts of the model's independent variables that turn out to be inaccurate can result in inaccurate airport activity forecasts even if the parameters estimated from the historical model remain valid over the forecast period.

For the case study analyses, the models were estimated in log-linear form. With such a specification, the independent variable coefficients represent elasticities of the airport's annual passenger enplanements with respect to the independent variable in question.
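As a small numerical illustration of the elasticity interpretation, the sketch below applies a log-linear coefficient to a hypothetical 1 percent change in one independent variable, holding the others fixed. The coefficient 1.11 is the PHX GRP estimate reported in Table 35; the function name and the size of the change are assumptions made only for this example.

```python
def pct_change_in_enplanements(elasticity: float, pct_change_in_x: float) -> float:
    """Percent change in enplanements implied by a log-linear model when one
    independent variable changes by pct_change_in_x percent (others held fixed)."""
    ratio = (1.0 + pct_change_in_x / 100.0) ** elasticity
    return (ratio - 1.0) * 100.0


# Illustrative use of the PHX GRP coefficient from Table 35.
grp_elasticity = 1.11
effect = pct_change_in_enplanements(grp_elasticity, 1.0)
print(f"A 1% rise in GRP implies about a {effect:.2f}% rise in enplanements")
```

For small changes the implied effect is close to the coefficient itself, which is why the coefficients can be read directly as elasticities.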
The principal purpose of the case study analysis was to assess the extent to which a simple regression model for an airport's annual passenger enplanements based on aggregate socioeconomic variables (the baseline model) can be improved by using an additional disaggregated socioeconomic variable as well (the alternative model).

Table 26. Case study airports and airport system and regions served.

| Airport | Characteristics | Associated Metropolitan Region |
|---|---|---|
| Washington DC–Baltimore Airport System (BWI, DCA, and IAD) | Three large hubs, strong business and federal government market, available air passenger survey data | Baltimore-Washington Combined Statistical Area (CSA) |
| LAX (Los Angeles International) | Large hub, international gateway, primary airport in multi-airport region, available air passenger survey data, potential use in the new data source case study | Los Angeles-Long Beach-Anaheim Metropolitan Statistical Area (MSA) |
| PHX (Phoenix Sky Harbor) | Large hub, major vacation market, airline hub, mid-continent location, some air passenger survey data available | Phoenix MSA |
| TUL (Tulsa International Airport, Tulsa, OK) | Small hub, strong low-cost carrier presence, potential use in the new data source case study, some air passenger survey data available | Tulsa MSA |
| PVD (T.F. Green Providence) | Small hub, secondary airport in multi-airport region | Providence MSA |
| EUG (Eugene, Oregon) | Small hub, primarily regional airline service, isolated location | Eugene MSA |
| MDT (Harrisburg International) | Small hub, presence of nearby major hub | Harrisburg-York-Lebanon CSA |
| MSO (Missoula International) | Non-hub, mid-continent, isolated location | Missoula MSA |
For each of the case study airports or airport systems, the analysis was conducted in the following steps:

1. For the MSA served by each of the case study airports, socioeconomic data were collected using databases from Woods and Poole. Aggregate socioeconomic variables used for case study modeling include MSA population, employment, total earnings, wage and salary earnings, gross regional product, and average household income. Annual data on oil prices were also collected from the Energy Information Administration, and annual case study airport O&D passenger enplanements were collected using the DB1B database from the U.S. DOT. For each case study airport, the correlations between these variables over the years 1990 to 2010 were calculated.
2. For each of the case study airport MSAs, information about the distributions of regional populations by age group and regional households by income was used to create the disaggregated socioeconomic data series, the percentage of regional households with incomes exceeding $100,000.
3. Log-linear regressions were run, first for the baseline case using models that rely only on aggregate socioeconomic variables (along with the oil price variable meant to capture the role of an economywide cost factor that affects consumer spending and airline costs) and then for the alternative case in which the disaggregated socioeconomic variable related to household income distributions [the percentage of MSA households with incomes exceeding $100,000 (in 2009 dollars)] is added to the regression specifications. These regressions were estimated using annual data for the years 1990 to 2010.
4. For each of the case study airports, out-of-sample-period forecasts were created using the estimated model parameters for one of the aggregate socioeconomic variables (GRP) and the types of forecasts typically provided for these variables by Woods and Poole. For each of the case study airports, the forecasting performance of the two estimated models over the years 2010 to 2015 was assessed.

For each case study airport, the log-linear regression results of Step 3 are those of 13 distinct log-linear regression specifications used to define the baseline regressions and assess the additional explanatory or forecasting power that might be contributed by using the disaggregated household income variable in the alternative regressions along with the baseline aggregate socioeconomic variables.

In the six Baseline Regression Equations, there are two explanatory variables, the annual oil price variable and one of the six aggregate socioeconomic variables.

$$\ln(\text{Ann Enp}_t) = \beta_0 + \beta_1 \times \ln(\text{Oil Price}_t) + \beta_2 \times \ln(\text{Aggregate SE Var}_t)$$

Mirroring the baseline regressions, there is also a two-variable model specification, a Baseline Regression Equation Using the Disaggregated Socioeconomic Variable, in which the aggregate socioeconomic variable is replaced by the disaggregated socioeconomic variable.

$$\ln(\text{Ann Enp}_t) = \beta_0 + \beta_1 \times \ln(\text{Oil Price}_t) + \beta_2 \times \ln(\text{Disagg SE Var}_t)$$

Finally, for each case study airport six Alternative Regression Equations are estimated, in which the disaggregated socioeconomic variable is added as an additional explanatory variable to each of the six baseline equation specifications.

$$\ln(\text{Ann Enp}_t) = \beta_0 + \beta_1 \times \ln(\text{Oil Price}_t) + \beta_2 \times \ln(\text{Aggregate SE Var}_t) + \beta_3 \times \ln(\text{Disagg SE Var}_t)$$
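The three families of specifications can be sketched with ordinary least squares on logged variables. The example below is a minimal illustration, not the estimation code used in the study: the helper `fit_log_linear`, the synthetic series, and the "true" coefficients are all assumptions, and the data are generated without noise so that the alternative specification recovers the coefficients exactly.

```python
import numpy as np


def fit_log_linear(y, *regressors):
    """OLS of ln(y) on a constant and ln(x) for each regressor; returns [b0, b1, ...]."""
    y = np.asarray(y, dtype=float)
    columns = [np.ones(y.size)] + [np.log(np.asarray(x, dtype=float)) for x in regressors]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    return coef


# Synthetic annual series for 1990-2010 (21 observations). NOT the study data.
rng = np.random.default_rng(0)
t = np.arange(21)
oil_price = 30.0 + 40.0 * rng.random(t.size)
aggregate_var = 75_000.0 * 1.04 ** t * (1.0 + 0.10 * rng.random(t.size))
disagg_var = 15.0 + 0.3 * t + 2.0 * rng.random(t.size)

# Hypothetical coefficients used to generate noise-free enplanements.
true_beta = np.array([2.0, -0.2, 1.1, 0.5])
enplanements = np.exp(
    true_beta[0]
    + true_beta[1] * np.log(oil_price)
    + true_beta[2] * np.log(aggregate_var)
    + true_beta[3] * np.log(disagg_var)
)

# Baseline, disaggregated-only baseline, and alternative specifications.
baseline = fit_log_linear(enplanements, oil_price, aggregate_var)
disagg_only = fit_log_linear(enplanements, oil_price, disagg_var)
alternative = fit_log_linear(enplanements, oil_price, aggregate_var, disagg_var)

print("baseline    :", np.round(baseline, 3))
print("disagg only :", np.round(disagg_only, 3))
print("alternative :", np.round(alternative, 3))
```

Repeating the baseline and alternative fits for each of the six aggregate variables, plus the single disaggregated-only fit, gives the 13 specifications per airport described above.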
Case Study Model Estimation Results and Model Performance

Results from the case study model regressions using the model specifications outlined earlier are reported in this chapter, following a discussion of variable correlation issues in the socioeconomic data (aggregated and disaggregated). The presence of strong correlation among independent variables has important implications for model estimation when two or more correlated variables are included as independent regression variables. Insights into the regression model results for the case study airports can be gained from an analysis of the correlations between the variables used in the estimations of these equations. These correlation results are presented in detail for two of the case study airports, one of them a large hub and the other a small hub airport. These correlation patterns are then presented in less detail for the remaining six case study examples.

Following the discussion of correlation issues, two types of diagnostic results are considered for the case study regressions. First, we report the goodness of fit of the regression models for each of the case study airports. This includes both an overall assessment of model fit, with comparisons across the case study airports, including the statistical significance of the regression coefficient estimates, and also, for each case study airport, a comparison of the model statistics for the baseline regression results with those of the regression equations that include the disaggregated socioeconomic variable, the percentage of households with incomes exceeding $100,000 in the MSA or CSA served by the case study airport. Second, the accuracy of out-of-sample forecasts of annual enplanements is examined for each case study airport.
Correlations Among Case Study Model Variables

Before reporting these modeling results, insights into the regression model results for the case study airports were taken from a preliminary analysis of the correlations among the variables used in the estimations of these equations. These correlation results are presented in detail for two of the case study airports, one of them a large hub and the other a small hub airport. Then, these correlation patterns are presented in less detail for the remaining six case study examples.

We first examine the values taken by the variables used in the regression models over the 1990–2010 sample period for Phoenix, Arizona, which is served by Phoenix Sky Harbor International Airport (PHX), a large hub. Table 27 reports the values taken by the socioeconomic and other variables used in the estimation of annual enplanements at PHX, reported at 5-year intervals between 1990 and 2010.

Table 27. Phoenix socioeconomic trends and O&D enplanements, 1990 to 2010.

| Variable | 1990 | 1995 | 2000 | 2005 | 2010 |
|---|---|---|---|---|---|
| Population (thousands) | 2,249.1 | 2,744.0 | 3,273.5 | 3,774.7 | 4,209.3 |
| Employment (thousands) | 1,266.3 | 1,508.6 | 1,933.7 | 2,249.8 | 2,226.8 |
| Total Earnings (millions) | $45,970.2 | $60,514.0 | $91,479.6 | $110,809.3 | $107,593.0 |
| Wages and Salaries (millions) | $35,902.3 | $45,496.3 | $69,180.9 | $81,655.4 | $79,057.4 |
| Gross Regional Product (millions) | $74,756.6 | $104,205.9 | $147,324.2 | $180,929.1 | $177,019.1 |
| Avg HH Income | $72,399 | $77,626 | $94,127 | $99,809 | $94,975 |
| % HH >$100,000 Income | 14.7% | 17.2% | 22.1% | 22.0% | 20.8% |
| DB1B O&D Enplanements | 2,511,285 | 4,438,203 | 5,935,671 | 6,007,806 | 5,789,690 |
| Oil Price (Composite Refiner Acquisition Cost) | $40.16 | $27.16 | $38.86 | $60.44 | $83.99 |

These include both the aggregated socioeconomic variables used in
the baseline and the case study alternative regression specifications, and the disaggregated socioeconomic variable, the percentage of Phoenix MSA households with annual incomes exceeding $100,000 (in 2009 dollars), used in the alternative regression specification. This table also shows values in those years for annual O&D enplanements at PHX and for the oil price variable used as an independent regression variable. All dollar values are real values adjusted for inflation, expressed in 2015 dollars.

As can be seen in both Table 27 and Figure 5, while population in the Phoenix MSA grew steadily throughout the study period, the MSA was negatively affected by the national recession that began in 2007. This is reflected in the declines experienced in employment, earnings (both total earnings and wage and salary earnings), gross regional product, average household income, and the proportion of Phoenix households with incomes exceeding $100,000 (in 2009 dollars). Figure 5 does not include the indexed series for the real price of oil because the variability of this series is much greater than that of the other variables shown. Figure 6 adds the indexed real oil price series to those depicted in Figure 5 to indicate this difference.

To complete the presentation of the comparative behaviors of the regression data for the PHX case study analysis, Table 28 reports the correlations between the variables in the annual series that are shown in the figures. As seen in the tabulated correlation values, the socioeconomic series (aggregate and disaggregated) are highly positively correlated, with all displaying a general increasing trend that is reversed somewhat in the latter years of the sample period. (The population variable for Phoenix does not have this falling off in later years.)
There is also strong positive correlation between the socioeconomic variables and the series for annual PHX enplanements, and, to a lesser degree, with the oil price variable, which rose and declined dramatically in the later years of the sample period (shown in Figure 6).

Figure 5. Indexed socioeconomic variables, Phoenix MSA, 1990 to 2010.

These strong positive correlations among the socioeconomic variables have implications for their use in regression models, especially since they are also positively correlated with the annual enplanements series. The strong positive correlation means that each of the socioeconomic variables contains very similar explanatory information with respect to the annual enplanements at PHX that the regression models are meant to explain or represent. In particular, estimating a regression model that uses two or more of these highly correlated socioeconomic variables as independent, or explanatory, variables is likely to result in unusual coefficient estimates while
adding very little explanatory power compared to a model estimated with fewer of these independent variables. We will see numerous examples of this in the case study regression model estimates.

Figure 6. Indexed Phoenix socioeconomic variables with indexed real oil price, 1990 to 2010.

Table 28. Correlations among Phoenix case study model variables, 1990 to 2010.

| Variable | Population | Emp | Total Earn | Wages & Salaries | GRP | Avg HH Inc | % > $100K | DB1B Enp | Real Oil Price |
|---|---|---|---|---|---|---|---|---|---|
| Population | 1 | | | | | | | | |
| Employment | 0.982 | 1 | | | | | | | |
| Total Earnings | 0.975 | 0.997 | 1 | | | | | | |
| Wages & Salaries | 0.973 | 0.997 | 1.000 | 1 | | | | | |
| GRP | 0.978 | 0.998 | 0.998 | 0.998 | 1 | | | | |
| Avg HH Inc | 0.943 | 0.987 | 0.991 | 0.993 | 0.988 | 1 | | | |
| % > $100K | 0.880 | 0.929 | 0.933 | 0.938 | 0.938 | 0.952 | 1 | | |
| DB1B Enplanements | 0.888 | 0.925 | 0.914 | 0.919 | 0.927 | 0.929 | 0.968 | 1 | |
| Real Oil Price | 0.771 | 0.740 | 0.725 | 0.723 | 0.717 | 0.685 | 0.464 | 0.504 | 1 |

As a second example, consider the underlying data for the case study analysis of annual O&D enplanements at Tulsa International Airport (TUL), a small hub airport serving Tulsa, Oklahoma. Table 29 presents the Tulsa MSA socioeconomic data used in the analysis as well as the TUL annual enplanements data at 5-year intervals over the sample period. The Tulsa MSA is smaller than the Phoenix MSA, reporting less than a quarter of the Phoenix population at the beginning of the sample period and growing more slowly than Phoenix over the course of the period. These comparisons can also be seen in Figure 7, which reports the indexed values of the aggregate socioeconomic variables used in the case study regression analyses for TUL along with the indexed values for TUL annual enplanements and the disaggregated socioeconomic variable
used for the analysis, the percentage of Tulsa MSA households with incomes exceeding $100,000 (in 2009 dollars). As in the Phoenix MSA, the population rose steadily throughout the sample period, while other socioeconomic variables experienced growth and then some decline in the aftermath of the 2007 recession. TUL annual enplanements have been volatile while showing no overall growth over the period.

Table 30 reports the correlations among these variables for the Tulsa MSA over the sample period. As with Phoenix, these socioeconomic variables are highly correlated with one another, although the correlations between the socioeconomic variables and TUL annual enplanements are not as strong. Over the sample period there is also a modest negative correlation between annual enplanements at TUL and the oil price series.

The data series used in the case study analysis for the other six case study airports or airport systems show similar patterns of change and correlation over the sample period. These are reported in greater detail in Appendix D, but the historical values for the enplanement and socioeconomic variables used in the case study analysis for the other six case study airports are reported in Tables 31 and 32.
These socioeconomic and enplanement data are reported at five-year intervals over the 1990 to 2010 sample period. All dollar values are inflation adjusted to 2015 dollars. All of the case study regions experienced growing populations and increasing economic activity over the sample period, although all six case study airports experienced a reduction (or slowdown in the rate of growth, in the cases of the Baltimore-Washington airport system and the Missoula International Airport), a reflection of the Great Recession that began in 2007.

Table 29. Tulsa socioeconomic trends and O&D enplanements, 1990 to 2010.

| Variable | 1990 | 1995 | 2000 | 2005 | 2010 |
|---|---|---|---|---|---|
| Population (thousands) | 505.3 | 531.1 | 563.7 | 568.2 | 605.1 |
| Employment (thousands) | 349.3 | 370.9 | 425.4 | 420.1 | 430.0 |
| Total Earnings (millions) | $14,048.4 | $15,129.8 | $19,992.1 | $23,104.7 | $23,262.7 |
| Wages and Salaries (millions) | $10,399.2 | $10,757.0 | $13,866.1 | $14,253.7 | $14,935.7 |
| Gross Regional Product (millions) | $21,109.1 | $22,791.1 | $28,852.4 | $32,555.4 | $34,424.1 |
| Avg HH Income | $74,275 | $77,989 | $94,664 | $107,035 | $109,308 |
| % HH >$100,000 Income | 11.4% | 12.2% | 15.4% | 15.3% | 15.4% |
| DB1B O&D Enplanements | 1,426,236 | 1,485,519 | 1,707,647 | 1,492,890 | 1,431,896 |
| Oil Price (Composite Refiner Acquisition Cost) | $40.16 | $27.16 | $38.86 | $60.44 | $83.99 |

Figure 7. Indexed socioeconomic variables, Tulsa MSA, 1990 to 2010.

Table 30. Correlations among Tulsa case study model variables, 1990 to 2010.

| Variable | Population | Emp | Total Earn | Wages & Salaries | GRP | Avg HH Inc | % > $100K | DB1B Enp | Real Oil Price |
|---|---|---|---|---|---|---|---|---|---|
| Population | 1 | | | | | | | | |
| Employment | 0.936 | 1 | | | | | | | |
| Total Earnings | 0.932 | 0.940 | 1 | | | | | | |
| Wages & Salaries | 0.947 | 0.975 | 0.983 | 1 | | | | | |
| GRP | 0.954 | 0.951 | 0.992 | 0.988 | 1 | | | | |
| Avg HH Income | 0.916 | 0.928 | 0.991 | 0.972 | 0.990 | 1 | | | |
| % > $100K | 0.919 | 0.976 | 0.949 | 0.973 | 0.951 | 0.928 | 1 | | |
| DB1B Enplanements | 0.141 | 0.383 | 0.115 | 0.208 | 0.137 | 0.127 | 0.330 | 1 | |
| Real Oil Price | 0.687 | 0.630 | 0.755 | 0.714 | 0.783 | 0.826 | 0.605 | -0.107 | 1 |

Table 31. Other case study airport socioeconomic trends and O&D enplanements, 1990 to 2010 (Baltimore-Washington, Eugene, and Harrisburg).

| Baltimore-Washington (BWI/DCA/IAD) | 1990 | 1995 | 2000 | 2005 | 2010 |
|---|---|---|---|---|---|
| Population (thousands) | 7,089.4 | 7,525.9 | 8,014.1 | 8,573.5 | 9,088.9 |
| Employment (thousands) | 4,613.6 | 4,675.0 | 5,271.0 | 5,718.1 | 5,879.3 |
| Total Earnings (millions) | $206,643.1 | $225,701.4 | $298,961.0 | $359,596.0 | $389,567.9 |
| Wages and Salaries (millions) | $160,076.5 | $171,242.8 | $226,206.6 | $264,885.0 | $285,829.1 |
| Gross Regional Product (millions) | $314,863.6 | $348,991.4 | $445,151.1 | $549,959.9 | $604,278.6 |
| Avg HH Income | $96,246 | $100,271 | $120,610 | $130,421 | $138,616 |
| % HH >$100,000 Income | 25.4% | 27.1% | 32.3% | 34.0% | 35.9% |
| DB1B O&D Enplanements | 12,476,709 | 14,570,808 | 20,548,395 | 23,277,051 | 22,140,825 |

| Eugene Airport (EUG) | 1990 | 1995 | 2000 | 2005 | 2010 |
|---|---|---|---|---|---|
| Population (thousands) | 284.3 | 306.7 | 323.5 | 335.8 | 351.9 |
| Employment (thousands) | 154.5 | 167.6 | 186.6 | 197.1 | 183.7 |
| Total Earnings (millions) | $5,043.6 | $5,901.8 | $7,272.4 | $7,805.9 | $7,285.3 |
| Wages and Salaries (millions) | $3,524.0 | $4,058.2 | $5,015.4 | $5,526.1 | $5,179.6 |
| Gross Regional Product (millions) | $7,367.5 | $8,963.2 | $11,111.7 | $12,727.6 | $14,170.1 |
| Avg HH Income | $61,422 | $68,029 | $76,687 | $76,991 | $76,272 |
| % HH >$100,000 Income | 9.1% | 10.7% | 14.3% | 13.2% | 12.5% |
| DB1B O&D Enplanements | 231,011 | 300,909 | 338,636 | 283,763 | 354,543 |

| Harrisburg Airport (MDT) | 1990 | 1995 | 2000 | 2005 | 2010 |
|---|---|---|---|---|---|
| Population (thousands) | 475.9 | 498.8 | 509.5 | 525.8 | 550.3 |
| Employment (thousands) | 327.4 | 353.8 | 377.2 | 386.1 | 383.8 |
| Total Earnings (millions) | $13,033.7 | $14,841.4 | $17,563.1 | $19,565.0 | $19,710.8 |
| Wages and Salaries (millions) | $9,649.0 | $10,905.9 | $12,983.9 | $14,074.0 | $14,171.7 |
| Gross Regional Product (millions) | $19,524.2 | $23,136.0 | $26,296.2 | $29,819.8 | $30,729.5 |
| Avg HH Income | $77,339 | $81,163 | $90,572 | $92,515 | $96,073 |
| % HH >$100,000 Income | 12.5% | 14.1% | 18.1% | 18.5% | 19.2% |
| DB1B O&D Enplanements | 514,185 | 554,327 | 629,003 | 617,432 | 650,801 |

Tables 33 and 34 report the correlations among the regression analysis variables for the six other case study airports. As can be seen for each of the six airports, there are very strong positive correlations among all the socioeconomic variables that were selected for use in the case study regression analyses, including the disaggregated socioeconomic variable, the percentage of regional households with incomes exceeding $100,000 (in 2009 dollars). The weakest correlations are those between the real oil price and annual O&D enplanements at each of the case study airports, although these correlations are still positive for each of the six airports.
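Correlation matrices of the kind reported in Tables 28, 30, 33, and 34 can be computed directly from the underlying series. The sketch below is illustrative only: it uses the five Tulsa values per series tabulated at five-year points in Table 29, whereas the published correlations are based on annual data for 1990 to 2010, so the printed values should not be expected to match Table 30. The dictionary keys are labels chosen for this example.

```python
import numpy as np

# Tulsa MSA values at 1990, 1995, 2000, 2005, and 2010, taken from Table 29.
series = {
    "Population (thousands)": [505.3, 531.1, 563.7, 568.2, 605.1],
    "GRP (millions)": [21109.1, 22791.1, 28852.4, 32555.4, 34424.1],
    "Pct HH > $100K": [11.4, 12.2, 15.4, 15.3, 15.4],
    "O&D enplanements": [1426236, 1485519, 1707647, 1492890, 1431896],
    "Oil price": [40.16, 27.16, 38.86, 60.44, 83.99],
}

names = list(series)
# np.corrcoef treats each row as one variable and returns the Pearson matrix.
corr = np.corrcoef(np.array([series[name] for name in names], dtype=float))

# Print the lower triangle, mirroring the layout of the correlation tables.
for i, name in enumerate(names):
    row = " ".join(f"{corr[i, j]:6.3f}" for j in range(i + 1))
    print(f"{name:<24}{row}")
```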
Table 32. Other case study airport socioeconomic trends and O&D enplanements, 1990 to 2010 (Los Angeles, Missoula, and Providence).

| Los Angeles Int'l (LAX) | 1990 | 1995 | 2000 | 2005 | 2010 |
|---|---|---|---|---|---|
| Population (thousands) | 11,297.1 | 11,692.7 | 12,392.7 | 12,726.4 | 12,845.3 |
| Employment (thousands) | 6,881.7 | 6,550.1 | 7,236.8 | 7,542.9 | 7,310.5 |
| Total Earnings (millions) | $348,773.9 | $339,349.0 | $430,361.1 | $481,291.6 | $457,515.0 |
| Wages and Salaries (millions) | $262,676.1 | $241,952.5 | $305,626.4 | $331,955.8 | $322,306.0 |
| Gross Regional Product (millions) | $567,496.6 | $549,222.4 | $663,513.7 | $784,224.7 | $781,679.2 |
| Avg HH Income | $105,118 | $105,482 | $126,253 | $136,670 | $141,191 |
| % HH >$100,000 Income | 25.5% | 24.9% | 28.2% | 27.5% | 26.4% |
| DB1B O&D Enplanements | 14,530,215 | 16,411,458 | 19,714,223 | 18,462,728 | 18,453,834 |

| Missoula Int'l Airport (MSO) | 1990 | 1995 | 2000 | 2005 | 2010 |
|---|---|---|---|---|---|
| Population (thousands) | 79.1 | 90.4 | 96.2 | 102.3 | 109.4 |
| Employment (thousands) | 47.6 | 58.1 | 66.2 | 73.4 | 73.9 |
| Total Earnings (millions) | $1,455.0 | $1,833.6 | $2,324.4 | $2,755.4 | $2,761.2 |
| Wages and Salaries (millions) | $1,017.2 | $1,245.0 | $1,597.1 | $1,853.3 | $1,943.1 |
| Gross Regional Product (millions) | $2,259.6 | $2,821.1 | $3,436.7 | $4,226.8 | $4,390.5 |
| Avg HH Income | $58,650 | $64,351 | $74,201 | $78,042 | $76,847 |
| % HH >$100,000 Income | 8.4% | 9.1% | 11.5% | 12.9% | 13.9% |
| DB1B O&D Enplanements | 130,830 | 168,200 | 219,030 | 230,916 | 285,537 |

| Providence T.F. Green (PVD) | 1990 | 1995 | 2000 | 2005 | 2010 |
|---|---|---|---|---|---|
| Population (thousands) | 1,513.2 | 1,535.8 | 1,586.1 | 1,613.4 | 1,602.2 |
| Employment (thousands) | 786.5 | 784.8 | 852.1 | 878.5 | 846.2 |
| Total Earnings (millions) | $29,380.0 | $30,928.0 | $37,851.5 | $42,470.1 | $42,262.2 |
| Wages and Salaries (millions) | $22,067.2 | $22,810.9 | $27,961.0 | $30,496.2 | $29,987.0 |
| Gross Regional Product (millions) | $45,428.6 | $48,171.8 | $58,077.4 | $68,226.1 | $67,387.2 |
| Avg HH Income | $75,569 | $77,861 | $89,608 | $95,285 | $101,564 |
| % HH >$100,000 Income | 15.0% | 15.9% | 19.6% | 21.1% | 21.8% |
| DB1B O&D Enplanements | 1,148,711 | 1,022,280 | 2,780,558 | 2,943,633 | 2,079,410 |
The strong correlations, or collinearity, between the socioeconomic variables reported for each of the regions served by the case study airports raise the same statistical challenges for modeling approaches that would use more than one socioeconomic variable as explanatory factors. This is because the close similarity in behavior over time by the socioeconomic variables in each region (a similarity expressed in the high positive correlation between the variables) means that adding a second independent socioeconomic variable from these candidate variables adds little or no new information to the regression. The use of two highly correlated explanatory variables leads to statistical estimation problems and difficulties in interpreting model estimates, sometimes referred to as "variance inflation" (Kennedy 1985), described as a consequence of collinearity or "ill-conditioned data" (Belsley et al. 1980).

Table 33. Correlations among other case study region model variables, 1990 to 2010 (Baltimore-Washington, Eugene, and Harrisburg).

| Baltimore-Washington (BWI/DCA/IAD) | Population | Emp | Total Earnings | Wages & Salaries | GRP | Avg HH Inc | % > $100K | DB1B Enp | Real Oil Price |
|---|---|---|---|---|---|---|---|---|---|
| Population | 1 | | | | | | | | |
| Employment | 0.981 | 1 | | | | | | | |
| Total Earnings | 0.990 | 0.995 | 1 | | | | | | |
| Wages & Salaries | 0.988 | 0.997 | 1.000 | 1 | | | | | |
| GRP | 0.994 | 0.994 | 0.999 | 0.998 | 1 | | | | |
| Avg HH Income | 0.984 | 0.996 | 0.995 | 0.997 | 0.994 | 1 | | | |
| % > $100K | 0.973 | 0.982 | 0.978 | 0.982 | 0.977 | 0.984 | 1 | | |
| DB1B Enplanements | 0.938 | 0.958 | 0.951 | 0.956 | 0.949 | 0.961 | 0.979 | 1 | |
| Real Oil Price | 0.780 | 0.805 | 0.787 | 0.786 | 0.797 | 0.804 | 0.716 | 0.685 | 1 |

| Eugene Airport (EUG) | Population | Emp | Total Earnings | Wages & Salaries | GRP | Avg HH Inc | % > $100K | DB1B Enp | Real Oil Price |
|---|---|---|---|---|---|---|---|---|---|
| Population | 1 | | | | | | | | |
| Employment | 0.923 | 1 | | | | | | | |
| Total Earnings | 0.941 | 0.987 | 1 | | | | | | |
| Wages & Salaries | 0.949 | 0.989 | 0.997 | 1 | | | | | |
| GRP | 0.990 | 0.938 | 0.944 | 0.956 | 1 | | | | |
| Avg HH Income | 0.938 | 0.974 | 0.984 | 0.979 | 0.939 | 1 | | | |
| % > $100K | 0.801 | 0.879 | 0.911 | 0.888 | 0.778 | 0.928 | 1 | | |
| DB1B Enplanements | 0.733 | 0.745 | 0.713 | 0.714 | 0.733 | 0.807 | 0.704 | 1 | |
| Real Oil Price | 0.723 | 0.642 | 0.612 | 0.659 | 0.772 | 0.580 | 0.293 | 0.448 | 1 |

| Harrisburg Airport (MDT) | Population | Emp | Total Earnings | Wages & Salaries | GRP | Avg HH Inc | % > $100K | DB1B Enp | Real Oil Price |
|---|---|---|---|---|---|---|---|---|---|
| Population | 1 | | | | | | | | |
| Employment | 0.918 | 1 | | | | | | | |
| Total Earnings | 0.936 | 0.972 | 1 | | | | | | |
| Wages & Salaries | 0.928 | 0.981 | 0.997 | 1 | | | | | |
| GRP | 0.956 | 0.976 | 0.996 | 0.994 | 1 | | | | |
| Avg HH Income | 0.939 | 0.972 | 0.989 | 0.994 | 0.987 | 1 | | | |
| % > $100K | 0.906 | 0.974 | 0.980 | 0.990 | 0.977 | 0.991 | 1 | | |
| DB1B Enplanements | 0.427 | 0.539 | 0.483 | 0.518 | 0.499 | 0.531 | 0.546 | 1 | |
| Real Oil Price | 0.792 | 0.635 | 0.665 | 0.642 | 0.675 | 0.659 | 0.596 | 0.030 | 1 |
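To give a rough sense of the variance inflation described above, the sketch below computes the variance inflation factor for a regression with two correlated explanatory variables, VIF = 1 / (1 - r^2). This is a simplification and an assumption of the example, not a calculation from the study: it ignores the oil price regressor and uses the Table 33 correlations between GRP and the percentage of households with incomes above $100,000, which are computed on levels rather than on the logged variables used in the regressions.

```python
def vif_two_regressors(r: float) -> float:
    """Variance inflation factor when two explanatory variables have correlation r."""
    return 1.0 / (1.0 - r * r)


# Correlations between GRP and % > $100K, read from Table 33.
grp_vs_pct_over_100k = {
    "Baltimore-Washington": 0.977,
    "Eugene": 0.778,
    "Harrisburg": 0.977,
}

for region, r in grp_vs_pct_over_100k.items():
    print(f"{region:<22} r = {r:.3f}  VIF = {vif_two_regressors(r):.1f}")
```

A factor near 22 means the sampling variance of each coefficient is roughly 22 times what it would be if the two variables were uncorrelated, which helps explain unstable or oddly signed estimates when both are included.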
Case Study Model Estimation Results

Tables 35 and 36 present the regression results for all the equation specifications used for the eight case study airports. Table 35 shows the regression results for seven baseline regression specifications (including the specification that includes only the disaggregated socioeconomic variable). Table 36 contains the results from the alternative regression specifications. Table columns report regression results for individual case study airports, and table rows report results for each regression specification, using each of the candidate aggregate socioeconomic independent variables. Cells containing estimated coefficients are color coded to indicate each estimate's degree of statistical significance.

Table 34. Correlations among other case study region model variables, 1990 to 2010 (Los Angeles, Missoula, and Providence).

| Los Angeles Int'l (LAX) | Population | Emp | Total Earnings | Wages & Salaries | GRP | Avg HH Inc | % > $100K | DB1B Enp | Real Oil Price |
|---|---|---|---|---|---|---|---|---|---|
| Population | 1 | | | | | | | | |
| Employment | 0.880 | 1 | | | | | | | |
| Total Earnings | 0.956 | 0.971 | 1 | | | | | | |
| Wages & Salaries | 0.926 | 0.989 | 0.992 | 1 | | | | | |
| GRP | 0.934 | 0.964 | 0.978 | 0.983 | 1 | | | | |
| Avg HH Income | 0.944 | 0.961 | 0.975 | 0.981 | 0.992 | 1 | | | |
| % > $100K | 0.729 | 0.835 | 0.793 | 0.799 | 0.716 | 0.744 | 1 | | |
| DB1B Enplanements | 0.739 | 0.731 | 0.709 | 0.711 | 0.694 | 0.751 | 0.788 | 1 | |
| Real Oil Price | 0.615 | 0.724 | 0.692 | 0.741 | 0.806 | 0.795 | 0.307 | 0.436 | 1 |

| Missoula Int'l Airport (MSO) | Population | Emp | Total Earnings | Wages & Salaries | GRP | Avg HH Inc | % > $100K | DB1B Enp | Real Oil Price |
|---|---|---|---|---|---|---|---|---|---|
| Population | 1 | | | | | | | | |
| Employment | 0.984 | 1 | | | | | | | |
| Total Earnings | 0.968 | 0.990 | 1 | | | | | | |
| Wages & Salaries | 0.980 | 0.991 | 0.996 | 1 | | | | | |
| GRP | 0.979 | 0.990 | 0.995 | 0.997 | 1 | | | | |
| Avg HH Income | 0.962 | 0.987 | 0.991 | 0.991 | 0.985 | 1 | | | |
| % > $100K | 0.969 | 0.979 | 0.983 | 0.990 | 0.986 | 0.981 | 1 | | |
| DB1B Enplanements | 0.972 | 0.966 | 0.968 | 0.980 | 0.974 | 0.970 | 0.975 | 1 | |
| Real Oil Price | 0.732 | 0.693 | 0.696 | 0.734 | 0.749 | 0.696 | 0.732 | 0.770 | 1 |

| Providence T.F. Green (PVD) | Population | Emp | Total Earnings | Wages & Salaries | GRP | Avg HH Inc | % > $100K | DB1B Enp | Real Oil Price |
|---|---|---|---|---|---|---|---|---|---|
| Population | 1 | | | | | | | | |
| Employment | 0.950 | 1 | | | | | | | |
| Total Earnings | 0.979 | 0.976 | 1 | | | | | | |
| Wages & Salaries | 0.978 | 0.984 | 0.998 | 1 | | | | | |
| GRP | 0.970 | 0.966 | 0.996 | 0.992 | 1 | | | | |
| Avg HH Income | 0.942 | 0.947 | 0.985 | 0.981 | 0.984 | 1 | | | |
| % > $100K | 0.940 | 0.956 | 0.978 | 0.981 | 0.978 | 0.989 | 1 | | |
| DB1B Enplanements | 0.912 | 0.913 | 0.881 | 0.903 | 0.863 | 0.828 | 0.861 | 1 | |
| Real Oil Price | 0.559 | 0.650 | 0.692 | 0.675 | 0.705 | 0.763 | 0.712 | 0.364 | 1 |
Table 35. Case study airport baseline regression results.

| Aggregate Socioeconomic Variable | Regression Variable | Statistic | Baltimore-Washington | EUG | LAX | MDT | MSO | PHX | PVD | TUL |
|---|---|---|---|---|---|---|---|---|---|---|
| Population | Constant | coefficient | -12.45 | -1.45 | -4.99 | -4.50 | 0.49 | 0.93 | -123.16 | 10.28 |
| | | t-statistic | -3.766 | -0.476 | -0.794 | -1.170 | 0.611 | 0.781 | -8.188 | 3.863 |
| | Oil Price | coefficient | -0.087 | -0.106 | -0.016 | -0.189 | 0.026 | -0.297 | -0.184 | -0.080 |
| | | t-statistic | -1.460 | -1.455 | -0.328 | -3.572 | 0.676 | -4.346 | -1.842 | -1.568 |
| | Population | coefficient | 3.28 | 2.50 | 2.37 | 2.96 | 2.55 | 1.93 | 18.79 | 0.63 |
| | | t-statistic | 8.474 | 4.485 | 3.391 | 4.626 | 12.901 | 11.444 | 9.058 | 1.515 |
| | Adj R squared | | 0.847 | 0.707 | 0.424 | 0.465 | 0.948 | 0.836 | 0.797 | 0.043 |
| Employment | Constant | coefficient | -6.20 | 4.68 | 1.00 | 4.39 | 5.61 | 1.93 | -46.21 | 10.83 |
| | | t-statistic | -3.624 | 2.782 | 0.213 | 2.264 | 13.856 | 1.695 | -9.434 | 9.025 |
| | Oil Price | coefficient | -0.160 | -0.062 | -0.044 | -0.121 | 0.062 | -0.313 | -0.320 | -0.100 |
| | | t-statistic | -3.521 | -0.931 | -0.773 | -2.848 | 1.754 | -4.356 | -3.811 | -2.482 |
| | Employment | coefficient | 2.74 | 1.57 | 1.84 | 1.58 | 1.54 | 1.98 | 9.20 | 0.60 |
| | | t-statistic | 12.718 | 4.467 | 3.252 | 4.603 | 13.313 | 11.051 | 12.113 | 2.909 |
| | Adj R squared | | 0.922 | 0.705 | 0.437 | 0.464 | 0.945 | 0.824 | 0.905 | 0.254 |
| Total Earnings | Constant | coefficient | 3.21 | 5.22 | 7.40 | 7.21 | 4.03 | 3.94 | -18.71 | 12.41 |
| | | t-statistic | 2.749 | 2.976 | 2.888 | 5.150 | 8.174 | 4.590 | -6.533 | 11.507 |
| | Oil Price | coefficient | -0.095 | -0.037 | -0.044 | -0.131 | 0.050 | -0.216 | -0.371 | -0.091 |
| | | t-statistic | -1.942 | -0.543 | -0.801 | -2.866 | 1.480 | -3.658 | -3.939 | -1.783 |
| | Total Earnings | coefficient | 1.10 | 0.86 | 0.74 | 0.67 | 1.03 | 1.08 | 3.26 | 0.21 |
| | | t-statistic | 10.609 | 3.983 | 3.478 | 4.362 | 14.077 | 12.313 | 11.133 | 1.775 |
| | Adj R squared | | 0.905 | 0.672 | 0.448 | 0.416 | 0.943 | 0.878 | 0.873 | 0.074 |
| Wages & Salaries | Constant | coefficient | 2.52 | 5.71 | 5.85 | 7.21 | 4.12 | 3.81 | -20.49 | 11.47 |
| | | t-statistic | 2.202 | 3.579 | 1.895 | 5.406 | 9.745 | 4.490 | -8.284 | 9.058 |
| | Oil Price | coefficient | -0.099 | -0.058 | -0.059 | -0.125 | 0.019 | -0.218 | -0.352 | -0.099 |
| | | t-statistic | -2.146 | -0.824 | -0.998 | -2.889 | 0.605 | -3.770 | -4.583 | -2.118 |
| | Wages & Salaries | coefficient | 1.18 | 0.84 | 0.90 | 0.69 | 1.08 | 1.12 | 3.53 | 0.32 |
| | | t-statistic | 11.433 | 4.067 | 3.391 | 4.579 | 16.225 | 12.610 | 13.610 | 2.254 |
| | Adj R squared | | 0.915 | 0.679 | 0.454 | 0.448 | 0.960 | 0.886 | 0.909 | 0.148 |
| GRP | Constant | coefficient | 2.47 | 5.81 | 5.25 | 6.88 | 3.35 | 3.01 | -18.95 | 11.78 |
| | | t-statistic | 2.066 | 4.442 | 1.818 | 4.939 | 6.223 | 3.888 | -6.201 | 9.465 |
| | Oil Price | coefficient | -0.106 | -0.123 | -0.114 | -0.134 | 0.009 | -0.201 | -0.397 | -0.109 |
| | | t-statistic | -2.202 | -1.748 | -1.747 | -3.035 | 0.254 | -4.132 | -3.928 | -2.037 |
| | GRP | coefficient | 1.12 | 0.78 | 0.90 | 0.67 | 1.08 | 1.11 | 3.16 | 0.27 |
| | | t-statistic | 10.998 | 4.892 | 3.832 | 4.626 | 14.172 | 14.867 | 10.512 | 2.043 |
| | Adj R squared | | 0.905 | 0.732 | 0.478 | 0.455 | 0.945 | 0.906 | 0.843 | 0.112 |
| Avg HH Income | Constant | coefficient | -6.08 | -6.45 | 4.81 | -2.03 | -11.75 | -15.24 | -39.18 | 10.15 |
| | | t-statistic | -3.602 | -1.894 | 2.045 | -0.648 | -7.312 | -5.373 | -7.236 | 5.400 |
| | Oil Price | coefficient | -0.134 | -0.047 | -0.125 | -0.138 | 0.032 | -0.213 | -0.491 | -0.123 |
| | | t-statistic | -3.094 | -0.856 | -2.280 | -3.216 | 0.923 | -3.099 | -4.201 | -2.214 |
| | Avg HH Income | coefficient | 1.99 | 1.72 | 1.05 | 1.38 | 2.12 | 2.76 | 4.83 | 0.40 |
| | | t-statistic | 12.823 | 5.467 | 4.893 | 4.900 | 14.091 | 10.452 | 9.660 | 2.217 |
| | Adj R squared | | 0.932 | 0.769 | 0.584 | 0.484 | 0.943 | 0.844 | 0.809 | 0.133 |
| Pct HH Inc>100k (Disaggregated Socioeconomic Variable) | Constant | coefficient | 18.91 | 13.97 | 18.67 | 14.72 | 14.95 | 18.62 | 21.15 | 15.31 |
| | | t-statistic | 87.867 | 28.728 | 37.169 | 44.059 | 43.103 | 53.301 | 26.287 | 41.074 |
| | Oil Price | coefficient | -0.029 | 0.049 | 0.040 | -0.111 | 0.015 | -0.009 | -0.354 | -0.090 |
| | | t-statistic | -1.004 | 0.875 | 1.194 | -2.692 | 0.380 | -0.200 | -3.625 | -2.312 |
| | Pct HH Inc>100k | coefficient | 1.80 | 0.72 | 1.52 | 0.56 | 1.26 | 1.95 | 3.21 | 0.39 |
| | | t-statistic | 16.678 | 4.085 | 4.841 | 4.595 | 12.332 | 13.443 | 10.572 | 2.815 |
| | Adj R squared | | 0.949 | 0.687 | 0.608 | 0.453 | 0.951 | 0.906 | 0.830 | 0.243 |

Statistical significance of coefficient estimates (color key): not significantly different from 0; 20% level or better (t-statistic > 1.282); 10% level or better (t-statistic > 1.645); 5% level or better (t-statistic > 1.96); 1% level or better (t-statistic > 2.576); counterintuitive sign.
(Rows are grouped by the aggregate socioeconomic variable used alongside Pct HH Inc>100k. Balt-Wash = Baltimore-Washington.)

Regression variable              Balt-Wash       EUG       LAX       MDT       MSO       PHX       PVD       TUL

Aggregate socioeconomic variable: Population
  Constant coefficient               27.40      2.75     11.51      4.33      7.16     12.16    -33.10     26.10
    t-statistic                      4.431     0.351     1.460     0.525     2.798     3.254    -0.990     4.679
  Oil Price coefficient             -0.001    -0.068     0.014    -0.161     0.008    -0.124    -0.319    -0.041
    t-statistic                     -0.030    -0.689     0.328    -2.833     0.227    -1.573    -3.324    -0.946
  Population coefficient             -0.90      1.84      0.75      1.62      1.40      0.72      7.10     -1.48
    t-statistic                     -1.373     1.434     0.910     1.263     3.063     1.736     1.622    -1.938
  Pct HH Inc>100k coefficient         2.24      0.23      1.25      0.29      0.61      1.28      2.13      0.88
    t-statistic                      6.729     0.583     2.836     1.207     2.707     3.114     2.909     3.084
  Adj R squared                      0.952     0.698     0.599     0.467     0.962     0.909     0.866     0.362

Aggregate socioeconomic variable: Employment
  Constant coefficient               23.53      5.41     35.71      9.36      9.51     12.88    -20.62     11.31
    t-statistic                      2.646     0.801     3.221     1.090     4.309     3.791    -1.552     1.694
  Oil Price coefficient             -0.002    -0.054     0.138    -0.118     0.035    -0.124    -0.361    -0.099
    t-statistic                     -0.041    -0.546     1.930    -2.716     0.949    -1.546    -4.516    -2.329
  Employment coefficient             -0.51      1.45     -1.84      0.83      0.91      0.69      5.75      0.54
    t-statistic                     -0.519     1.271    -1.538     0.625     2.491     1.697     3.148     0.600
  Pct HH Inc>100k coefficient         2.12      0.06      2.66      0.28      0.54      1.32      1.33      0.04
    t-statistic                      3.384     0.111     3.329     0.595     1.792     3.346     2.049     0.073
  Adj R squared                      0.946     0.688     0.621     0.438     0.954     0.909     0.917     0.210

Aggregate socioeconomic variable: Total Earnings
  Constant coefficient               22.43     11.75     19.80     14.83      7.67     16.26     -2.58     25.57
    t-statistic                      5.525     1.205     3.689     1.872     2.738     2.290    -0.200     6.917
  Oil Price coefficient             -0.008     0.027     0.050    -0.110     0.033    -0.044    -0.378     0.006
    t-statistic                     -0.222     0.229     0.831    -2.199     0.912    -0.383    -4.076     0.119
  Total Earnings coefficient         -0.25      0.22     -0.08     -0.01      0.69      0.17      1.95     -0.86
    t-statistic                     -0.868     0.229    -0.210    -0.014     2.611     0.332     1.841    -2.786
  Pct HH Inc>100k coefficient         2.18      0.55      1.64      0.57      0.43      1.65      1.35      1.41
    t-statistic                      4.824     0.681     2.543     0.977     1.319     1.746     1.280     3.654
  Adj R squared                      0.946     0.668     0.585     0.421     0.949     0.901     0.876     0.466

Aggregate socioeconomic variable: Wages & Salaries
  Constant coefficient               22.76     10.18     25.53     11.43      4.94     16.55    -19.11     24.50
    t-statistic                      4.717     1.100     3.452     1.069     1.645     1.896    -1.473     3.947
  Oil Price coefficient             -0.008    -0.001     0.103    -0.118     0.017    -0.039    -0.354    -0.036
    t-statistic                     -0.202    -0.008     1.359    -2.468     0.529    -0.290    -4.405    -0.690
  Wages & Salaries coefficient       -0.28      0.39     -0.50      0.30      1.00      0.16      3.41     -0.81
    t-statistic                     -0.798     0.410    -0.929     0.308     3.352     0.237     3.106    -1.483
  Pct HH Inc>100k coefficient         2.21      0.40      2.17      0.32      0.10      1.68      0.11      1.18
    t-statistic                      4.257     0.491     2.845     0.398     0.275     1.465     0.108     2.136
  Adj R squared                      0.946     0.669     0.593     0.422     0.958     0.901     0.905     0.308

Aggregate socioeconomic variable: GRP
  Constant coefficient               23.02      5.69     16.54     10.35      7.11      6.78      1.31     25.67
    t-statistic                      5.043     1.289     2.841     1.434     2.388     1.134     0.095     5.007
  Oil Price coefficient             -0.004    -0.125     0.012    -0.126     0.005    -0.157    -0.389     0.022
    t-statistic                     -0.100    -1.175     0.143    -2.593     0.143    -1.835    -3.973     0.332
  GRP coefficient                    -0.28      0.79      0.15      0.38      0.73      0.84      1.57     -0.86
    t-statistic                     -0.900     1.885     0.367     0.606     2.646     1.984     1.441    -2.026
  Pct HH Inc>100k coefficient         2.23     -0.01      1.33      0.26      0.42      0.48      1.67      1.30
    t-statistic                      4.578    -0.028     2.173     0.491     1.285     0.636     1.505     2.770
  Adj R squared                      0.947     0.716     0.587     0.430     0.950     0.907     0.856     0.374

Aggregate socioeconomic variable: Avg HH Income
  Constant coefficient               23.87    -42.55     11.16    -16.02     -2.99     38.33     16.02     24.59
    t-statistic                      2.623    -2.984     2.287    -0.665    -0.436     2.458     0.483     3.313
  Oil Price coefficient             -0.006    -0.193    -0.058    -0.156     0.020     0.117    -0.368    -0.003
    t-statistic                     -0.107    -2.602    -0.816    -2.900     0.583     1.072    -2.764    -0.036
  Avg HH Income coefficient          -0.40      4.72      0.59      2.53      1.43     -1.61      0.41     -0.75
    t-statistic                     -0.545     3.965     1.549     1.276     2.613    -1.264     0.155    -1.252
  Pct HH Inc>100k coefficient         2.15     -1.48      0.81     -0.49      0.43      3.05      2.95      0.97
    t-statistic                      3.329    -2.588     1.471    -0.586     1.308     3.471     1.686     2.001
  Adj R squared                      0.946     0.814     0.624     0.461     0.951     0.905     0.822     0.290

Statistical significance of coefficient estimates: Not significantly different from 0; 20% level or better (t-statistic > 1.282); 10% level or better (t-statistic > 1.645); 5% level or better (t-statistic > 1.96); 1% level or better (t-statistic > 2.576); Counterintuitive sign.
Table 36. Case study airport alternative regression (with disaggregated SE variables) results.
These two tables are somewhat complex, but they present all the estimation results in an organized fashion. They make it possible both to see how well the range of independent variables performed for each of the case study airports and regions and to compare coefficient estimates for a given independent variable across the eight airports and regions. In the remainder of this subsection we present observations on the parameter estimates reported in the tables.

The regression constant term can be seen as a scaling term whose value varies with the magnitudes of the independent variables and the dependent variable. Because of this, the estimated constant term can vary widely from equation to equation for an individual airport or airport system.

The annual oil price variable is included in the regressions as a factor reflecting a basic input cost for airlines and for passengers (as consumers of a full range of goods and services affected by the level of oil prices). The expectation is that the coefficient estimate for the Oil Price variable will be negative, since rising oil prices raise airline costs (and, through those costs, affect airline service decisions) and squeeze consumer expenditures, petroleum being an input to a variety of household consumption, especially auto travel.

Because the regressions are estimated as log-linear regressions, the coefficients can be treated as elasticities of airport enplanements with respect to the independent variable. Across the baseline regressions, the elasticity estimates for the Oil Price variable are clustered around -0.1 for the Baltimore-Washington region, EUG, LAX, MDT, and TUL, and have negative values of a slightly greater magnitude (around -0.3) for PHX and PVD.
However, the Oil Price elasticity estimates for annual enplanements at MSO take on small but positive values, which is not completely surprising since oil extraction has become an important contributor to the economy of the Missoula area. For the other seven airports, these inelastic estimates indicate that a 1% increase (decrease) in oil prices leads to a decline (increase) in annual enplanements of 0.1 to 0.3%. This may seem to be a negligible impact, but over the sample period oil prices were often volatile and changed by much more than a few percentage points from year to year, sometimes doubling or halving within a year. Arguably, it is the response of passenger demand (and of the airline industry) to these large shifts in oil prices that is of greatest interest to airport decision-makers. The baseline regression coefficient estimates on the oil price variable are usually but not always statistically significant. The elasticities estimated for the oil price in the alternative regressions using the disaggregated socioeconomic variable were also almost always negative (again with the exception of MSO), but smaller in magnitude and more often statistically insignificant.

In the baseline regressions, the coefficients estimated for the aggregate socioeconomic variables were positive as expected, and nearly always statistically significant at the 10% level or better. These estimates represent the influence of the regional economy on annual airport enplanements. For six of the case study airports and regions (Baltimore-Washington, EUG, LAX, MDT, MSO, and PHX) there is broad consistency among the parameter estimates for each of the aggregate socioeconomic variables.
This is true for the variables that have a more demographic interpretation: for those airports, the coefficients for the population variable range from 1.93 to 3.28 and the coefficients for the employment variable range from 1.54 to 2.74, with all of the estimates significant at the 0.01 level. It is also true for the variables that have a more economic interpretation, whose coefficients range between 0.67 and 1.10 for total earnings; between 0.69 and 1.18 for wages and salaries; between 0.67 and 1.12 for GRP; between 1.05 and 2.76 for average household income (the lower bound being the LAX estimate in Table 35); and between 0.56 and 1.95 for the disaggregated socioeconomic variable, the percentage of households with incomes exceeding $100,000 (2009 dollars), considered here as the only independent socioeconomic variable in the equation. All of the parameter estimates for these variables are also statistically significant at the 0.01 level.
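The elasticity reading of the log-linear coefficients can be illustrated with a short sketch. The code below is not the estimation code used for the case studies; it assumes ordinary least squares on natural logs with a constant, an oil price term, and one socioeconomic term, and it runs on synthetic data invented purely for illustration. The function names `fit_log_linear` and `pct_change_from_elasticity` are hypothetical.

```python
import numpy as np

def fit_log_linear(enplanements, oil_price, socio_var):
    """OLS of ln(enplanements) on a constant, ln(oil price), and ln(socio variable).

    Returns (coefficients, t-statistics, adjusted R-squared). In this form the
    slope coefficients can be read as elasticities.
    """
    y = np.log(np.asarray(enplanements, dtype=float))
    X = np.column_stack([
        np.ones(len(y)),
        np.log(np.asarray(oil_price, dtype=float)),
        np.log(np.asarray(socio_var, dtype=float)),
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)            # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # coefficient covariance matrix
    t_stats = beta / np.sqrt(np.diag(cov))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return beta, t_stats, adj_r2

def pct_change_from_elasticity(elasticity, price_ratio):
    """Percent change in enplanements implied by a constant elasticity when the
    price is multiplied by price_ratio (for example 2.0 for a doubling)."""
    return (price_ratio ** elasticity - 1.0) * 100.0

# Synthetic 21-year sample (ASSUMED values, not the case study data).
rng = np.random.default_rng(0)
n_years = 21
oil = np.exp(rng.normal(3.5, 0.4, n_years))
grp = 100.0 * 1.03 ** np.arange(n_years)
enp = np.exp(3.0 - 0.1 * np.log(oil) + 1.1 * np.log(grp)
             + rng.normal(0.0, 0.02, n_years))

beta, t_stats, adj_r2 = fit_log_linear(enp, oil, grp)
print("constant, oil price elasticity, GRP elasticity:", beta)
print("t-statistics:", t_stats)
print("adjusted R-squared:", adj_r2)
print("doubling of oil price at elasticity -0.1:",
      pct_change_from_elasticity(-0.1, 2.0), "%")
```

Under a constant elasticity of -0.1, a doubling of the oil price implies a decline in enplanements of roughly 6.7%, which is why large oil price swings matter even when the elasticity itself looks small.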
The remaining case study airports, PVD and TUL, have somewhat anomalous patterns in the parameter estimates for the independent socioeconomic variables included in the baseline regressions. For PVD, the estimates are consistently higher than those for the same variables in the regressions for the other six case study locations. For example, the parameter estimate for the population variable is 18.79, and the estimate for the Providence MSA GRP variable is 3.16, about three times the magnitude of the GRP estimates for the other six airports. Like the others, the parameter estimates for the Providence socioeconomic variables are all statistically significant at the 0.01 level. In contrast, the parameter estimates for the independent socioeconomic variables at TUL are all smaller in magnitude than the estimates at the other case study airports, and these estimates often have lower levels of statistical significance.

For each of the case study airports, conditional on the annual oil price series that is an independent variable in all equations, each of the six aggregate socioeconomic variables (and the disaggregated household income variable that is used on its own and shown in the last rows of Table 35) appears to bring similar information to the estimations, in the sense that for each airport the adjusted R-squared value is roughly the same across the baseline regressions. For some of the case study airports (MSO, PHX, PVD, and the Baltimore-Washington airport system), the explanatory power of the regressions, expressed as the adjusted R-squared statistic, is relatively high, with values of roughly 0.800 or higher for each of the socioeconomic variables used for that airport.
For three of the case study airports (EUG, LAX, and MDT), the adjusted R-squared results are lower, ranging roughly between 0.400 and 0.750. Finally, for TUL, the adjusted R-squared value is consistently below 0.500, and sometimes well below 0.500, regardless of the socioeconomic variable used in the regression.

The model goodness of fit in the alternative specifications (using the disaggregated regional share of households with incomes exceeding $100,000 in 2009 dollars along with the individual aggregate socioeconomic variables from the baseline models) tends to improve only modestly compared to the baseline regressions. However, because the disaggregated household income variable is so strongly correlated with each of the aggregate socioeconomic variables, the variance inflation discussed earlier results in many of the parameter estimates for socioeconomic variables (aggregate and disaggregated) having much lower levels of statistical significance in the alternative regression results. In these cases, because of the correlation, the aggregate and disaggregated socioeconomic variables are bringing similar information to the regression, which reduces the precision with which their coefficients are estimated. This outcome has to be balanced against the relatively modest improvement in the overall model fit, as reflected in the estimated R-squared statistic, in deciding whether the alternative regressions provide an improved explanation of the enplaned passenger traffic. The lower statistical significance of the estimated coefficients of the independent variables in the alternative models compared to the baseline models, which is true across all eight case study models, indicates that including the disaggregated household income variable provides very little new information to the regressions from a statistical perspective.
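To give a sense of scale for the variance inflation described above, the following sketch computes variance inflation factors (VIFs) by regressing each explanatory variable on the others. It is an illustration only, assuming the standard definition VIF = 1 / (1 - R-squared); the helper names are hypothetical and the random data are invented. The pairwise shortcut ignores the oil price variable, so it approximates rather than reproduces the inflation in the three-variable alternative models.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X (observations x regressors, no constant column).

    Each regressor is regressed on a constant plus the remaining regressors;
    VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

def pairwise_vif(correlation):
    """VIF when only two regressors are considered: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - correlation ** 2)

# Correlation of 0.978 between GRP and the share of households above $100K
# for Providence, as reported in Table 34.
print("pairwise VIF at r = 0.978:", pairwise_vif(0.978))

# Invented, highly correlated regressors to exercise the general function.
rng = np.random.default_rng(1)
a = rng.normal(size=21)
b = a + 0.2 * rng.normal(size=21)
print("VIFs for the invented pair:", variance_inflation_factors(np.column_stack([a, b])))
```

A pairwise correlation of 0.978 corresponds to a VIF of about 23, meaning the sampling variance of each coefficient is roughly 23 times what it would be if the two variables were uncorrelated, which is consistent with the drop in t-statistics between Table 35 and Table 36.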
The regression results for the eight individual case study airports are discussed in greater detail in Appendix D.

Forecasting Results from the Case Study Regression Analysis

For each of the case study airports, ex-post "forecasts" of annual passenger enplanements were calculated for the out-of-sample years 2011 to 2015 using the regression models estimated for each case study airport. For simplicity, out-of-sample forecasts are reported only for the baseline and alternative regression equations that used GRP as the aggregate socioeconomic
independent variable for the case study airport's annual enplanements. For each case study airport, model performance can be assessed by comparing the "forecast" values to the actual annual enplanement totals for the years 2011 to 2015.

To make such forecasts of case study airport annual O&D enplanements, it is necessary to define assumed values of the independent variables for those "future" years. Simulating the forecasting situation that would have confronted analysts in 2011 or 2012 (when actual values for the regional MSA variables would have been available through 2010) requires that we use the forecasts for the independent variables that would have been provided to analysts in 2012. This approach differs from the more usual approach to ex-post forecasts, which are based on the actual values of the independent variables. Simulating forecasts that might have been made in 2011 introduces two different sources of error in the forecasts: errors due to the model itself and errors due to differences between the forecast and actual values of the independent variables.

A forecast for the oil price variable starting in 2011 that could have been used that year is available from an FAA forecasting database, but Woods and Poole do not archive their past forecasts for socioeconomic variables. Discussions with Woods and Poole staff indicated that, because the Woods and Poole forecasting methodologies do not rely on business cycle fluctuations, the independent variable forecasts that would have been provided in 2012 would be similar in percentage terms to those provided at the present. We used this assumption to create out-of-sample forecasts for the independent socioeconomic variables.
For example, if for a particular MSA the 2016 Woods and Poole data release projects a 2% growth rate for GRP between 2015 and 2016, then for our out-of-sample forecasting exercise we assume that GRP for that MSA will grow at a 2% rate between 2010 and 2011. This procedure for translating the most recent Woods and Poole data projections was used to generate the out-of-sample forecasts in our case study analyses.

For each case study airport, four forecast scenarios for the independent variables were created. The first three provide a "high, medium, and low" forecasting range, with the medium scenario for each variable set at the value estimated from the 2016 Woods and Poole projections. The high scenario for each variable is set at 1.5 times the growth rate for the medium scenario, and the low scenario at half that growth rate. For example, if for a given MSA the Woods and Poole projection for growth in GRP was 2% between 2015 and 2016, and 2.2% between 2016 and 2017, then we would use those values for our medium scenario for GRP growth between 2010 and 2011, and between 2011 and 2012. For the high growth scenario we would use 3% growth between 2010 and 2011, and 3.3% growth between 2011 and 2012. The low growth scenario halves the growth rates in the medium scenario, to 1% between 2010 and 2011 and 1.1% between 2011 and 2012.

For the fourth forecast scenario, we use the actual observed values of the independent variables from 2011 to 2015 as the regression model inputs for the 2011 to 2015 enplanement forecasts.
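The construction of the high, medium, and low scenarios can be sketched in a few lines. The example below reuses the 2% and 2.2% growth rates from the text; the base level of 100.0 and the function names `scenario_growth_rates` and `project_levels` are assumptions made for illustration, not part of the study's tooling.

```python
def scenario_growth_rates(medium_rates, high_factor=1.5, low_factor=0.5):
    """Scale the medium-scenario annual growth rates to get high and low scenarios."""
    return {
        "medium": list(medium_rates),
        "high": [g * high_factor for g in medium_rates],
        "low": [g * low_factor for g in medium_rates],
    }

def project_levels(base_value, growth_rates):
    """Compound a base-year level forward, one value per growth rate."""
    levels = []
    value = base_value
    for g in growth_rates:
        value *= 1.0 + g
        levels.append(value)
    return levels

# Medium scenario: 2% growth in the first year and 2.2% in the second,
# as in the example in the text.
rates = scenario_growth_rates([0.02, 0.022])

# ASSUMED 2010 base level of 100.0 (an index, not actual GRP data).
paths = {name: project_levels(100.0, r) for name, r in rates.items()}

for name in ("high", "medium", "low"):
    print(name, [round(g, 4) for g in rates[name]], [round(v, 3) for v in paths[name]])
```

The fourth scenario needs no such construction, since it simply substitutes the observed 2011 to 2015 values of the independent variables.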
Using the actual values provides an assessment of the estimated model's performance as a forecasting tool, assuming that it had been possible to forecast in 2010 the future values of the model's independent variables (in this example the oil price variable, the regional product, and, in the case of the alternative case study model, the proportion of regional households with incomes exceeding $100,000 in 2009 dollars) with perfect foresight. It shows how accurately the estimated model represents the interactions between the regional economic environment and the case study airport's annual enplanements over the forecast years, unaffected by any errors involved in forecasting the independent variables.

With the values for the independent variables defined for each of the four forecast scenarios, forecasts for annual enplanements from 2011 to 2015 can be calculated for the baseline and alternative GRP regression equations for each case study airport. The assessment of these forecasts considers the accuracy of the forecasts and, more importantly, the extent to which the
addition of the disaggregated household income variable improves the accuracy of the alternative regression forecast (which uses the disaggregated income variable) compared to the baseline regression forecast. Detailed forecast results are shown for two of the case study airports, Phoenix Sky Harbor International Airport and Tulsa International Airport.

Figure 8 shows the actual PHX annual O&D enplanements from 1990 to 2015 (a period that includes the out-of-sample forecast years), the annual enplanements estimated by the baseline model that uses Phoenix MSA GRP as the independent aggregate socioeconomic variable, and the four forecasts based on the independent variable forecast scenarios described above.

For PHX, there is considerable overlap between actual enplanements in the years 2011 to 2015 and the enplanements projected by each of the four forecast scenarios. The forecast performance of the model is especially striking for the forecasting exercise using the actual values for the oil price and Phoenix MSA GRP ("Forecast Using Actual"). The actual enplanements path ends up within the range of enplanement numbers defined by the high and low forecasts, although this is a consequence of the fact that the forecast growth under each scenario was applied to the modeled enplaned passengers in 2010, which overstated the actual passenger traffic. Had the forecast growth been applied to the actual enplaned passenger traffic in 2010, the projected enplaned passengers under the high growth scenario would have been fairly close to the actual traffic until 2015, when the actual traffic would have exceeded even the high growth scenario, at a value between the high growth and "Forecast Using Actual" scenarios.
Figure 9 is a similar chart, but the model and forecast paths shown use the alternative model specification for PHX, for which the independent variables comprise the oil price variable, the Phoenix MSA GRP, and the disaggregated socioeconomic variable used in the case study analysis, the percentage of Phoenix households with household income exceeding $100,000 (in 2009 dollars). The within-sample (1990 to 2010) model estimates and the out-of-sample forecasts for the PHX alternative model are similar to those from the PHX baseline model, except that the "Forecast Using Actual" projections are lower than those given by the baseline model. This can be seen in Table 37, which reports the root mean squared error (RMSE) of each of the four forecasts for the Phoenix baseline and alternative models.

[Figure 8. Case study baseline out-of-sample forecasts, Phoenix Sky Harbor. Vertical axis: Annual O&D Enplanements (PHX), 2,000,000 to 9,000,000; horizontal axis: years 1990 to 2015. Series: Actual PHX; Model 1990-2010; Medium Forecast; High Forecast; Low Forecast; Forecast Using Actual.]
[Figure 9. Alternative model out-of-sample forecasts, Phoenix Sky Harbor. Vertical axis: Annual O&D Enplanements (PHX), 2,000,000 to 8,000,000; horizontal axis: years 1990 to 2015. Series: Actual PHX; Model 1990-2010; Medium Forecast; High Forecast; Low Forecast; Forecast Using Actual.]

PHX (using GRP)       Root Mean Squared Error
                      Baseline     Alternative    % Difference BL v Alt
Medium Forecast        318,590         241,766      24%
High Forecast          554,205         367,045      34%
Low Forecast           447,829         518,924     -16%
Actual Values          326,203         369,242     -13%
Table 37. RMSE comparisons for PHX baseline and alternative case study model forecasts.

For each of the four forecasting scenarios for Phoenix Sky Harbor, the RMSE of the baseline and alternative forecasts measures the typical distance (the square root of the mean squared difference over the five out-of-sample forecast years) between the model forecasts and the actual PHX enplanements in those years. A forecast with a lower RMSE is "more accurate" than another in the sense that on average it lies closer to the actual values of the PHX enplanements in those years. In this case for PHX, the medium and high forecasts from the alternative model have lower RMSE values than the same forecasts from the baseline model, while the low and actual value forecasts from the PHX baseline model have lower RMSE values than the same forecasts from the alternative model. Comparing the baseline model "forecasts" for PHX with one another, the forecasts using the medium and actual value scenarios for the independent variables' out-of-sample values for 2011 to 2015 have better forecasting performance (lower RMSE values) than those using the high and low forecast scenarios.
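The RMSE comparison can be made concrete with a small sketch. The `rmse` function follows the usual definition; the percentage-difference formula, (baseline - alternative) / baseline, is an assumption that happens to reproduce the rounded percentages in Table 37, and both function names are hypothetical. The enplanement series themselves are not reproduced here, so the example starts from the RMSE values reported in the table.

```python
import math

def rmse(forecast, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must have the same length")
    squared = [(f - a) ** 2 for f, a in zip(forecast, actual)]
    return math.sqrt(sum(squared) / len(squared))

def pct_rmse_difference(baseline_rmse, alternative_rmse):
    """Percent by which the alternative RMSE is below (positive) or above
    (negative) the baseline RMSE. ASSUMED formula, inferred from Table 37."""
    return (baseline_rmse - alternative_rmse) / baseline_rmse * 100.0

# RMSE values for PHX as reported in Table 37: (baseline, alternative).
table_37 = {
    "Medium Forecast": (318590, 241766),
    "High Forecast": (554205, 367045),
    "Low Forecast": (447829, 518924),
    "Actual Values": (326203, 369242),
}

for scenario, (baseline, alternative) in table_37.items():
    diff = pct_rmse_difference(baseline, alternative)
    print(f"{scenario}: {diff:.0f}%")
```

A positive value means the alternative model's forecast lay closer to the actual enplanements, and a negative value means the baseline forecast did.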
Comparing the alternative model forecasts with one another, the forecasts using the medium and high scenarios for the independent variables' out-of-sample values have better forecasting performance (lower RMSE values) than those using the low and actual value scenarios, although the difference between the performance using the high and actual value scenarios is fairly small.

Figure 10 shows the actual TUL annual O&D enplanements from 1990 to 2015 (a period that includes the out-of-sample forecast years), the annual enplanements estimated by the baseline model that uses Tulsa MSA GRP as the independent aggregate socioeconomic variable, and the four forecasts based on the independent variable forecast scenarios described earlier.
For TUL, the actual enplanement series is much more volatile than the baseline model estimates, which is reflected in the low adjusted R-squared scores for the TUL models. The forecasts from the baseline model are also quite inaccurate. For all four independent variable forecast scenarios, the model severely overforecasts compared to the actual enplanement results at TUL for 2011 to 2015, although this overprojection would be smaller if the projected growth had been applied to the actual enplaned passenger traffic in 2010 rather than the modeled traffic.

Figure 11 shows the model and forecast values using the alternative model specification for TUL, for which the independent variables comprise the oil price variable, the Tulsa MSA GRP, and the disaggregated socioeconomic variable used in the case study analysis, the percentage of Tulsa households with household income exceeding $100,000 (in 2009 dollars).

[Figure 10. Case study baseline out-of-sample forecasts for Tulsa International. Vertical axis: Annual O&D Enplanements (TUL), 1,200,000 to 1,800,000; horizontal axis: years 1990 to 2014. Series: Actual TUL; Model 1990-2010; Medium Forecast; High Forecast; Low Forecast; Forecast Using Actual.]

[Figure 11. Case study alternative model out-of-sample forecasts for Tulsa International. Vertical axis: Annual O&D Enplanements (TUL), 1,100,000 to 1,800,000; horizontal axis: years 1990 to 2014. Series: Actual TUL; Model 1990-2010; Medium Forecast; High Forecast; Low Forecast; Forecast Using Actual.]
The within-sample (1990 to 2010) model estimates and the out-of-sample forecasts for the TUL alternative model differ in some ways from those of the TUL baseline model. Like the baseline model, the alternative model does not match the volatility of the historical TUL enplanements series, but it does track the actual series more accurately at some points over the sample period. The medium, high, and low forecast scenarios also overforecast compared to the actual TUL enplanement values for 2011 to 2015, but the forecast scenario based on the actual values of the independent variables over the forecast period is quite accurate for the first 2 years of the out-of-sample forecast period before sharply underforecasting TUL enplanements. The impacts of these characteristics on average forecast accuracy can be seen in Table 38, which reports the RMSE of each of the four forecasts for the TUL baseline and alternative models.

In the RMSE comparison for TUL, for each forecast scenario the alternative model has a lower RMSE than the same forecast from the baseline model, although the figures make it clear that none of the forecasts is a very accurate prediction of the actual course of enplanements at TUL over the entire out-of-sample period. Comparing the TUL forecasts with one another, the forecasts using the low scenario for the independent variables' out-of-sample values have better forecasting performance (lower RMSE values) than the other forecast scenarios.

The purpose of the case study analysis is to determine whether the use of disaggregated socioeconomic variables improves the performance of these relatively simple regression models.
It was noted above that, for all of the case study airports, the addition of the disaggregated variable did not add much to the in-sample model goodness of fit, as measured by the model R-squared estimates. The same is true in many cases for the assessment of model out-of-sample forecast performance, as measured by comparing the RMSE of the baseline and alternative forecasts for each of the case study airports. Table 39 summarizes these forecast performances.

Table 39 provides a description of the behavior of the forecasts from the baseline model (which uses the oil price variable and the aggregate socioeconomic variable GRP) and the alternative model (which adds the disaggregated socioeconomic variable, the percentage of regional households with incomes exceeding $100,000), and the percentage change in the RMSE when comparing the baseline model RMSE to the alternative model RMSE. A negative value for this measure indicates that the baseline forecast had a lower RMSE than the alternative model did (due to the baseline forecasts being closer to the actual airport enplanement values), and a positive value indicates that the alternative model had the lower RMSE. The smaller the magnitude of the percentage number, the smaller the difference between the baseline and alternative RMSE scores, and therefore the smaller the difference between the two forecasts.

TUL (using GRP)       Root Mean Squared Error
                      Baseline     Alternative    % Difference BL v Alt
Medium Forecast         94,079          72,038      23%
High Forecast          104,440          86,068      18%
Low Forecast            84,182          58,826      30%
Actual Values          135,787          72,748      46%
Table 38. RMSE comparisons for TUL baseline and alternative case study model forecasts.

In Table 39 the regression adjusted R-squared statistics are also reported, making it possible to consider the relationship, if any, between the model's goodness of fit at a case study
airport and the accuracy of the forecasts produced by the model (using the scenario assumptions about the future values of the independent variables). There appears to be no clear relationship between these model evaluation measures, since some case study models had very high adjusted R-squared values and relatively poor forecasting results. The forecasts and their performance relative to the actual annual enplanements for the case study airports are addressed in greater detail in Appendix D.

(For each case study airport or system, the entries are: adjusted R-squared when GRP is the SE variable, baseline/alternative; characteristics of the out-of-sample forecasts; and average % RMSE change across the four forecasts, alternative vs. baseline.)

Washington DC-Baltimore Airport System (BWI, DCA, and IAD)
  Adj R-squared (BL/Alt): 0.905/0.947
  Out-of-sample forecasts: Slightly overforecasts. Forecasts exceed actual values by 10 to 20% (baseline) and by 5 to 15% (alternative).
  Avg % RMSE change (Alt vs BL): 30% (26% to 36%)

LAX (Los Angeles International Airport)
  Adj R-squared (BL/Alt): 0.478/0.587
  Out-of-sample forecasts: Underforecasts. Forecasts less than actual values by 5 to 20% (baseline) and by 5 to 20% (alternative).
  Avg % RMSE change (Alt vs BL): 20% (-40% to +10%)

PHX (Phoenix Sky Harbor International Airport)
  Adj R-squared (BL/Alt): 0.906/0.907
  Out-of-sample forecasts: Forecasts around actuals. Forecasts differ from actual values by -11 to 10% (baseline) and by -13 to 7% (alternative).
  Avg % RMSE change (Alt vs BL): 20% (-16% to +34%)

TUL (Tulsa International Airport)
  Adj R-squared (BL/Alt): 0.112/0.374
  Out-of-sample forecasts: Overforecasts. Forecasts exceed actual values by 3 to 16% (baseline) and by -9 to 8% (alternative).
  Avg % RMSE change (Alt vs BL): 30% (18% to 46%)

PVD (T.F. Green Airport)
  Adj R-squared (BL/Alt): 0.843/0.856
  Out-of-sample forecasts: Overforecasts, alternative closer than baseline. Forecasts exceed actual values by 8 to 96% (baseline) and by -9 to 70% (alternative).
  Avg % RMSE change (Alt vs BL): 35% (23% to 52%)

EUG (Eugene Airport)
  Adj R-squared (BL/Alt): 0.732/0.716
  Out-of-sample forecasts: Underforecasts. Forecast values less than actual values by 9 to 22% (baseline) and by 9 to 22% (alternative).
  Avg % RMSE change (Alt vs BL): 1% (BL and Alt similar)

MDT (Harrisburg International Airport)
  Adj R-squared (BL/Alt): 0.455/0.430
  Out-of-sample forecasts: Generally underforecasts. Forecasts less than actual values (except in 2015) by 6% or less (baseline) and by 7% or less (alternative).
  Avg % RMSE change (Alt vs BL): 2% (-9% to +14%)

MSO (Missoula International Airport)
  Adj R-squared (BL/Alt): 0.945/0.950
  Out-of-sample forecasts: Underforecasts. Forecast values less than actual values by 3 to 20% (baseline) and by 2 to 20% (alternative).
  Avg % RMSE change (Alt vs BL): 3% (-1% to 6%)

Table 39. Model performance diagnostics for case study airport regressions.

In most cases, neither the baseline model nor the alternative model provides an accurate forecast for a case study airport's passenger enplanements over the 2011 to 2015 period, even when the actual data for the independent variables over those years are used to calculate the enplanement forecasts. This may be so because the models are structured as models of passenger
88 Using Disaggregated Socioeconomic Data in Air Passenger Demand Studies demand, and do not take sufficient account of the role of supply side decisions by airlines on the availability of seats and travel opportunities for prospective passengers. This may be especially significant for models of passenger enplanements at smaller airports. The case study analysis thus provides a comparison of the statistical and forecasting perÂ formance of the baseline or traditional modeling approach relying on aggregate regional socio economic variables and an alternative modeling approach that also includes a type of disaggregated socioeconomic variable that reflects changes in the share of relatively higher income among regional households. Based on this comparison for the case study airports, is it worthwhile for airport analysts to experiment with the alternative modeling approach? There are several factors to consider. Relative to the goodness of fit of the baseline models (adjusted RÂsquared), the alternative model specifications that include the regional disaggregated socioeconomic variable provided very little improvement in this area. This was true for case study airport baseline regressions with relatively high adjusted RÂsquared results and for case study examples of relatively low baseline regression goodness of fit. Another factor to consider when comparing the baseline and alternative model specification results is precision or statistical significance of the coefficient estimates. As shown in Table 35, baseline regression estimates frequently resulted in coefficient estimates that are statistically sigÂ nificant, but this frequency of statistically significant parameter estimates was in general reduced when the disaggregated regional socioeconomic variable was added to the equation specificaÂ tion. This general outcome can be seen in Table 36. 
This result is closely related to the strong correlation between the values taken by the disaggregated socioeconomic variable for each case study airport and the baseline aggregate socioeconomic variables.

A third factor to consider is forecast accuracy. Does including the disaggregated socioeconomic variable related to regional household income distributions result in models that forecast more accurately? Again, the results from the case study analysis are mixed. In most cases, the 5-year out-of-sample forecasts for the case study airports are not very accurate. In some forecast scenarios for some case study airports, the alternative model specification using the disaggregated regional variable provides a more accurate 5-year forecast, as measured by the RMSE for the forecasts, since a decline in the RMSE for a given forecast scenario indicates that the forecast is on average closer to the actual 2011 to 2015 values. However, improvement on the baseline forecast is not always the case.

In terms of these three criteria (model goodness of fit, precision or statistical significance of model parameter estimates, and out-of-sample forecast accuracy) the alternative case study models provided only modest or mixed improvements to the baseline regression outcomes, indicating that the benefits of applying the approach used to prepare the case study model comparisons are limited. On the cost side, if the analyst is using data such as that available from Woods & Poole, there is no additional cost for obtaining the disaggregated household income or other distributional data, since it is included along with the aggregate regional data provided, and only modest computational time and effort would be needed to make use of that data.
Because of this second aspect (the relative ease of including the disaggregated regional data in an analysis), the airport may learn new nuances about the demographics and economy of the region it serves when it includes this additional analysis, even if the use of the disaggregated variable provides only modest improvement to the performance of models of air passenger demand that rely on aggregate regional characteristics. In the remainder of this chapter, a more detailed analysis of other methods of including disaggregated socioeconomic variables in air passenger demand models for the Baltimore–Washington airport system provides different approaches that can extend beyond the simple case study comparisons described earlier.

More Detailed Analysis of the Baltimore–Washington Region

The case study regression models used a fairly simple functional form with a limited number of variables in each model. To explore the potential application of more complex models with additional variables, including dummy variables to account for year-specific effects, a more detailed analysis was undertaken of air passenger demand in the Baltimore–Washington metropolitan region. This region was chosen because it is served by three major commercial service airports and air traffic data has been assembled for all three airports, allowing an analysis of regional demand that avoids distortions from changes in the regional share of specific airports. In addition to exploring more complex models, the analysis also explored the use of an alternative disaggregated measure of household income to that used in the initial case study models. This alternative measure was chosen to avoid the problem of correlation between the aggregate and disaggregate measures of household income used in the prior analysis.

Alternative Disaggregated Income Measure

The initial model estimation regressions used the percent of households with personal incomes of $100,000 or more (in constant 2009 dollars) as the disaggregated measure of income. One potential difficulty with this measure is that it is not independent of the average household income; as the average household income has increased over time, the percent of households with incomes above any given threshold (e.g., $100,000) will also have increased as a growing percentage of households move into that income range, even if the relative distribution of incomes does not change.
This is reflected in a strong correlation between average household income and the percent of households with incomes of $100,000 or more (0.984 for the Baltimore–Washington region for 1990 to 2010). As a result of this correlation, adding the disaggregated income variable to the model specification typically results in the average household income variable (or other aggregate economic variables that are strongly correlated with household income) becoming statistically insignificant, leaving the disaggregated income variable as the only statistically significant economic variable in the model.

To explore alternative disaggregated measures of household income that are independent of the average household income, an analysis was undertaken of the household income distributions for the Baltimore–Washington region, using data from Woods & Poole Economics. The highest income category is $200,000 or more (in constant 2009 dollars), so the data provide no information about the shape of the distribution above $200,000. To estimate the shape of the distribution for households with incomes above $200,000, an analysis was performed of the household income distribution reported by respondents to the 2010 CES, who reported their actual household income in various categories. The result of this analysis is presented in Appendix D and the resulting distribution was used to estimate the cumulative distribution for household incomes above $200,000.

As an alternative to the percent of households with a personal income of $100,000 or more, the percent of total personal income for all households that was received by the top 10% of households by income is a measure that is independent of changes in the average household income. This measure only reflects changes in the relative distribution of household income after adjusting for changes in average household income.
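The contrast between the two measures can be illustrated with a small sketch. The ten household incomes below are hypothetical values chosen for illustration (they are not Woods & Poole data), and the function names are likewise invented here. Every income is scaled up by the same factor, so the relative distribution is unchanged while the average rises.

```python
def pct_at_or_above(incomes, threshold):
    """Percent of households with income at or above the threshold."""
    return 100.0 * sum(1 for y in incomes if y >= threshold) / len(incomes)


def top_share(incomes, top_fraction=0.10):
    """Share of total income received by the top fraction of households."""
    ranked = sorted(incomes, reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)


# Hypothetical household incomes in dollars (illustrative only).
base = [20_000, 30_000, 40_000, 50_000, 60_000,
        70_000, 80_000, 95_000, 120_000, 250_000]

# Every income grows by 30%: the average rises, the relative distribution does not.
grown = [1.3 * y for y in base]

pct_base = pct_at_or_above(base, 100_000)
pct_grown = pct_at_or_above(grown, 100_000)
share_base = top_share(base)
share_grown = top_share(grown)

print(f"Households at $100,000 or more: {pct_base:.0f}% -> {pct_grown:.0f}%")
print(f"Income share of top 10%: {share_base:.3f} -> {share_grown:.3f}")
```

In this toy case the threshold-based measure rises purely because average income rose, while the top 10% share is unaffected.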
To calculate this measure from the Woods & Poole data, it is necessary to make two calculations:

• Determine the household income that corresponds to the 90th percentile of the cumulative distribution
• Estimate the average household income for all households below the 90th percentile of the income distribution

Calculating the average household income for households below the 90th percentile, rather than directly calculating the total income for the 10% of households above the 90th percentile, was necessary due to the uncertainty about the shape of the income distribution above $200,000. The percent of total personal income received by households in the top 10% by income can then be easily calculated from the average income for all households and the average income for households below the 90th percentile.

The household income corresponding to the 90th percentile was calculated by assuming that the distribution curve between the two data points on the cumulative income distribution on either side of the 90th percentile was approximated by the logistic function:

$$P = \frac{1}{1 + e^{aY^{n}}}$$

where P is the cumulative percentage, Y is the household income, and a and n are parameters that are fitted to the data.

The average income for households below the 90th percentile was calculated by integrating the cumulative income distribution curve below the 90th percentile, assuming that the curve between each pair of data points was approximated by a quadratic function. The parameters of the quadratic function between each pair of data points were determined from the two data points and the subsequent data point in the distribution.
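The two calculations can be sketched as follows. This is a minimal illustration, not the procedure actually coded for the study: it assumes the logistic curve has the form P = 1 / (1 + exp(a * Y**n)), it takes the average income below the 90th percentile as given rather than integrating the quadratic segments, and all function names and input values are hypothetical.

```python
import math


def fit_logistic(y1, p1, y2, p2):
    """Fit a and n so that P = 1 / (1 + exp(a * Y**n)) passes through both points.

    Assumes both cumulative shares are above 0.5, as they are near the
    90th percentile, so that the two logit terms have the same sign.
    """
    q1 = math.log(1.0 / p1 - 1.0)   # equals a * y1**n
    q2 = math.log(1.0 / p2 - 1.0)   # equals a * y2**n
    n = math.log(q1 / q2) / math.log(y1 / y2)
    a = q1 / y1 ** n
    return a, n


def income_at_percentile(a, n, p):
    """Invert the fitted curve to get the income at cumulative share p."""
    return (math.log(1.0 / p - 1.0) / a) ** (1.0 / n)


def top10_share(mean_all, mean_below_p90):
    """Share of total income received by the top 10% of households.

    Total income is N * mean_all and the bottom 90% receive
    0.9 * N * mean_below_p90, so the top 10% receive the remainder.
    """
    return 1.0 - 0.9 * mean_below_p90 / mean_all


# Hypothetical points on the cumulative distribution (income in $ thousands).
a, n = fit_logistic(150.0, 0.85, 200.0, 0.93)
y90 = income_at_percentile(a, n, 0.90)

# Hypothetical average incomes (in $ thousands).
share = top10_share(mean_all=90.0, mean_below_p90=65.0)

print(f"Income at 90th percentile: about ${y90:,.1f} thousand")
print(f"Share of income received by top 10%: {share:.1%}")
```

The share formula makes clear why only the overall average and the below-90th-percentile average are needed, which avoids any assumption about the shape of the distribution above $200,000.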
The resulting changes in the values of the percent of total personal income received by households in the top 10% by income from 1990 to 2010, together with the corresponding percent of households with a personal income of $100,000 or more and the average household income, are shown in Figure 12.

The percent of total personal income received by households in the top 10% by income increased at a slower rate than the percent of households with a personal income of $100,000 or more from 1990 to 2010, as would be expected given the increase in average household income over the period. The difference in the rate of increase became less after 2000. In fact, from 1999 to 2010 the increase in the two measures was essentially identical. The increase in the two measures differed considerably from year to year, with the changes in the percent of total personal income received by households in the top 10% by income from year to year corresponding more closely to the changes in the average household income (the correlation coefficient for the two measures was 0.99).

Since the percent of total personal income received by the top 10% of households is independent of the average household income, the combination of an increasing average household income and an increasing share of total income received by households in the top 10% means that the average household income of the top 10% of households increased significantly from 1990 to 2010, in fact by 75%. In contrast, the average household income of other households only increased by 21%. During the same period, the regional enplaned O&D passengers increased by a little over 77%.

A trial regression adding a variable for the percent of total personal income received by the top 10% of households to a log-linear model of O&D enplanements per capita for the Baltimore–Washington region for the period 1990 to 2010 with variables for the average household income and oil price gave statistically significant coefficients for all variables. In contrast, the same model with the percent of households with incomes of $100,000 or more in place of the percent of total personal income received by the top 10% of households resulted in a statistically significant coefficient for the percent of households with incomes of $100,000 or more, but the coefficients for the average household income and oil price became statistically insignificant and the coefficient for average household income had a counterintuitive sign.

Modeling Approach and Model Development

The changes in the enplaned O&D passengers per person and selected socioeconomic variables, as well as the U.S. average airline yield and the oil price measure adopted, from 1990 to 2010 are shown in Figure 13, expressed as an index relative to the value of each data series in 1990. This allows data expressed in very different units to be shown on the same chart and provides a direct comparison of the relative change in each data series over time. The enplaned O&D passenger and employment data were expressed on a per person basis to account for the effect of population growth on demand while avoiding problems from the correlation that exists between growth in population and growth in other socioeconomic factors.

It seems clear from the data shown in Figure 13 that the level of enplaned passengers per capita after 2000 was influenced by factors other than the ongoing trends in socioeconomic factors and airfares as represented by average airline yield, most notably changes in the airline industry following 9/11.
To avoid the effect of these factors distorting the estimates of the effect of the socioeconomic, airline yield, and oil price variables on the level of enplaned passengers, and to provide a way to quantify the magnitude of the effect of any additional factors that occurred after 2000, demand models were first estimated using data for the period 1990 to 2000. These models were then used to project enplaned passengers per capita for the period 2001 to 2010 and the resulting projected traffic was compared to the actual traffic. This gave the ratio by which the projected traffic exceeded the actual traffic. Based on the resulting pattern of this ratio, year-specific dummy variables were defined to account for the level of overprediction in each year. The models were then re-estimated for the full period including the dummy variables.

Figure 12. Changes in disaggregated household income metrics over time.

The inclusion of year-specific dummy variables forces the model to fit the data for those years. Their inclusion in the model provides two important benefits:

• It allows the estimation of the coefficients of the continuous variables to be based on the full 21 years of data without being distorted by year-specific effects (such as the effect of 9/11 in 2001 and 2002).
• It provides an estimate of the magnitude of any year-specific effects during the period from 2001 to 2010 that reduce the projected enplaned passengers per capita below the level attributable to the effects of the continuous variables.

Naturally, the inclusion of the dummy variables provides no information on what factors may have caused the effect measured by each dummy variable. It merely measures the magnitude of the effect in each year. Interpreting the likely or potential cause of these effects requires additional analysis or thought. However, knowing how the magnitude of the effect changes from year to year may lead to the addition of new continuous variables or changes in the definition of the continuous variables (e.g., using the average fare for the airports in question from the U.S. DOT 10% O&D survey in place of the national average yield) that account for these effects without the need for dummy variables.

Even without such additional analysis, separating out the year-specific effects from the effects of the continuous variables included in the model has value from the perspective of the use of the models for forecasting.
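The estimation sequence described above can be sketched on synthetic data. Everything in this example is an assumption made for illustration: the single income regressor, the "true" coefficients, the size of the 2001 and 2002 shocks, the 5% overprediction threshold used to define dummy years, and the small `ols` helper. It is not the model or data used in the study.

```python
import math


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic log-linear data for 1990-2010 (illustrative only).
years = list(range(1990, 2011))
ln_inc = [math.log(100 * 1.02 ** (y - 1990)) for y in years]
shock = {2001: -0.15, 2002: -0.30}          # assumed year-specific effects
ln_q = [0.5 + 1.2 * x + shock.get(y, 0.0) + 0.002 * math.sin(y)
        for y, x in zip(years, ln_inc)]

# Step 1: estimate the model on 1990-2000 only.
early = [(x, q) for y, x, q in zip(years, ln_inc, ln_q) if y <= 2000]
b0, b1 = ols([[1.0, x] for x, _ in early], [q for _, q in early])

# Step 2: project 2001-2010 and form the ratio of projected to actual traffic.
ratio = {y: math.exp(b0 + b1 * x - q)
         for y, x, q in zip(years, ln_inc, ln_q) if y > 2000}

# Step 3: define a dummy for each year with material overprediction.
dummy_years = [y for y, r in ratio.items() if r > 1.05]

# Step 4: re-estimate on the full period including the dummy variables.
X = [[1.0, x] + [1.0 if y == d else 0.0 for d in dummy_years]
     for y, x in zip(years, ln_inc)]
coef = ols(X, ln_q)
fitted = [sum(c * v for c, v in zip(coef, row)) for row in X]
resid = {y: q - f for y, q, f in zip(years, ln_q, fitted)}

for d, c in zip(dummy_years, coef[2:]):
    # In a log-linear model a dummy coefficient c implies a change of exp(c) - 1.
    print(f"{d}: coefficient {c:.3f}, effect {math.exp(c) - 1:.1%}, "
          f"residual {resid[d]:.1e}")
```

The residuals in the dummy years are zero to within rounding, which is the sense in which a year-specific dummy forces the model to fit that year, while the income coefficient is estimated from all 21 years.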
If the year-specific effects are considered unlikely to occur in the future or to recur under assumed conditions (e.g., future recessions), the effects measured by the dummy variables can be omitted from the forecasts or used to define future scenarios in which these effects continue but at a different level of frequency. In contrast, developing models that rely on a limited set of continuous variables without considering any year-specific effects runs the risk of the resulting models accounting for these effects by distorting the estimated coefficients of the continuous variables. Use of such models to prepare forecasts can result in a situation where it is unclear whether and to what extent the forecasts implicitly assume a continuation of the year-specific effects that occurred during the model estimation period.

Figure 13. Changes in enplaned O&D passengers and socioeconomic and other data: Baltimore–Washington Region, 1990 to 2010.

However, one limitation of this approach is the limited number of data points that are available in the period 1990 to 2000 to estimate the models to be used to assess the pattern of over- or underprediction during the period from 2001 to 2010. Of course, including dummy variables in the resulting model estimated on the full period reduces the degrees of freedom in the model estimation, which needs to be carefully considered in deciding how many dummy variables to include. Therefore the model development started with very simple models and progressively added variables, retaining them if they improved the model fit and had statistically significant coefficients of the expected sign and the values seemed plausible. The detailed evolution of the model specification is described in Appendix D.

The model development was based on the use of a multiplicative (log-linear) demand model with the total regional enplaned O&D passengers per capita as the dependent variable. This model specification ensures that the marginal change in the dependent variable for a given change in one of the independent variables varies with the overall level of the dependent variable, which seems intuitively reasonable.
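The multiplicative form can be written out to make the point about marginal changes explicit. This is a sketch in generic notation: q (enplaned O&D passengers per capita), x_i (the continuous explanatory variables), and the beta coefficients are symbols introduced here for illustration and are not the report's own notation; dummy variables, where used, would enter the logarithmic form as additional additive terms.

```latex
\ln q = \beta_0 + \sum_i \beta_i \ln x_i
\quad\Longleftrightarrow\quad
q = e^{\beta_0} \prod_i x_i^{\beta_i},
\qquad
\frac{\partial q}{\partial x_i} = \beta_i \, \frac{q}{x_i}
```

The last expression shows that the change in q for a unit change in x_i is proportional to the current level of q.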
In addition, the coefficient estimates for log-linear models give the demand elasticity, which is helpful in interpreting the reasonableness of the model estimates.

The final specification is shown in Table 40. All the coefficient estimates are statistically significant at the 5% level or better and all but that for the employment variable are statistically significant at the 1% level or better. The estimated coefficients have expected signs and the estimated values are intuitively reasonable. The estimated coefficient for average airline yield implies that total air travel demand is slightly inelastic with respect to airfares, which does not seem unreasonable since real airfares have been generally declining over the period from 1990 to 2010 and as airfares decline some households and businesses are likely to have chosen to use the savings for other purposes rather than purchasing more air travel.

| Variable (Final Model) | Coefficient | t-statistic |
|---|---|---|
| Intercept | -21.50 | (-5.68) |
| Average Household Income | 2.047 | (7.62) |
| US Average Airline Yield | -0.849 | (-5.93) |
| Employment/person | 0.520 | (2.40) |
| Pct of HH Income by Top 10% | -1.697 | (-4.44) |
| Dummy Variable 2001 | -0.1514 | (-8.31) |
| Dummy Variable 2002 | -0.3059 | (-9.71) |
| Dummy Variable 2003 | -0.02347 | (-10.86) |
| Dummy Variable 2005 | 0.0539 | (3.20) |
| Dummy Variable 2009 | -0.0901 | (-5.19) |
| Adjusted R Squared | 0.994 | |

Table 40. Model estimation results: 1990–2010 (Final Model Specification).

The estimated coefficient for employment per capita also appears reasonable. It seems reasonable that a given increase in regional employment would lead to a proportional increase in business travel, if all other factors remain unchanged. However, business travel only accounts for about half of all air travel, so a priori one would expect an elasticity of demand with respect to employment of around 0.5.

It may at first seem counterintuitive that the coefficient for the alternative disaggregated income variable would be negative, but this is what would be expected. If the top 10% of households have a higher share of total income, this implies that the rest have a lower share. As their income drops relative to the average income, their air travel propensity would tend to drop as well, and there are far more of them than those in the top 10% of households, whose air travel propensity in any case is probably not as greatly affected by changes in their income as that of lower income households.

Another effect of including the changing income distribution in the model is that the coefficient of average household income is much higher than would typically be found in a model that does not include this effect, implying an elasticity of demand with respect to average income of over two. This too is not unreasonable since as real incomes rise, one would expect households to spend an increasing share of their income on discretionary spending, including air travel, particularly higher income households.
However, from 1990 to 2010 the positive effect of increasing average real income on air travel demand has been partly offset in the model by the negative effect of the increase in the percentage of total household income received by the top 10% of households.

The overall fit of the enplaned O&D passenger projections using the final model to the actual data is shown in Figure 14. The overall model fit is very close. Of course, one would expect a near-perfect fit from 2001 on, since this is forced by the choice of dummy variables. However, the fit from 1990 to 2000 (which did not include the effect of any dummy variables) is essentially unchanged from that given by the same model specification, without the dummy variables, estimated on data for 1990 to 2000.

Figure 14. Comparison of projected and actual enplaned O&D passengers for Baltimore–Washington region (final model specification), 1990 to 2010.

Although the model fits the actual enplaned O&D passenger data closely, there are a number of important caveats that are discussed in more detail in Appendix D. The first is that the model is based on the socioeconomic characteristics of the Baltimore–Washington region, but the enplaned O&D passenger data includes both air trips by residents of the region and by visitors to the region. The model thus implicitly assumes that the proportion of resident to visitor air passengers is constant, so that relationships based on the characteristics of the residents of the region also predict changes in visitor trips. The second is that the model uses U.S. average airline yield as a surrogate measure for the average airfares that were available at the airports serving the region. To the extent that these average airfares differed from the national average yield, this could have introduced biases or errors in the model.

However, beyond these concerns the estimated values of the dummy variables provide a striking, and unexpected, indication of the effect of factors beyond the variables included in the model during the period from 2001 to 2010. The increase in the effect of the dummy variables each year from 2002 to 2010 raises some unexpected and interesting issues.
It seems unlikely that potential air travelers would be finding the hassle or inconvenience of the post-9/11 security measures increasingly onerous 8 years after they were introduced, so the dummy variables may be reflecting other factors that were changing progressively from 2002 to 2010 but are not included in the model. Either way, the dummy variables reflect the effect of these factors (whatever they are) on demand.

Improvement from Including the Disaggregated Household Income Variable

To quantify the contribution to the final model of the disaggregated household income variable, the model was re-estimated without the disaggregated income variable. The revised model gave a poorer fit to the data, with an adjusted R squared of 0.985 compared to 0.994 for the final model, in spite of having one more degree of freedom. The estimated coefficients had lower statistical significance with the exception of the coefficients for airline yield and employment. However, the increase in statistical significance for those two variables was fairly small and the estimated values of both coefficients were considerably larger than for the final model and in fact appear too high.

The overall fit of the revised model to the actual traffic was not as close as the final model, particularly for the period from 1990 to 2000, although it would probably be considered a perfectly acceptable fit to the data for most air passenger demand studies. However, the other important difference between the two models is the difference in the estimated coefficients for average household income, average airline yield, and employment per person, which have significantly different implications for the effect of future assumed changes in these variables.
In particular, assuming that the relative household income distribution (measured by the percent of total income received by the top 10% of households) remains constant, the elasticity of demand with respect to average household income changes from 2.05 in the final model to 0.97 in the model without the disaggregated household income variable. This difference points out that failing to take account of changes in income distribution in air passenger demand models can result in significant biases in estimated model coefficients.

In order to understand how well these two models would predict future enplaned O&D passenger traffic, ex-post forecasts were performed using both models for the period from 2010 to 2015, for which the actual traffic and values of the socioeconomic data were available. This required assumptions about values of the dummy variables for 2002 on and 2003 on for the years after 2010. It was found that keeping their combined effect constant at their values in 2010 gave a closer fit to the actual traffic than allowing them to continue changing at the rates estimated for the period up to 2010. Under this assumption, the projected traffic levels in each year from 2011 to 2015 given by the final model underestimated the traffic in 2011 by less than 1% and overestimated the traffic in subsequent years by between 1% and 3%, with an average error over the 5-year period of a 1.7% overestimate. In comparison, the revised model without the disaggregated household income variable underestimated the traffic in all 5 years by between 1% and 4%, with an average underestimate of 2.2%.

Although this difference between the two models may not seem very large, if the two models are used to develop projections over a longer time frame, as is typically done for airport master planning and similar studies, the difference in the estimated model coefficients can lead to very different outcomes. For example, developing a 25-year forecast from 2015 to 2040 assuming an annual increase in real household income of 1.8% per year (approximately the average annual increase in real household income from 1990 to 2010) and holding the other socioeconomic factors constant (including the relative distribution of household incomes and the average airline yield) gave an increase in enplaned O&D passengers per capita of 149% using the final model but only 49% using the model without the disaggregated household income variable.
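The arithmetic behind such a long-range comparison can be sketched with a constant-elasticity calculation. This is an illustration under stated assumptions, not the report's own computation: it uses only the income elasticities quoted in the text (2.047 from Table 40 and the rounded 0.97 for the revised model), holds every other variable constant, and the function name is hypothetical.

```python
def pct_increase(annual_growth, years, elasticity):
    """Percent increase in demand when the driver grows at a constant
    annual rate and demand is proportional to driver ** elasticity."""
    return ((1.0 + annual_growth) ** (years * elasticity) - 1.0) * 100.0


# 25 years of 1.8% annual growth in real household income.
final_model = pct_increase(0.018, 25, 2.047)     # final model elasticity
revised_model = pct_increase(0.018, 25, 0.97)    # rounded elasticity from the text

print(f"Final model:   {final_model:.0f}% increase")
print(f"Revised model: {revised_model:.0f}% increase")
```

With these inputs the final model figure comes out at about 149%, matching the text. The rounded elasticity of 0.97 gives a somewhat higher figure than the 49% reported for the revised model, which may reflect unrounded coefficients or other inputs not given here. Either way, the sketch shows how a difference in elasticity compounds over a 25-year horizon.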
Summary and Conclusions

The two most important findings from the more detailed demand modeling of the Baltimore–Washington region, beyond the greatly improved fit of the modeled enplanements to the actual data, are (1) that including a variable for the percentage of total household income received by the top 10% of households by income not only improved the model fit to the data but significantly changed the effect of average household income on predicted air passenger enplanements, and (2) that the changes in enplaned passengers after 2001 appear to be greatly influenced by factors beyond the continuous variables included in the models that reduced the demand well below the levels that would have been expected from relationships estimated on the period from 1990 to 2000. While some of these factors are well recognized, such as the changes in the security measures after the 9/11 terrorist attacks or the 2007 recession, the dummy variables included in the model allow these effects to be quantified. More importantly, these effects do not appear to have dissipated by 2010 and may have continued at least for several years thereafter.

Thus the more detailed analysis of the Baltimore–Washington Region confirms the findings from the initial analysis using a much simpler model specification and a limited number of variables that including a disaggregated household income variable improves both the model fit and forecast accuracy, although the improvement over the same model with only an aggregate household income variable is not large. However, it should be noted that the ex-post forecast comparison was only performed for a relatively short 5-year period and the improvement in accuracy from accounting for changes in the distribution of household incomes could increase significantly over a longer forecast period, particularly if there are future changes in household income distribution.
Of course, this will not be known at the time any forecasts are prepared, but what is important is that including a disaggregated household income variable in an air passenger demand model allows forecasts to be prepared assuming different scenarios of future changes in household income distribution and an analysis to be performed of the sensitivity of forecast levels of air passenger demand to possible changes in household income distribution.

Analysis of Case Study Results

The case study results can be analyzed along several dimensions, from focusing on strictly statistical considerations to applying a broader perspective that takes in what the results may say about the passenger aviation industry and its evolution within the broader economy. This discussion will begin with the results from the case study regressions that were based on models with a limited number of explanatory variables, and then turn to the more detailed analysis of the Baltimore–Washington airport system.

Several largely statistical factors contribute to the case study results. Among the baseline case study regressions, which use a single aggregate socioeconomic variable (along with the oil price variable), there is some variation in the goodness of model fit from case study to case study airport. For each individual case study airport, the model goodness of fit measure (adjusted R squared) generally does not differ much when different aggregate socioeconomic variables are used as independent variables (along with the oil price variable). This is not surprising, since for each case study airport's service area (MSA), the individual aggregate socioeconomic variables are strongly correlated; each one carries similar information about changes in the regional economy as the others.

This strong correlation among socioeconomic variables also extends to the disaggregated socioeconomic variable used in the alternative case study regressions: the percentage of a region's households with income exceeding $100,000 in 2009 dollars. Because of this, the introduction of the disaggregated household income variable as an additional independent variable, together with one of the aggregate socioeconomic variables used in the baseline regressions for each case study airport, tends to slightly increase the regression's adjusted R squared but also generally results in much less precise coefficient estimates.
The parameter estimates for the alternative equations are less likely to be statistically significant because of the higher standard errors on the socioeconomic variables caused by the presence of two strongly correlated variables as independent regression variables. This phenomenon, called variance inflation, results from the fact that the two correlated variables (because they are so similar statistically) bring largely overlapping information to the regression estimation. This correlation between the independent variables (they are not fully independent of one another statistically) limits the extent to which the disaggregated household income variable can bring additional explanatory power to the baseline model specifications with only a single socioeconomic variable.

Inclusion of the disaggregated household income variable as an independent regression variable in the alternative specifications nearly always causes a significant change in the coefficient estimate for the aggregate socioeconomic variable compared to the coefficient estimate in the baseline equation. This effect may be a statistical artifact of the strong positive correlation between the disaggregated household income variable and each of the aggregate socioeconomic variables. On the other hand, the result may not be particularly surprising, because the disaggregated household income variable (the percentage of households with incomes exceeding $100,000) inherently increases as average household incomes rise, since the growth in average income moves an increasing proportion of households above $100,000. This in turn explains the strong correlation between the disaggregated household income variable and the average household income variable (as well as the other socioeconomic variables that are strongly correlated with average household income).
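The variance inflation described above can be illustrated with a small numerical sketch. The Python example below is not drawn from the case study data. It generates synthetic annual series in which a hypothetical share of households with income above $100,000 rises with a hypothetical average household income, and computes a variance inflation factor (VIF) for each regressor as 1 / (1 - R squared), where R squared comes from regressing that regressor on the others. All variable names and numeric values are assumptions chosen for illustration only.

```python
import numpy as np


def variance_inflation_factor(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing column j on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    # Add an intercept so that R^2 is measured around the mean of y.
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r_squared = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r_squared)


# Synthetic data (assumed values, NOT the report's data): 21 annual observations.
rng = np.random.default_rng(0)
n = 21
avg_income = np.linspace(60.0, 80.0, n) + rng.normal(0.0, 1.0, n)
# The share above the threshold rises with average income, so the two are strongly correlated.
share_over_100k = 0.5 * avg_income - 15.0 + rng.normal(0.0, 0.5, n)
# Oil price is generated independently of the income variables.
oil_price = rng.normal(50.0, 15.0, n)

X = np.column_stack([avg_income, share_over_100k, oil_price])
names = ["avg_income", "share_over_100k", "oil_price"]
vifs = [variance_inflation_factor(X, j) for j in range(X.shape[1])]

for name, vif in zip(names, vifs):
    print(f"{name}: VIF = {vif:.1f}")
```

In this synthetic setup the two income variables should show large VIFs while the independently generated oil price variable should not, which mirrors the loss of coefficient precision discussed above when two strongly correlated socioeconomic variables enter the same regression.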
Naturally, since the change in the disaggregated household income variable partly reflects the growth in average household income, including it in the model will affect the estimated contribution of the other aggregate socioeconomic variable to enplaned O&D passengers, as reflected in the model coefficients.

It is true that the additional inclusion of the disaggregated socioeconomic variable in the alternative regression estimations raises the model's goodness of fit to the actual airport enplanement data, as measured by the slight increases in adjusted R squared for the alternative regression specifications. However, these improvements are not large, which prompts the question of whether the improvement in model fit is sufficient to justify the resulting loss of statistical significance of
the estimated regression coefficients for the other socioeconomic variables. Although it may at first appear that the simpler models that use only aggregate socioeconomic variables perform well enough without the disaggregated household income variable, this places undue emphasis on model fit over the reasonableness of the estimated values of the model coefficients. What matters for a model that is to be used to project future air travel demand is not only how well it fits historical traffic data but also whether the demand relationships implied by the estimated model coefficients and the variables included in the model appear reasonable. Omitting any measure of household income distribution from a model implies that future levels of air travel demand are unaffected by changes in household income distribution. This not only defies common sense but runs counter to the findings of the analysis of a large number of air passenger and household travel surveys undertaken as part of the current project, which clearly show that the distribution of household incomes has a major effect on the average number of air trips that those households make each year.

Although the relatively simple baseline and alternative model specifications used in the case study comparison produce relatively high goodness of fit for several of the case study airports, this is not the case for all of them, and the forecast performance is mixed for both baseline and alternative case study model specifications. This suggests that variables other than the two or three included in the models may also matter for determining annual enplanements at individual airports. These other influences likely include one-time historical events, such as the 9/11 terrorist attacks, or continuing effects of macroeconomic events such as the Great Recession, which began in 2007.
In addition, as shown in the more detailed analysis of the Baltimore-Washington region, the oil price variable turned out to be a poor measure of changes in airfares over the estimation period of the models. Moreover, the U.S. air transportation industry has itself experienced tremendous change over the period from 1990 to 2010. These industry changes have led to changes in passenger service, especially at smaller airports, that are only partially related to, or not easily attributable to, changes in the national or regional economy. These "outside the model" influences on an airport's annual enplanements will reduce the explanatory power of the chosen independent model variables even if those variables can reasonably be expected to influence airport air passenger demand. In other words, adequately reflecting these external influences requires a more complex model.

Hence, many of these conclusions about the case study models are related to the relative simplicity of the equations used in those models. The more detailed analysis of annual enplanements for the Baltimore-Washington airport system, also conducted as part of the case study analysis, is an effort to develop model specifications that can account for some of these limitations. In particular, an alternative disaggregated socioeconomic variable was used to address the correlation between the aggregate and disaggregated household income variables used in the simpler case study regression models. Further, dummy variables were used to address the importance of historical and industry supply-side changes for airport system O&D enplanements, especially in the years following the 9/11 attacks (which include the Great Recession that began in 2007).
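As a rough illustration of how a dummy variable can isolate a shift in enplanements after a particular event, the following Python sketch fits an ordinary least squares regression to synthetic data. It is not the specification estimated for the Baltimore-Washington system. The single post-2001 dummy, the log income index, the coefficient values, and the noise level are all assumptions made for this example.

```python
import numpy as np

# Synthetic annual data for 1990-2010 (assumed values, NOT the report's data).
years = np.arange(1990, 2011)
n = len(years)
rng = np.random.default_rng(1)

# Hypothetical log household income relative to 1990.
log_income = np.linspace(0.0, 0.5, n)
# Dummy variable: 1 for 2001 and later, 0 before (a hypothetical post-9/11 shift).
post_2001 = (years >= 2001).astype(float)

# Synthetic log enplanements with an assumed downward shift of 0.15 after 2001.
log_enplanements = 2.0 + 1.2 * log_income - 0.15 * post_2001 + rng.normal(0.0, 0.02, n)

# OLS with an intercept, the income variable, and the dummy variable.
X = np.column_stack([np.ones(n), log_income, post_2001])
beta, *_ = np.linalg.lstsq(X, log_enplanements, rcond=None)

# Classical OLS standard errors and t statistics.
resid = log_enplanements - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
cov = sigma2 * np.linalg.inv(X.T @ X)
std_err = np.sqrt(np.diag(cov))
t_stats = beta / std_err

for name, b, t in zip(["intercept", "log_income", "post_2001"], beta, t_stats):
    print(f"{name}: coefficient = {b:.3f}, t = {t:.1f}")
```

A t statistic on the dummy coefficient that is large in absolute value would indicate a shift in the level of enplanements that the income variable alone does not explain. Because the dummy and a steadily rising income series are themselves correlated, the same precision concerns discussed for the socioeconomic variables apply here as well.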
The statistical significance of the parameter estimates on those dummy variables indicates that, at least for the Baltimore-Washington region, the annual O&D enplanements for the regional airport system were strongly influenced during the period from 2001 to 2010 by factors not accounted for by the air service and socioeconomic variables included in the models.

As with the initial, simpler case study models, the more detailed model of annual enplanements in the Baltimore-Washington system found that the inclusion of a disaggregated household income variable did not result in a large improvement in the (already high) model goodness of fit compared to a model relying only on an aggregate household income variable, although,
as noted above, model goodness of fit is not the only consideration in deciding whether any particular variable adds value to a model. As found with the initial case study results, the inclusion of the disaggregated household income variable in the more detailed models of Baltimore-Washington airport system O&D enplanements led to a significant change in the parameter estimate for the aggregate household income variable that was included in both the baseline specification and the specification that also included the disaggregated variable.