Suggested Citation:"4 Assessment of NAOMS Sampling Design." National Research Council. 2009. An Assessment of NASA's National Aviation Operations Monitoring Service. Washington, DC: The National Academies Press. doi: 10.17226/12795.

4
Assessment of NAOMS Sampling Design

As noted in Chapter 1, the goals of the NAOMS survey were to estimate event rates and trends in the rates for a variety of safety-related events. This chapter assesses the impacts of the sample design and potential coverage biases on the accuracy of these estimates.

4.1
INTRODUCTION

Research by the NAOMS team indicated that there were more than 600,000 active pilots in 1998, with about 130,000 having Airline Transport Pilot Certificates.¹,² The team identified two ways to build the sampling frame of pilots for the NAOMS survey: use the FAA’s Airmen Certification Database (ACD)³ or partner with industry trade groups and unions to obtain pilots’ contact information. The team eventually decided to use the ACD because it was logistically simpler and did not run the risk of compromising the independence of the survey.

The NAOMS team decided that pilots in the ACD who met four criteria were eligible for selection for the air carrier (AC) survey: (1) being based in the United States, (2) having an airline transport pilot (ATP) certificate, (3) having a multi-engine rating, and (4) having a flight engineer (FE) certificate. All other pilots were eligible for the general aviation (GA) survey. Budgetary and statistical considerations led the NAOMS team to set a goal of 8,000 completed AC survey questionnaires each year.⁴

The criteria outlined above resulted in a pool of 52,570 pilots for the AC survey.⁵ The NAOMS team further narrowed the sample pool by eliminating any pilot who could not be linked to a telephone number.⁶ Actual implementation of the survey reached approximately 7,000 air carrier pilots each year.⁷ A total of 29,882 pilots were surveyed for the NAOMS study over the period from April 2001 to December 2004.⁸ Of these pilots, 25,105 participated in the AC survey. The GA survey was conducted for a much shorter period (August 2002 through April 2003) and involved 4,777 pilots. Groups of pilots for both surveys were selected monthly using simple random sampling from the public ACD.

¹	Battelle, NAOMS Reference Report, 2007, Appendix 2, p. 2.
²	For a description of the Airline Transport Pilot Certificate, see Federal Aviation Administration, Airline Transport Pilot, Aircraft Dispatcher, and Flight Navigator Knowledge Test Guide, Washington, D.C., September 2008, available at http://www.faa.gov/training_testing/testing/airmen/test_guides/media/FAA-G-8082-1D.pdf, accessed July 15, 2009.
³	Available at http://www.faa.gov/licenses_certificates/airmen_certification/interactive_airmen_inquiry/, accessed July 15, 2009.
⁴	Mary Connors, Chief, Aviation System Safety Research Branch, NASA Ames Research Center, presentation at Meeting 3 of the National Research Council (NRC) Committee on NASA’s National Aviation Operations Monitoring Service (NAOMS) Project, NASA Ames Research Center, Moffett Field, California, January 14, 2009, p. 14.
⁵	Battelle Memorial Institute, NAOMS Completion Rate Summary: Air Carrier and General Aviation Surveys, Columbus, Ohio, August 31, 2008, p. 4.
⁶	According to its NAOMS Completion Rate Summary, Battelle could not “obtain good telephone numbers” for 9,480 out of 52,570 pilots (3,590 out of 12,363 for the GA survey).

The sampled pilots were contacted first by mail with a pre-notification letter from the NAOMS team. This letter was followed by a telephone call during which the survey was administered. If the respondent was not available, a callback time was arranged. The survey questionnaire included a computer screen to allow checking for qualifying activity during the recall period, which consisted of the last n days before the survey, with the number n varying initially from 30 to 90 days but fixed at 60 days after March 2002. The survey was conducted by professionally trained interviewers using a computer-assisted telephone interview system.

Each pilot who responded to the survey was asked a set of questions (described in detail in Chapter 5) about his or her background, the number of hours and flights flown, how many of each of numerous possible safety-related events he or she had observed, some topic-specific questions, and feedback about the survey. The information in the responses was restricted to the recall period.

4.2
TARGET POPULATION AND SAMPLING METHOD

For the NAOMS surveys, the two target populations were all flight legs meeting the criteria for AC and GA during the recall period. The NAOMS questionnaires⁹ indicate that the qualifying AC flight legs were intended to be those conducted under Federal Aviation Regulations (FAR) Part 121 (under which the major passenger and large cargo airlines such as FedEx fly).¹⁰ Considering air carrier operations as those operating under Part 121 is consistent with the practices of the U.S. Department of Transportation.¹¹ The flights of interest in the GA questionnaire were those conducted under FAR Parts 91¹² and 135. However, because FAR Part 135 governs the operation of scheduled commuter air carriers and on-demand, for-hire air taxi and charter providers,¹³ the inclusion of flights operated under Part 135 in the general aviation survey extended the notion of general aviation well beyond normal usage of the term. In its general aviation safety statistics, the U.S. Department of Transportation specifically excludes Part 135 operations and considers Part 135 scheduled operations to be a segment of aviation separate from both general aviation and Part 135 on-demand operations.¹⁴ The GA survey did not collect the information that would have enabled events from these disparate segments to be disaggregated.

The ideal sampling frame for this population would be the list of all flight legs that occurred in the appropriate flight regimes, that is, Part 121 flights in the AC survey and Part 91 and Part 135 (given the NAOMS definition of general aviation) in the GA survey during the recall period. However, collecting data for a simple random sample of flight legs would not have been economical or even feasible. The NAOMS team decided to draw samples of pilots and to ask them about all events that occurred during the recall period. This strategy results in a cluster sampling of flight legs: pilots are the primary sampling units, and all the flights flown by the sampled pilots during the recall period are then included in the sample.

Such a cluster sample of flights differs from a simple random sample in several ways. In particular, the flight legs of any particular pilot are either sampled or not sampled as a group. This typically reduces the information content relative to a simple random sample of the same size because the responses within clusters are likely to be correlated.¹⁵ However, it is often much more economical to use cluster sampling, in which case cost reductions lead to greater overall efficiency. This is the case with the NAOMS survey. Cluster sampling occurs in many surveys, for example, in samples of students within schools or patients within hospitals.

⁷	Battelle, NAOMS Reference Report, 2007, p. 34.
⁸	Ibid., p. 13. Other sources provide slightly different numbers, in part because of reclassifications, different data releases, and so on.
⁹	Ibid., Appendixes 11 and 12.
¹⁰	FAR Part 121 refers to a section of the FAA Federal Aviation Regulations that prescribes safety rules governing the operation of air carriers and commercial operators of large aircraft. The term Part 121 carriers refers to carriers operating under these regulations; see Air Transport Association, The Learning Center: Glossary, 2009, available at http://learningcenter.airlines.org/Pages/Default.aspx?Filter=p, accessed July 15, 2009.
¹¹	See, for example, Bureau of Transportation Statistics, National Transportation Statistics 2009, Research and Innovative Technology Administration, U.S. Department of Transportation, Washington, D.C., 2009, Table 2-9: U.S. Air Carrier Safety Data.
¹²	FAR Part 91 refers to a section of the FAA Federal Aviation Regulations that includes principally general aviation.
¹³	Air Transport Association, The Learning Center: Glossary, 2009.
¹⁴	See, for example, Bureau of Transportation Statistics, National Transportation Statistics 2009, Table 2-9: U.S. Air Carrier Safety Data.
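The loss of information from clustering can be illustrated with the standard design-effect approximation, deff = 1 + (m - 1)ρ, where m is the average number of flight legs per pilot and ρ is the intraclass correlation of responses within a pilot. The sketch below is illustrative only: the values of m and ρ are hypothetical assumptions, not NAOMS estimates, and the formula assumes clusters of roughly equal size.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Approximate design effect for cluster sampling: 1 + (m - 1) * rho."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n_units: float, avg_cluster_size: float, icc: float) -> float:
    """Size of the simple random sample carrying roughly the same information."""
    return n_units / design_effect(avg_cluster_size, icc)


# About 7,000 AC pilots were reached per year (from the text).
# Legs per pilot and the intraclass correlation are HYPOTHETICAL values.
pilots_per_year = 7_000
legs_per_pilot = 40      # hypothetical
rho = 0.05               # hypothetical

n_legs = pilots_per_year * legs_per_pilot
deff = design_effect(legs_per_pilot, rho)
n_eff = effective_sample_size(n_legs, legs_per_pilot, rho)
print(f"design effect = {deff:.2f}; effective legs = {n_eff:,.0f} of {n_legs:,}")
```

Even a small within-pilot correlation can reduce the effective number of flight legs considerably, which is why the cost advantage of sampling pilots has to be weighed against this loss.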

A second problem arose in the NAOMS survey owing to the fact that there can be multiple pilots on any given flight. This implies that the probability of sampling flight legs varies with the number of pilots on the flight. There is also a non-zero (although small) probability that a flight leg will be selected multiple times in the sample. Estimates of rates of events must account for these unequal sampling probabilities, but this requires knowledge of the probabilities, which is difficult to obtain. Section 7.3 discusses this issue further.
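A minimal sketch of why the number of pilots on a flight matters is given below. It assumes, purely for illustration, that every pilot on a flight leg is in the sampling frame and is selected independently with the same probability; the sampling fraction used is hypothetical, and the actual NAOMS design (monthly simple random samples from an incomplete frame) is more complicated.

```python
def leg_inclusion_probability(pilot_fraction: float, pilots_on_leg: int) -> float:
    """P(at least one pilot on the leg is sampled), assuming independent selection."""
    return 1.0 - (1.0 - pilot_fraction) ** pilots_on_leg


def prob_reported_more_than_once(pilot_fraction: float, pilots_on_leg: int) -> float:
    """P(two or more pilots on the same leg are sampled), same assumption."""
    k, f = pilots_on_leg, pilot_fraction
    p_none = (1.0 - f) ** k
    p_exactly_one = k * f * (1.0 - f) ** (k - 1)
    return 1.0 - p_none - p_exactly_one


def multiplicity_weight(pilots_on_leg: int) -> float:
    """Weight that shares a leg equally among the pilots who could report it."""
    return 1.0 / pilots_on_leg


f = 0.01  # HYPOTHETICAL pilot sampling fraction
for k in (1, 2, 3):
    print(
        f"pilots on leg = {k}: "
        f"P(included) = {leg_inclusion_probability(f, k):.4f}, "
        f"P(reported twice or more) = {prob_reported_more_than_once(f, k):.6f}, "
        f"multiplicity weight = {multiplicity_weight(k):.3f}"
    )
```

Under these assumptions the inclusion probability grows almost in proportion to the number of pilots on the leg, so a weight of one over the number of pilots is one common correction. Applying it requires knowing the crew size for each reported leg, which is the information the text describes as difficult to obtain.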

Finding: The decision to use pilots as the sampling units in the NAOMS project was appropriate for efficiency reasons. While this led to a cluster sampling of the basic units of interest (flight legs or hours), the costs involved in sampling flight legs would have been prohibitive. However, since there can be multiple pilots on flights, this scheme results in unequal sampling probabilities for flight legs. While the NAOMS study team was aware of the problem, the team did not examine its extent or consequences and the team did not develop methods to address the problem for estimating rates of events.

There are two issues with the use of the sampling frame in the NAOMS survey. The first is whether the appropriate pilots were sampled, which is discussed in the next section of this chapter. The second issue is whether the selected pilots’ flight legs were confined to those that are in the operations of interest, as many pilots fly in more than one type of operation during the recall period. This issue is addressed in Chapter 5.

4.3
COVERAGE ISSUES

4.3.1
Opting Out of the Database

One source of potential problems with the publicly available version of the ACD is that starting in 2000, the FAA allowed pilots to “opt out” of the database. This opt-out option resulted in an incomplete sampling frame. The NAOMS project team considered other options, including the possibility of obtaining pilot names from industry trade groups and/or organized labor. Those options were not adopted because of the challenges of merging multiple lists and because of concerns about limitations that list providers might place on the project.

The committee was able to get data on opt-outs only for 2008, which showed that only 6 percent of all pilots opted out. However, pilots with certificates associated with commercial activity opted out at much higher rates: 20 percent for those with ATP certification and 36 percent for those with FE certification.¹⁶ The apparently large coverage gap for the AC sampling frame raises the potential for substantial bias in observed outcomes if event rates for pilots who opted out differed from those for pilots who did not.
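The size of a coverage bias of this kind depends on two quantities: the share of the population missing from the frame and the difference in event rates between covered and excluded pilots. The following sketch uses the 2008 opt-out percentages quoted above as the excluded share; the event rates are hypothetical placeholders, since no data on opted-out pilots are reported here, and the 2008 percentages may not match those for the survey years.

```python
def coverage_bias(excluded_share: float, rate_covered: float, rate_excluded: float) -> float:
    """Bias of the covered-only rate relative to the full-population rate.

    Full-population rate = (1 - w) * rate_covered + w * rate_excluded,
    so covered-only rate minus full-population rate = w * (rate_covered - rate_excluded).
    """
    return excluded_share * (rate_covered - rate_excluded)


# Opt-out shares from the 2008 data cited in the text.
opt_out_share = {"ATP": 0.20, "FE": 0.36}

# HYPOTHETICAL event rates (events per 1,000 flight legs).
rate_covered = 10.0
rate_excluded = 12.0

for cert, share in opt_out_share.items():
    bias = coverage_bias(share, rate_covered, rate_excluded)
    true_rate = (1 - share) * rate_covered + share * rate_excluded
    print(f"{cert}: bias = {bias:+.2f} per 1,000 legs ({bias / true_rate:+.1%} of true rate)")
```

If the two groups have the same event rate, the bias is zero regardless of the opt-out share, which is why the concern is conditional on the rates differing.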

The public version of the ACD was also used as the initial sampling frame for the GA survey that was conducted during August 2002 through April 2003. The opt-out provision does not appear to have posed much risk to the GA sampling frame. In 2008, coverage of pilots without an FE certificate (the closest available approximation to the GA sampling frame) was 96 percent.¹⁷

Finding: If the event rates for pilots who opted out of the public Airmen Certification Database differed considerably from those for pilots who did not, the high opt-out rates would have resulted in substantial biases in the AC survey. The use of the ACD does not appear to have been a serious problem for the GA survey.

¹⁵	Cochran, Sampling Techniques, 1977.
¹⁶	Harold Everett, Manager, Airmen Certification Branch, Civil Aviation Registry, FAA, personal communication to Anthony Broderick, Member, NRC Committee on NASA’s National Aviation Operations Monitoring Service (NAOMS) Project: An Independent Assessment, June 13, 2008.
¹⁷	Ibid.

4.3.2
Stratification for Creating the AC and GA Sampling Frames

Most pilots in the public ACD are not commercial pilots. Consequently, a random sample from the ACD would include many pilots who are ineligible for the AC survey. (At the beginning of the NAOMS study, the ACD included about 640,000 pilots, while estimates for the number of AC pilots ranged between 75,000 and 90,000.¹⁸) Because there was no information in the database to directly identify commercial pilots, the use of the full ACD as a sampling frame for the AC survey would have required contacting and screening at least seven times as many pilots as were desired for the final sample.
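The "at least seven times" figure follows from the counts quoted in the paragraph above. The short calculation below reproduces it and, as a rough illustration, applies it to the goal of 8,000 completed AC questionnaires per year; it ignores nonresponse and failure to locate, so the contact counts are lower bounds under that simplifying assumption.

```python
def screening_multiplier(frame_size: int, eligible_pilots: int) -> float:
    """Pilots that must be screened, on average, to find one eligible pilot."""
    return frame_size / eligible_pilots


acd_pilots = 640_000                          # approximate ACD size (from the text)
ac_pilots_low, ac_pilots_high = 75_000, 90_000  # estimated range of AC pilots (from the text)
target_completes = 8_000                      # annual AC goal (from the text)

low_multiplier = screening_multiplier(acd_pilots, ac_pilots_high)
high_multiplier = screening_multiplier(acd_pilots, ac_pilots_low)

print(f"screening multiplier: {low_multiplier:.1f} to {high_multiplier:.1f}")
print(
    "pilots to screen per year (ignoring nonresponse): "
    f"{target_completes * low_multiplier:,.0f} to {target_completes * high_multiplier:,.0f}"
)
```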

The NAOMS team deemed that this screening would be too costly, so it decided instead to filter the ACD for pilots with certifications indicative of pilots who fly for air carriers. Specifically, the AC sampling frame was limited to U.S.-based pilots who had an ATP certificate, multi-engine rating, and an FE certificate. However, some active AC pilots do not meet all of these criteria. Many AC pilots, including captains and first officers, do not hold an FE certificate. In addition, it is not necessary for a first officer to hold an ATP certificate; a commercial certificate is all that is required to be a first officer. Thus, the criteria used in the NAOMS survey meant that some members of the target population were eliminated from the sampling frame.

The GA sampling frame was essentially the complement of the AC frame—that is, it included all of the pilots in the ACD who lacked one of the certificates needed for inclusion in the AC frame.¹⁹ Before starting with the GA questionnaire, pilots were asked about their involvement in various types of flying activity during the preceding 60 days. Pilots with AC activity but no GA activity were administered the AC questionnaire. Pilots with both AC and GA activity were administered either the AC or GA questionnaire at random. In contrast, pilots selected from the AC sampling frame were not screened for administration of the GA questionnaire.

Table 4.1 shows the raw distribution of the aircraft types in the NAOMS survey for the 4-year period (2001 through 2004). Information from the U.S. Department of Transportation and Bureau of Transportation Statistics is also shown for comparison. It is clear that wide-body aircraft are over-represented in the NAOMS survey and that small aircraft are under-represented. There are several possible reasons for these differences. They could be due, at least in part, to the requirement for an FE certificate. In the earlier days of jet travel, the typical cockpit crew consisted of three people (a pilot, a first officer or copilot, and a flight engineer), and it was common to start as a flight engineer, advancing to the first officer and then to the pilot position. More recently, jet aircraft have been designed to eliminate the flight engineer position and require only a pilot and a first officer. As a result, pilots with an FE certificate are more likely to be older and more senior pilots and are also more likely to be flying wide-body aircraft. A second possible reason for the difference in representation of wide-body and small aircraft in the NAOMS survey is the unequal sampling probabilities for the flight legs owing to the differences in the number of pilots in the aircraft, with wide-body aircraft having more pilots than other aircraft. However, in the committee’s view, the large disparities in the numbers in Table 4.1 are more likely to have been caused by the FE certificate requirement.

To the extent that these differences led to different event rates, estimates from the AC sample would be biased. Appropriately weighting the results by aircraft or aircraft type (for example, by giving more weight to results from pilots who flew small aircraft) might reduce the size of some of these biases (see the discussion in Chapter 7). But weighting of the data from the main AC sample could not compensate for biases within aircraft type owing to pilots without one of the certificates required for the AC sampling frame.
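One way to picture the weighting mentioned above is as a post-stratification adjustment by aircraft size, using the 2001-04 shares from Table 4.1. The sketch below is only an illustration: the event rates by aircraft size are hypothetical, and it treats the NAOMS and BTS percentages as directly comparable, which is an assumption. As the text notes, such weights cannot remove bias that arises within an aircraft category.

```python
# 2001-04 shares (percent) from Table 4.1.
naoms_share = {"Wide-body": 29.2, "Large": 16.0, "Medium": 48.3, "Small": 6.5}
bts_share = {"Wide-body": 17.0, "Large": 16.1, "Medium": 48.2, "Small": 18.7}

# Post-stratification weight: population share divided by sample share.
weights = {size: bts_share[size] / naoms_share[size] for size in naoms_share}

# HYPOTHETICAL event rates by aircraft size (events per 1,000 flight legs).
hypothetical_rate = {"Wide-body": 2.0, "Large": 3.0, "Medium": 3.0, "Small": 5.0}


def overall_rate(shares: dict, rates: dict) -> float:
    """Share-weighted average of the category rates."""
    total = sum(shares.values())
    return sum(shares[size] * rates[size] for size in shares) / total


unweighted = overall_rate(naoms_share, hypothetical_rate)
weighted = overall_rate(bts_share, hypothetical_rate)

for size, w in weights.items():
    print(f"{size:<10} weight = {w:.2f}")
print(f"unweighted rate = {unweighted:.2f}, weighted rate = {weighted:.2f}")
```

With these hypothetical rates, responses from pilots of small aircraft receive nearly three times their sample weight, and the overall rate moves toward the rate for small aircraft.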

Just as the AC sampling frame excludes some of the AC target population, the GA sampling frame excludes some of the GA target population. Pilots with the certifications required for the AC sampling frame (and therefore excluded from the GA frame) can and often do participate in general aviation. Because pilots contacted as part of the AC survey were not asked about GA activity, there does not appear to be any way to estimate the size of the coverage gap with the NAOMS data.

Finding: The NAOMS AC sampling frame was restricted to pilots who hold ATP and FE certifications. This restriction excluded many active air carrier pilots and appears to have led to biases such as over-representation of wide-body aircraft and under-representation of small aircraft in the NAOMS sample. The NAOMS study team should have investigated the potential impact of these biases and evaluated alternatives such as the use of less stringent filters.

¹⁸	Connors, presentation to NRC Committee on NAOMS, 2009, p. 13.
¹⁹	Ibid., p. 17.

TABLE 4.1 Proportion of Aircraft by Model Size (percent) in NAOMS Survey Compared to Bureau of Transportation Statistics (BTS) Data for the Same Period

Model Size	2001		2002		2003		2004		2001-04
	NAOMS	BTS	NAOMS	BTS	NAOMS	BTS	NAOMS	BTS	NAOMS	BTS
Wide-body	29.0	17.2	27.8	17.9	29.3	16.9	30.9	16.3	29.2	17.0
Large	17.0	13.7	15.8	14.0	15.4	25.2	16.3	12.0	16.0	16.1
Medium	49.1	52.6	48.6	52.5	48.1	38.9	47.4	48.7	48.3	48.2
Small	4.9	16.5	7.8	15.6	7.2	19.1	5.3	22.9	6.5	18.7
SOURCE: Data gathered from NASA, NAOMS 2008 Air Carrier Responses by Category, NASA, Washington, D.C., available at http://www.nasa.gov/news/reports/NAOMS_08_ac_resp_category.html, accessed June 11, 2009; and Bureau of Transportation Statistics, Air Carrier Summary Data (Form 41 and 298C Summary Data), T2: U.S. Air Carrier Traffic and Capacity Statistics by Aircraft Type, Aircraft Type Analysis, Bureau of Transportation Statistics, Washington, D.C., March 2009, available at http://www.transtats.bts.gov/Fields.asp?Table_ID=254, accessed July 22, 2009.

In summary, the NAOMS team faced substantial challenges in developing and implementing sampling designs for the AC and GA surveys. As with most real applications, the team had to make compromises in the final design—most notably in the development of the sampling frames. Stratification based on certification status and, to a lesser extent, the use of the public version of the ACD both introduced the potential for bias in results from the AC sample. While neither decision was made without reason, the NAOMS study team should have investigated the potential impact and magnitude of these biases in order to evaluate these and alternative decisions. Such analyses are critical both for understanding the value of the data being collected at that time and, more importantly, for learning how to improve the sample design for ongoing data collection.

4.3.3
Failure to Locate and Other Nonresponse Issues

Two steps during the field implementation of the two surveys may have contributed to additional coverage errors in the AC and GA samples. One was the failure to locate sampled pilots for whom telephone numbers could not be obtained. The second was noncompletion (refusal to participate) by pilots.

Because the questionnaires were administered by telephone, the project team needed current telephone numbers to contact sampled pilots. Telephone numbers were not available on the public ACD, so NAOMS used a service called Telematch to find telephone numbers based on names and addresses from the ACD, supplemented by change-of-address information from the Post Office. NAOMS tried to interview only those pilots for whom it could obtain telephone numbers, either from Telematch or in response to a mailing to the best address on record. These procedures yielded location rates of 82 percent and 71 percent, respectively, for the AC and GA samples. These rates are reasonable, given that the address information in the ACD was likely to be out-of-date.

Estimates of the rates of events will be biased if the pilots who were not located had rates of events substantially different from those of located pilots. The committee has no reason to expect that such a difference exists between located and nonlocated pilots, although additional data would be needed to verify that failure to locate is not a source of bias.

Nonresponse occurred at two other points for both surveys: at initial contact and after screening for eligibility. Because the NAOMS team could not know whether initial nonresponders were eligible, there was no way to compute response rates for eligible cases. Assuming independence of eligibility and initial response, the committee computed completion rates of 85 percent and 70 percent, respectively, for the AC and GA samples. This response rate for the AC survey is excellent by most standards for a survey of this length, possibly reflecting the interest level of commercial pilots in aviation safety. While lower, the GA response rate is also quite good. However, because the decision to cooperate with this type of survey might be influenced by recently experienced safety-related events, data comparing respondents with nonrespondents would be particularly valuable for assessing the potential bias due to nonresponse.
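The independence assumption mentioned above can be made concrete as follows. If eligibility is unrelated to whether a pilot responds at initial contact, the eligibility rate observed among screened pilots can be applied to the initial nonresponders to estimate how many of them were eligible. The sketch below shows one common form of that calculation; it is not necessarily the committee's exact method, and all counts are hypothetical.

```python
def completion_rate_assuming_independence(
    completes: int,
    screened_eligible: int,
    screened_total: int,
    initial_nonresponders: int,
) -> float:
    """Completes divided by the estimated number of eligible located pilots.

    Initial nonresponders are assumed eligible at the same rate as screened pilots.
    """
    eligibility_rate = screened_eligible / screened_total
    estimated_eligible = screened_eligible + eligibility_rate * initial_nonresponders
    return completes / estimated_eligible


# HYPOTHETICAL counts for a batch of located pilots.
rate = completion_rate_assuming_independence(
    completes=570,
    screened_eligible=600,
    screened_total=900,
    initial_nonresponders=100,
)
print(f"estimated completion rate = {rate:.1%}")
```

If nonresponders were in fact more (or less) likely to be eligible than screened pilots, the estimated rate would be too high (or too low), which is the kind of difference in assumptions described in the footnote comparing the committee's estimate with the GAO's.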

Taking into account both failure to locate and noncompletion, the estimated overall response rates were 69 percent²⁰ and 50 percent, respectively, for the AC and GA surveys.
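These overall rates can be approximately reproduced by multiplying the location rate by the completion rate. The sketch below derives the location rates from the counts of pilots without usable telephone numbers reported in Battelle's NAOMS Completion Rate Summary and uses the rounded completion rates of 85 and 70 percent. With rounded inputs the AC product comes to just under 70 percent, so the reported 69 percent presumably reflects unrounded figures; that explanation is an assumption.

```python
def location_rate(sampled: int, not_located: int) -> float:
    """Share of sampled pilots for whom a usable telephone number was found."""
    return 1.0 - not_located / sampled


# Counts from Battelle's NAOMS Completion Rate Summary, as quoted in the text.
ac_location = location_rate(sampled=52_570, not_located=9_480)
ga_location = location_rate(sampled=12_363, not_located=3_590)

# Rounded completion rates computed by the committee.
ac_completion, ga_completion = 0.85, 0.70

ac_overall = ac_location * ac_completion
ga_overall = ga_location * ga_completion

print(f"AC: location {ac_location:.1%} x completion {ac_completion:.0%} = {ac_overall:.1%}")
print(f"GA: location {ga_location:.1%} x completion {ga_completion:.0%} = {ga_overall:.1%}")
```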

Finding: The NAOMS team should have collected supplemental data to assess the potential biases related to opt-out issues, the certificate requirement, and nonresponse issues during the early phase of the survey. An analysis of the additional data would have provided a more reliable assessment of the various biases and may have led, if necessary, to the development of alternative sample design strategies to address the problems.

4.4
CROSS-SECTIONAL DESIGN VERSUS PANEL DESIGN

The term cross-sectional design refers to the selection of different samples of respondents at each time period. Panel design, on the other hand, involves the selection of groups of respondents who participate in the survey for a period of time (two or more successive periods). The NAOMS team evaluated both approaches during the first year of full operation of the survey by randomly assigning half of the participants to each design. The results of the first year indicated that the NAOMS survey would achieve a very high response rate and quality if the cross-sectional design were used.

The NAOMS team finally settled on a cross-sectional design. The committee agrees with this decision for several reasons:

Response rates from panel designs are generally lower than those from similarly designed cross-sectional designs.²¹ For example, respondents are less likely to participate in a survey if they are faced with the prospect of responding repeatedly to the survey over time. This problem may manifest itself as attrition, reducing follow-up response rates over time.
A panel design would have risked “conditioning effects,” by which a pilot’s responses systematically change as a result of having taken the same survey previously. For example, respondents might become more sensitized to certain events after being asked about those events. Alternatively, a respondent who reports an event during one period may be less inclined to report a similar event in a later period because it seems “less interesting.” While it is hard to know whether such effects would lead to more, or less, accurate reports, the possibility of their existence would complicate the interpretation of a survey with a primary goal of estimating trends.
Finally, maintaining the confidentiality of pilots would be more difficult with a panel design.

Finding: The choice by the NAOMS team of a cross-sectional design over a panel design was appropriate.

4.5
RECALL PERIOD

The NAOMS project needed to specify an appropriate recall period (the previous n days from the date of the interview) for the survey. The NAOMS team conducted research as part of its planning process to determine some of the impacts of using a different recall period. As may be expected, the tests found that longer recall periods resulted in more reported events but lower rates of reported events, as well as a decline in the respondents’ confidence in the accuracy of their responses. The team experimented with 30, 60, and 90 days in the initial stages of the study and eventually settled on a 60-day recall period.

²⁰	The committee’s estimate of 69 percent differs from the results of the calculations conducted by the GAO because the latter calculation assumed (1) that all pilots who were located but not screened for eligibility were indeed eligible and (2) that the eligibility proportion was higher for nonlocated pilots than for those who were eventually screened.
²¹	Nicole Watson and Mark Wooden, “Identifying Factors Affecting Longitudinal Survey Response,” in Peter Lynn, ed., Methodology of Longitudinal Surveys, Wiley, New York, 2009.

The committee did not undertake a review of the methodology used by the NAOMS team to determine the recall period because it did not have access to data from these studies. Therefore, the committee cannot comment authoritatively on the choice of a 60-day recall period versus a recall period of a different length. However, analysis of the redacted survey data (discussed in Chapter 7) indicated several problems: (1) considerable rounding effects in reported numbers of hours and flight legs flown (see Figure 7.1 in Chapter 7) and (2) a high fraction of anomalous data for both Section A (number of hours/legs flown) and Section B (event counts) of the survey questionnaires. It is possible that these problems are not related to the recall period. Nonetheless, it is surprising that Battelle’s final report notes the existence of a “downward bias” without a systematic investigation of the size of the bias, how it varies across the different types of events, and so on.²² This information is critical to the validity of the survey results, especially given the rare nature of some of the events being surveyed. Finally, the committee agrees with the Battelle report that the effect of the bias would be smaller on trends than on actual rates, provided that the nature of the bias remains constant over time—hence the need to investigate the magnitude and nature of the potential biases.

4.6
DATA-COLLECTION METHOD

The NAOMS team considered three different ways to conduct the survey: in-person interviews, self-administered questionnaires, and computer-assisted telephone interviews (CATI). Each type was weighed against several criteria, including cost, respondent satisfaction, response rate, and quality of the data.

The NAOMS team tested the different methods of conducting the survey in a field trial. Early in the field trial, the NAOMS team determined that in-person interviewing required too much time and cost, and it was dropped from consideration. The results of the trial also demonstrated clear differences between the CATI system and the self-administered questionnaires. While the CATI system took longer to complete and cost more, it had a higher response rate and fewer incomplete responses. In the end, the team opted to use the CATI system for the full implementation of the survey.²³

The NAOMS team decided to use professionally trained interviewers rather than aviation-safety professionals to conduct the interviews. There are advantages and disadvantages to both options. The advantage of using aviation-safety professionals lies in their ability to clarify the intent of the questions or to ask the respondents to verify answers that seem implausible. However, there is the possibility of interviewer bias, by which the interviewer may lead the respondent in the direction of expected responses, and it would be difficult to quantify the nature of the assistance and clarifications provided by the interviewer and to determine whether the interviews led to reproducible answers. For this reason, most surveys use professionally trained interviewers who have no subject-matter knowledge in the area of investigation. On balance, if the survey instrument is well defined and the possible questions and ambiguities are anticipated and addressed, the use of professionally trained interviewers will lead to more statistically reliable results.

Finding: The decision by the NAOMS team to use professionally trained interviewers was reasonable. The use of the CATI method for the survey was also appropriate.