7
An Assessment of the Utility of NAOMS Data

This chapter continues with task 4 of the charge to the committee requiring that it “conduct an analysis of the NAOMS project survey data provided by NASA to determine its potential utility…. This analysis may include an assessment of the data’s validity using other known sources of information.” Issues on data quality, data validation, and estimation of rates and trends are discussed in this chapter.

7.1
DATA QUALITY

7.1.1
Anomalies

As noted in Chapter 6, NAOMS survey data in the Phase 2 release were not cleaned or modified (unlike those released in Phase 1). One can thus examine the quality of the raw data from the survey. The committee’s analysis found that a significant proportion of the non-zero numerical values were implausible, both for event counts (numerators) and for numbers of legs/hours flown (denominators). Selected examples are discussed below.

Table 7.1 shows the distributions of data values for the number of flight legs flown for all pilots who reported that they flew more than 60 flight legs in the 60-day recall period. Data for 3 of the 4 years of the survey (2002 through 2004) are shown separately.1 Note the high numbers of flight legs flown, with responses as high as 300-650 during the recall period in some cases. Even values in the 150-200 range may be unlikely for a 60-day recall period in an FAR Part 121 operation because of regulatory limitations on pilot flying and operator scheduling policies. Further, the number of pilots who reported such values is not small (15 percent of the pilots reported having flown more than 150 hours). Table 7.2 shows the corresponding distributions of the number of hours flown for all pilots who reported that they flew more than 150 hours. Again, note the implausibly high numbers of hours flown and their frequencies, including responses of as many as 400-600 hours flown during the recall period.

An equally serious problem exists with event counts. Since many of these events, such as in-flight engine failure, are rare (that is, most of the responses are zero, and only a small fraction of the pilots reported non-zero counts), it is clear that even a few anomalous values can completely distort the estimates of event rates.
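The sensitivity of a rare-event rate to a handful of anomalous reports can be illustrated with a small sketch. All numbers here are hypothetical (not NAOMS data), and the cap of 2 events used below is an arbitrary assumption chosen for illustration, not a recommended cleaning rule:

```python
def event_rate(counts, legs):
    """Simple ratio estimate: total reported events / total flight legs."""
    return sum(counts) / sum(legs)

# 200 hypothetical pilots; almost all report zero events of a rare type.
counts = [0] * 196 + [1, 1, 10, 303]   # 10 and 303 are implausible outliers
legs = [100] * 200                     # each pilot flew ~100 legs

raw_rate = event_rate(counts, legs)                           # 315 / 20,000
capped_rate = event_rate([min(c, 2) for c in counts], legs)   # 6 / 20,000

# The two implausible responses account for 313 of the 315 reported events,
# so the raw rate is roughly 50 times the capped rate.
print(raw_rate, capped_rate)
```

Two responses out of 200 move the estimated rate by a factor of about 50, which is the committee's point: for rare events, even a few bad values dominate the numerator.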

1 Because the recall period was not constant during the first year of the NAOMS survey, data for 2001 are excluded from Tables 7.1 and 7.2.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




TABLE 7.1 Distributions of the Number of Flight Legs Flown During the 60-Day Recall Period, for the Years 2002-2004

No. of Legs Flown    2002    2003    2004
61-100                944     907     805
101-150               267     209     175
151-200                66      58      42
201-250                13      14       6
251-300                14       9       7
301-350                 2       4       2
351-400                 2       8       2
401-450                 3       1       0
451-500                 2       3       2
501-550                 1       1       0
551-600                 1       0       0
601-650                 1       0       0

NOTE: For Category 5: greater than 60 legs.

TABLE 7.2 Distributions of the Number of Hours Flown During the 60-Day Recall Period, for the Years 2002-2004

No. of Hours Flown   2002    2003    2004
151-200               713     716     802
201-250               234      32      18
251-300                19       6       2
301-350                 2       2       1
351-400                 2       3       1
401-450                 2       0       1
451-500                 2       0       0
501-550                 1       0       0
551-600                 1       0       0

NOTE: For Category 6: greater than 150 hours.

The committee's analysis showed that the problem with anomalous values is common across many event types. Table 7.3 provides selected examples. For the event AH4 ("inadvertently landed without clearance at an airport with an active control tower"), a total of 541 events were reported by 161 pilots. Of these, 4 pilots reported 10, 20, 30, and 303 events (the last response corresponding to a pilot who flew between 46 and 70 hours and fewer than 14 legs during the recall period). These 4 pilots accounted for 363 (67 percent) of the 541 events reported. Table 7.3 shows several other examples with unusually high numbers of events reported. If the instances of such anomalies were limited to only a few event types, one might be able to investigate them in greater detail. Unfortunately, however, the problem was extensive, with one, and often several, implausible values for many event types. There are at least two possible reasons for these anomalous values: (1) the pilots gave erroneous answers, or (2) errors were made during data entry.

A verification question for high values of hours flown was included in the questionnaire, but the committee does not know whether other data-audit procedures were in place to flag implausible values reported by the respondents or entered into the database.

TABLE 7.3 Examples of Implausibly High Non-Zero Counts for Events

AH4: Number of times respondent inadvertently landed without clearance at an airport with an active control tower.
  Total events: 541. Pilots reporting at least one event: 161.
  Unusually high reports:a 303, 30, 20, 10. Of the 161 pilots who reported a non-zero count, 4 pilots accounted for 363 of the 541 events (or 67% of the total).

AH8: Number of times respondent experienced a tail strike on landing.
  Total events: 80. Pilots reporting at least one event: 24.
  Unusually high reports:a 30, 10(2), 9. Of the 24 pilots who reported a non-zero count, 4 pilots accounted for 59 of the 80 events (or 74% of the total).

AH13: Number of times respondent experienced an unusual attitude for any reason.
  Total events: 508. Pilots reporting at least one event: 450.
  Unusually high reports:a 100, 60, 30, 12, 11. Of the 450 pilots who reported a non-zero count, 5 pilots accounted for 213 of the 508 events (or 42% of the total).

ER6: Number of times an in-flight aircraft experienced a precautionary engine shutdown.
  Total events: 365. Pilots reporting at least one event: 215.
  Unusually high reports:a 30(3), 20, 10, 9. Of the 215 pilots who reported a non-zero count, 6 pilots accounted for 129 of the 365 events (or 35% of the total).

ER7: Number of times an in-flight aircraft experienced a total engine failure.
  Total events: 132. Pilots reporting at least one event: 82.
  Unusually high reports:a 30, 10, 3(2). Of the 82 pilots who reported a non-zero count, 4 pilots accounted for 46 of the 82 events (or 56% of the total).

GE5: Number of times respondent went off the edge of a runway while taking off or landing.
  Total events: 350. Pilots reporting at least one event: 33.
  Unusually high reports:a 90, 70, 30(4), 20, 10(2), 9(2). Of the 33 pilots who reported a non-zero count, 11 pilots accounted for 338 of the 350 events (or 97% of the total).

GE9: Number of times respondent landed while another aircraft occupied or was crossing the same runway.
  Total events: 928. Pilots reporting at least one event: 240.
  Unusually high reports:a 100, 80, 50, 40(2), 20, 15, 12, 10(14). Of the 240 pilots who reported a non-zero count, 20 pilots accounted for 497 of the 928 events (or 54% of the total).

a Number in parentheses refers to the number of pilots who reported that number of events.

7.1.2
Rounding

Another characteristic common in the survey data was the rounding of responses (raw data) by the respondents (pilots). A disproportionate number of observations were rounded to have zero or 5 as the last digit. Figure 7.1 shows an example for the number of hours flown in Category 2 (46-70 hours) during the 4-year period of the survey. Similar problems arose with the numbers of events reported. This type of rounding is common when respondents cannot recall exact numbers. The committee did not conduct an extensive analysis to assess the magnitude of the rounding bias on the computed event rates. Nevertheless, the distribution of the numbers in Figure 7.1 suggests that it may be significant. This problem could have been alleviated in part by asking respondents to retrieve their logbooks to verify their answers. A request along these lines could have been included in the pre-notification letter that was sent to the respondents.

Finding: There are several problems with the quality of NAOMS data:
• Substantial fractions of the reported non-zero counts of events had implausibly large values, as did the reported flight legs and hours flown. Simple audits to alert for such values should have been used during the computer-assisted telephone interviews and data-cleaning steps to reduce the occurrence of these problems.
• It appears that respondents often rounded their answers to convenient numbers (for example, there were unusually high occurrences of numbers with final digits of "0" and "5").
The extent and magnitude of these problems raise serious concerns about the accuracy and reliability of the data. The development of appropriate strategies for handling some of these problems will require access to the unredacted data.
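Heaping of final digits can be checked directly from the raw responses. A minimal sketch, with fabricated counts standing in for the Category 2 (46-70 hours) responses:

```python
from collections import Counter

def final_digit_share(responses, digits=(0, 5)):
    """Fraction of responses whose final digit is 0 or 5. If respondents
    reported exact values, final digits would be roughly uniform and this
    share would be near 0.2; a much larger share suggests heaping."""
    counts = Counter(r % 10 for r in responses)
    return sum(counts[d] for d in digits) / len(responses)

# Hypothetical reported hours flown, heavily rounded to multiples of 5.
reported = [50] * 120 + [55] * 90 + [60] * 140 + [65] * 80 + [70] * 100
reported += [47, 48, 52, 53, 58, 62, 63, 67] * 5  # a few exact answers

share = final_digit_share(reported)
print(round(share, 2))  # far above the ~0.2 expected without heaping
```

A formal test (e.g., a chi-square test of the final-digit distribution against uniformity) would sharpen this, but even the raw share makes heaping obvious.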

FIGURE 7.1 Rounding of responses for numbers of hours flown in Category 2 (46-70 hours). [Chart: frequency of responses by hours flown (46 through 70), for each survey year, 2001-2004.]

7.2
EXTERNAL DATA VALIDATION

7.2.1
Comparisons with Other Data Sources

One type of external validation involves comparing the attributes of the respondents in the sample to corresponding population data from other sources. For example, if the proportion of certain characteristics (distribution of aircraft or distribution of pilots by experience levels) is quite different from the proportion of the same characteristics in another reliable source, the survey results might not be representative. Table 4.1 in Chapter 4 shows that the distribution of the aircraft types in the NAOMS survey differed markedly from that in the BTS data, with the proportion of wide-body aircraft being over-represented in the NAOMS survey.

Similarly, if other data sources were available for event counts, these sources could be used for an external validation of the counts. NAOMS representatives indicated to the committee that they saw no point in asking questions to which answers could be obtained elsewhere. While this is a valid point, a limited amount of redundancy is often included in surveys for the purposes of validation. The committee recognizes the potential for problems in comparing data across different sources (differences in contexts, in the way that data were collected, etc.), but such comparisons are often conducted in other surveys and have been extremely valuable.

7.2.2
Use of Logbooks

Another potential source of external validation is the use of respondents' logbooks during the survey. The invitation letter requesting survey participation suggested that respondents have their logbooks readily available during the survey.
However, the committee did not find information in the survey or other documents indicating whether the respondents actually referred to their logbooks while answering the questions. The survey could have included a question on this matter, such as, "Are you using your logbook in providing responses during this survey?" This information would have been helpful in assessing the validity of the responses. The response to question D1 in the final section ("How confident are you that you accurately counted all of the safety-related events that I asked you about?") provides a rough measure of a respondent's confidence in the accuracy of the responses, but it is unclear how this information could be incorporated into the estimation process.

Finding: Limited comparison of NAOMS data with those of other sources indicates an over-representation of some groups and an under-representation of others. Such sampling biases must be addressed in the estimation of event rates and trends. More effort should have been spent on ensuring data accuracy at the interview stage, such as by asking respondents to refer to their logbooks. Preliminary analysis of the data would likely have revealed these problems in time to modify the survey implementation accordingly.

7.3
ESTIMATION AND WEIGHTING

7.3.1
Overall Rates

Consider the estimation of a particular event type, k, during a given recall period, t, in the AC survey (the issues are similar for the GA survey). Let D_kt be the number of events of type k that were observed by all AC pilots during the recall period t. Similarly, let M_t be the total number of flight units (legs or hours, as appropriate) flown by all AC pilots during the recall period t. Then, the true population rate for event k during period t is

    R_kt = D_kt / M_t.    (7.1)

For example, event k may refer to landings on an occupied runway, and t may denote the time period January 1 through March 31, 2003.
In this case, the appropriate denominator (flight units) is the number of flight legs; for other events, it may be the number of flight hours. Let d_kt be the total number of events of type k that were observed in the sample of AC pilots during the recall period t. Similarly, let m_t be the total number of flight units (legs or hours, as appropriate) flown by all AC pilots in the sample during the recall period t. If the survey results are based on a simple random sample (or, more generally, an equal-probability design), then the population ratio R_kt can be estimated by the corresponding sample ratio

    r_kt = d_kt / m_t.    (7.2)

The properties of this estimate and expressions for its variance under simple random sampling can be found in most textbooks on sample surveys.2 However, several types of biases present in the NAOMS study preclude the use of the simple estimate in Equation 7.2. Chapter 4 discussed various types of coverage biases, noting the over-representation of wide-body aircraft and the under-representation of smaller aircraft in the study. In addition, the sampling probabilities of flight legs varied with the number of pilots in the aircraft, and these unequal probabilities have to be accounted for when estimating event rates. If there is sufficient information about the precise nature and magnitude of these biases, it is possible that at least some of them can be accounted for by weighting the responses appropriately. For example, if one knew the unequal sampling probabilities for the flight legs due to the presence of multiple pilots in the aircraft, the responses could be weighted inversely by the sampling probabilities. There is extensive discussion of these methods in the sampling literature.3 However, this type of information must be documented during the planning and implementation stages of the study, and it does not appear to be available for the NAOMS survey.

2 See, for example, Cochran, Sampling Techniques, 1977.
3 See, for example, Lohr, Sampling: Design and Analysis, 1999.
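If the per-pilot selection probabilities had been documented, responses could be weighted inversely by them, as described above. The following is a minimal sketch of such an inverse-probability-weighted ratio estimate; the function name, the selection probabilities, and all counts are hypothetical:

```python
def weighted_rate(events, units, probs):
    """Ratio of inverse-probability-weighted totals: each pilot's reported
    events and flight units are weighted by 1/p, where p is that pilot's
    (assumed known) selection probability."""
    num = sum(e / p for e, p in zip(events, probs))
    den = sum(u / p for u, p in zip(units, probs))
    return num / den

# Four hypothetical pilots; a leg flown by two pilots in the sampling frame
# is twice as likely to be reported, so those responses get half the weight.
events = [1, 0, 2, 0]             # reported event counts
legs = [80, 90, 100, 70]          # reported flight legs (denominators)
probs = [0.02, 0.01, 0.02, 0.01]  # assumed selection probabilities

# Weighted totals: 150 event-equivalents over 25,000 leg-equivalents.
print(weighted_rate(events, legs, probs))
```

With equal probabilities this reduces exactly to the simple ratio in Equation 7.2; the weighting matters only when, as in NAOMS, selection probabilities differ across responses.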

For the unredacted data, event rates can be computed for a period as short as 2 months (the recall period). For the redacted data, the information is grouped into years, so periods of 1 year are the smallest for which rates can be calculated. As noted in Chapter 5, this level of aggregation severely limits the usefulness of the data: it is difficult to detect seasonal variations, short-term effects on safety of changes in aviation procedures, and other effects likely to be of interest for safety-monitoring purposes.

7.3.2
Rates by Subpopulations

In addition to the overall event rates in Equation 7.1, users of aviation safety data will also be interested in event rates for various subpopulations, such as rates by aircraft size or by pilot experience. Consider, for example, the event "landing on an occupied runway," and suppose that one wants to compare how the rate for this event varies across three subpopulations of pilot experience: low, medium, and high levels. Let D_jkt be the number of flights that landed on an occupied runway (event type k) during the recall period t by pilots with experience level j. Similarly, let M_jt be the total number of flights during the recall period t by pilots with experience level j. Then, the rate of interest is

    R_jkt = D_jkt / M_jt.    (7.3)

Let d_jkt be the number of flights that landed on an occupied runway (event type k) that were observed in the sample of AC pilots with experience level j during the recall period t. Further, let m_jt be the number of flights during the recall period t by pilots with experience level j in the survey. Then, if the survey results in a simple random sample of pilots and the full data are available, one can estimate the population ratio R_jkt by the sample ratio

    r_jkt = d_jkt / m_jt.    (7.4)

However, it is not possible to estimate these rates from the redacted data, as the counts d_jkt and m_jt are not available for subpopulations. As noted for Equation 7.2, the estimates in Equation 7.4 are not valid when there are substantial biases, as appears to be the case with the NAOMS project. Since the nature and extent of the biases were not documented at the planning stage, it was not possible for the committee to examine the use of weighting or other adjustment methods to account for the biases.

Finding: The intended simple random sampling for the NAOMS study would have facilitated the easy computation of event rates. However, the final sample does not appear to be representative of the target population, as indicated by the limited data analysis conducted by the committee. The survey sampling literature contains many approaches that attempt to address such coverage problems, but they require detailed information on how the survey was implemented, including the type and nature of the problems that were experienced, and access to the original data.

7.3.3
Estimation of Trends

The most consistently articulated goal of the NAOMS project was to use survey data to learn about trends. Information on trends allows one to assess the effects of safety innovations on event rates. Preliminary analyses by the NAOMS team appear to indicate that the trends for a number of safety events were consistent over time. However, the committee did not conduct any analysis to verify the results, as it had access only to redacted data, in which the time variable was aggregated to full years.

It is important to recognize that the event rate biases discussed thus far in this report would not affect trends to the extent that the biases are constant over time. For example, if any biases due to nonresponse were constant across years, those biases would cancel out in estimates of trends. However, some types of biases may not have been constant or may have drifted over the survey period.
For example, as discussed in Chapter 5, the AC questionnaire included operations and events from a broad array of aviation industry segments. If the mix of these

operations changed over time, this would have caused biases in the trend estimates. In addition, biases associated with subjective assessments by pilots may have changed abruptly in response to external events such as those of September 11, 2001.

Finding: Many of the biases that affect the estimates of event rates may be mitigated for trend analysis to the extent that the biases remain relatively constant over time. However, the degree of mitigation might vary substantially across event types.

7.4
CONFIDENCE INTERVALS

The charge to the committee asked for specific recommendations on how to compute error bars for the estimates, or, in statistical terminology, confidence intervals. The key information needed for computing a confidence interval is the variance of the estimated event rate. Under an equal-probability sampling scheme, the variance of the simple ratio estimate in Equation 7.2 can be computed easily.4 Given the variance estimate, a normal approximation is generally used to compute the confidence interval. Since these issues have been discussed extensively elsewhere, the committee will not repeat the details here. The development of confidence intervals (error bars) for the NAOMS study faces the same difficulties that were discussed for the estimates in Section 7.3 and would require knowledge of the nature and extent of the biases that was not available to the committee. Without such information, the committee cannot provide recommendations that will be useful in this particular context.

7.5
SUMMARY

Careful planning of any statistical investigation, including surveys, involves the following steps: (1) the development of initial methods for data analysis and estimation, (2) the analysis of pilot data or early survey data to discover potential problems, and (3) the use of this information to refine the survey methodology.
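The normal-approximation interval discussed in Section 7.4 can be sketched as follows, under an assumed simple random sample, using the textbook residual-based variance of a ratio estimate (residuals e_i = d_i - r*m_i); all data values are hypothetical:

```python
import math

def ratio_ci(d, m, N=None, z=1.96):
    """95% normal-approximation CI for the ratio r = sum(d)/sum(m) under
    simple random sampling, using the standard residual-based variance
    (see, e.g., Cochran, Sampling Techniques, Ch. 6)."""
    n = len(d)
    r = sum(d) / sum(m)
    m_bar = sum(m) / n
    resid = [di - r * mi for di, mi in zip(d, m)]
    s2 = sum(e * e for e in resid) / (n - 1)
    fpc = 1.0 if N is None else 1 - n / N  # finite-population correction
    half = z * math.sqrt(fpc * s2 / (n * m_bar ** 2))
    return r - half, r + half

# Hypothetical per-pilot event counts and flight legs for one recall period.
d = [0, 0, 1, 0, 2, 0, 0, 1, 0, 0]
m = [90, 80, 100, 70, 110, 95, 85, 100, 60, 90]
lo, hi = ratio_ci(d, m)
print(lo, hi)
```

As the committee notes, such an interval is meaningful only when the equal-probability assumption holds; it does not account for the coverage and weighting biases discussed in Section 7.3.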
The committee was surprised by the apparent lack of these activities (or at least the lack of documentation of these activities). The NAOMS team also did not conduct a formal analysis of the survey data as they became available. This was an especially serious oversight for a project with a research component, in which one goal was to learn and to refine ideas in order to improve the methodology. In the committee's view, many of the problems that have been identified with the NAOMS survey might well have been detected and corrected if these aspects of survey planning had been better executed.

Finding: The committee did not find any evidence that the NAOMS team had developed or documented data analysis plans or conducted preliminary analyses as initial data became available in order to identify early problems and refine the survey methodology. This is inconsistent with the practice of any well-conducted research study.

The final charge to the committee asks for recommendations regarding the most effective ways to use the NAOMS data. Because the committee did not have access to the unredacted data, a recommendation on this front, by necessity, relates only to the redacted, publicly available data. As in any research study, a full description of the NAOMS project and the results of any analysis should be submitted for possible publication, which will involve a peer review. Because of the problems associated with analyzing the redacted data set, discussed in Chapter 6, the analysis would have to be based on the unredacted data and would need to address challenges such as the treatment of data errors and the potential effects of biases on trends. However, because of the methodological and implementation problems cited in Chapters 4 and 5, as well as the difficulties associated with data analysis discussed in this chapter, the committee does not recommend using the publicly available NAOMS data set to identify system-wide trends in the rates of safety-related events.
4 Cochran, Sampling Techniques, 1977.

Recommendation: The publicly available NAOMS data should not be used for generating rates or trends in rates of safety-related events in the National Airspace System. The data could, however, be useful in developing a set of lessons learned from the project.