3
Sample Surveys: Overview, Examples, and Usefulness in Studying Aviation Safety

Task 2 of the charge to the committee is to “assess the advantages and disadvantages of using a survey method to collect … a statistically meaningful data set” to estimate and characterize the safety of the NAS. This chapter begins with an overview of sample surveys, which are not widely used in the aviation community. Other federal agencies have been using sample surveys successfully for a long time. Section 3.2 discusses some of these surveys, including a detailed description of the National Crime Victimization Survey (NCVS), which shares some key features with the NAOMS survey. A few additional examples of government surveys are given in Appendix E. Section 3.3 discusses the usefulness and limitations of sample surveys for collecting aviation safety data.

3.1
OVERVIEW OF SAMPLE SURVEYS

In statistical terminology, a sample refers to a subset of the population of interest. Note that many of the aviation data sets discussed in Section 2.1 are samples, because the available data are a subset of the data sets for the whole aviation system.

Samples can be grouped broadly in two categories: probability samples and nonprobability samples. In probability sampling, the subset is selected according to a specified probability mechanism. This provides a basis for using sample data to draw appropriate statistical inference (point and interval estimates, statements about statistical bias and precision, and so on) about the population characteristic(s) of interest. The uncertainty in the estimate because of sampling variability is referred to as the sampling error or margin of error. Nonprobability sampling (such as judgment or convenience sampling techniques) does not allow one to make a similar inference about the population characteristics without additional assumptions.
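To make the idea of sampling error concrete, the following minimal sketch draws a probability sample and attaches an approximate margin of error to the resulting estimate. It is an illustration only: the population is simulated, simple random sampling without replacement is assumed as the probability mechanism, and the function name `srs_mean_estimate` and the normal-approximation multiplier `z = 1.96` are choices made for this example rather than anything specified in the report.

```python
import math
import random
import statistics


def srs_mean_estimate(population, n, seed=0, z=1.96):
    """Estimate a population mean from a simple random sample of size n.

    Returns (estimate, margin_of_error). The margin is an approximate 95%
    half-width based on the normal approximation, with the finite
    population correction applied.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)        # sampling without replacement
    N = len(population)
    mean = statistics.fmean(sample)
    s2 = statistics.variance(sample)          # sample variance, n - 1 divisor
    fpc = 1 - n / N                           # finite population correction
    standard_error = math.sqrt(fpc * s2 / n)
    return mean, z * standard_error


# Hypothetical population: 10,000 units, each with a count between 0 and 5.
pop_rng = random.Random(42)
population = [pop_rng.randint(0, 5) for _ in range(10_000)]

estimate, moe = srs_mean_estimate(population, n=400)
print(f"estimate = {estimate:.2f} +/- {moe:.2f}")
```

When the sample size equals the population size, the finite population correction is zero and so is the margin of error, which reflects the fact that a complete enumeration has no sampling error (although it can still have other kinds of error).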

The term survey refers to techniques for collecting data from the target population of interest. While surveys are generally identified with human populations (for example, opinion polls, consumer surveys, demographic and economic surveys), surveys of other types of populations (such as geological surveys and administrative records) are also common. A survey that collects data from the entire population is called a census. In most situations, however, data are collected from only a subset of the population, in which case the survey is called a sample survey. As noted above, one must use probability sampling methods in order to make statistically valid conclusions about the target population.
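The need for probability sampling can be illustrated with a small simulation. The sketch below is hypothetical throughout: it builds an artificial population in which units that are easier to reach tend to have larger values, and then compares a simple random sample with a convenience sample made up of the easiest-to-reach units. The variable names and the numbers are assumptions chosen for illustration only.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 5,000 units. Each unit has a "reachability"
# score and a value y; by construction, easier-to-reach units have larger y.
population = []
for _ in range(5_000):
    reach = rng.random()
    y = 10 * reach + rng.gauss(0, 1)
    population.append((reach, y))

true_mean = statistics.fmean(y for _, y in population)

# Probability sample: a simple random sample of 200 units.
prob_sample = rng.sample(population, 200)
prob_mean = statistics.fmean(y for _, y in prob_sample)

# Convenience sample: the 200 easiest-to-reach units.
conv_sample = sorted(population, key=lambda unit: unit[0], reverse=True)[:200]
conv_mean = statistics.fmean(y for _, y in conv_sample)

print(f"population mean:         {true_mean:.2f}")
print(f"probability sample mean: {prob_mean:.2f}")
print(f"convenience sample mean: {conv_mean:.2f}")
```

In this artificial setting the convenience sample overstates the population mean by a wide margin, and nothing in the convenience sample itself reveals the size of that bias. The random sample differs from the population mean only by sampling error, which can be estimated from the sample.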






There are many ways of selecting probability samples, but the simplest method is simple random sampling. Under this method, every possible subset of a fixed size from the population has equal probability of being selected. For practical and statistical reasons, it may not always be desirable or feasible to use simple random sampling. A variety of other probability sampling techniques, such as stratified sampling, cluster sampling, and multistage sampling, as well as corresponding estimation methods, have been developed in the literature.1

In surveys of human populations, the data are generally collected using a questionnaire as the survey instrument: the participants are asked to respond to a set of questions. The design of the questionnaire is critical to ensuring that the data collected are of good quality and can provide information that is generalizable to the target population. There is a vast literature on questionnaire design.2 There are also many ways of conducting surveys of human populations, for example, by mail, through telephone interviews, or in on-line surveys.

Since a sample survey collects data from only a subset of the population, the estimates have sampling error. The use of probability sampling methods allows one to characterize and estimate this error. The sampling error depends on the probability sampling method used; methods for estimating the sampling error are discussed extensively in the literature.3

There are also several other types of errors (often called nonsampling errors) that occur commonly in surveys (including censuses). For example, coverage bias can occur when the sampling frame (a list of identifiable units from which to draw the actual sample, such as identification numbers, geographical coordinates, household addresses, or telephone numbers) is incomplete. The sampling frame may systematically miss some classes of population members entirely. The sampling frame may also include units that are not members of the target population. Also, failures to contact sampled subjects or nonresponse by those who are contacted can lead to additional biases. Measurement error can result when the survey instrument is poorly designed or if problems arise in the field implementation of the survey. The nature and the magnitude of added uncertainty because of nonsampling errors cannot be ascertained from the sample itself, regardless of whether it is a probability or nonprobability sample, or even if it is a census. Thus, an important part of the survey planning and implementation process is to determine ways to make these errors as small as possible.4

There are several steps in planning and implementing a good survey. For the purpose of the discussion here, the relevant steps include the following:

• Identify the population of interest (the set of units from which the survey would ideally collect data in the absence of concerns over cost or respondent burden) and the characteristic(s) to be studied.
• Determine the method(s) for conducting the survey (such as mail or telephone interviews) and implementing the survey.
• Develop a sampling frame that will be used to select the sample.
• Determine a sampling design, the probability sampling method and the sample size, and the number of elements to be selected (the latter depends on the sampling design and the desired precision as well as on available resources).
• Design the data-collection instrument (for example, the questionnaire for a human population).
• Examine possible sources of error, ways to reduce them, and ways to estimate them.
• Analyze the data and report the results.

Despite the presence of sampling errors, sample surveys have several advantages over censuses:

• Samples are less costly than censuses. For example, for populations with large, hard-to-find, or highly dispersed (in space or time) units, the cost of locating and collecting data from the units can be high. Censuses also require more resources to train the data-collection staff.
• Data from sample surveys can be collected and analyzed in a more timely manner than can data from censuses.
• It is difficult to control nonsampling errors in large or difficult-to-reach populations. For example, collecting data from all the units in these populations would require a large and dispersed administrative staff, which would be harder to train and closely supervise. Nonresponse problems are also harder to manage in a larger operation.
• If a census is repeated over time, it would require all the units in the population to be repeatedly monitored. For human populations, this would place considerable burden on the respondents and could lead to decreased cooperation and higher nonresponse rates.

1 William G. Cochran, Sampling Techniques, Wiley, New York, 1977; and Sharon Lohr, Sampling: Design and Analysis, Duxbury Press, Pacific Grove, Calif., 1999, pp. 4-8.
2 Norman M. Bradburn, Seymour Sudman, and Brian Wansink, Asking Questions: The Definitive Guide to Questionnaire Design, Jossey-Bass, San Francisco, Calif., 2004.
3 Cochran, Sampling Techniques, 1977; and Lohr, Sampling: Design and Analysis, 1999.
4 Judith Lessler and William Kalsbeek, Nonsampling Error in Surveys, Wiley, New York, 1992.

3.2
THE USE OF SAMPLE SURVEYS IN THE GOVERNMENT SECTOR

Sample surveys are used routinely by countries around the world to collect and analyze data in order to inform policy decisions, allocate resources, and assess national needs. Various federal agencies in the United States (for example, the Census Bureau, the Bureau of Labor Statistics, the Bureau of Justice Statistics) have been conducting or sponsoring sample surveys to obtain high-quality data about the state of the economy, health, education, crime, and other issues. The U.S. decennial census is mandated by the U.S. Constitution for the purpose of allocating congressional seats, but virtually all other demographic information used for policy making is collected on a sample basis. Several other countries have replaced their censuses with sample surveys (for example, Germany in 1987 and France in 2004). Even in the United States, the long form for the census was replaced in 1996 by the American Community Survey, a monthly sample survey of about 250,000 households.5

The rest of this section describes one particular survey, the National Crime Victimization Survey (NCVS), in some detail as it shares some key similarities with the NAOMS survey: it uses multiple data sources, many of which are self-reported; it is a national survey designed to collect sensitive data (crime versus aviation safety); and it must protect respondent confidentiality. It is also informative to see how this survey started and how it evolved over time. See Appendix E for additional examples of federal surveys.

Several sources of crime data are used to inform U.S. policy decisions and to allocate funding for criminal justice to the states. The Uniform Crime Report (UCR) began in the late 1920s when the International Association of Chiefs of Police recognized a need for reliable data on crime in the United States in order to measure the effectiveness of local law enforcement and to provide data to help fight crime. In 1930, the job of collecting, summarizing, and publishing the UCR was turned over to the Federal Bureau of Investigation (FBI), which received data on monthly counts of eight types of crimes as well as the number of arrests for an additional 21 crimes from police jurisdictions. Although participation in the UCR is voluntary, over 98 percent of all police agencies in the nation reported to the UCR for at least some months in 2005.6 However, the system had some weaknesses in that some police agencies reported on a wide range of characteristics of all crimes, whereas others submitted information on a more limited set of crimes.

In the 1970s, the criminal justice community determined a need for more in-depth information about reported crime incidents. The Bureau of Justice Statistics commissioned a study to determine how the UCR could be improved to meet these needs. Based on that study, the UCR was further refined into what is now known as the National Incident Based Reporting System (NIBRS). By 2007, only about 25 percent of the U.S. population was covered by a police jurisdiction reporting to the NIBRS,7 which has not yet replaced the UCR for national crime statistics. However, even if the NIBRS included data from all jurisdictions, there would still be gaps in the available information about crime because not all crime is reported to the police.

5 U.S. Census Bureau, American Community Survey (ACS), Washington, D.C., available at http://www.census.gov/acs/, accessed July 15, 2009.
6 Nathan James and Logan Richard Council, How Crime in the United States Is Measured, CRS-RL34309, Congressional Research Service, Washington, D.C., 2008.
7 Federal Bureau of Investigation, NIBRS Frequently Asked Questions, Washington, D.C., April 2009, available at http://www.fbi.gov/ucr/nibrs_general.html, accessed October 21, 2009.

Other developments were occurring in parallel. Several years of pilot tests showed that the UCR seriously underestimated the level of crime in the United States and that the collection of data from the victims of crime was feasible. In 1965, President Lyndon Johnson established a commission to examine the data needs with respect to crime statistics and to propose a solution. The commission recommended in 1968 that a Justice Statistics Center be established and that a national crime survey be implemented on an ongoing basis. A multiyear period of research and experimentation was conducted by the Census Bureau, which was selected to implement the survey. The first National Crime Survey, or NCS, was conducted in July 1972.

An NRC panel was asked to examine the NCS because of concerns about its data.8 Based on the recommendations in the panel’s 1976 report, Surveying Crime, the Census Bureau sharpened some questions to better define certain types of incidents for respondents and to collect additional data allowing better comparisons with other data sources. These changes were phased in from July 1986 through 1992, and the NCS was renamed the National Crime Victimization Survey. For 18 months, both surveys were administered, each to half of the sample, so that comparisons could be made between them and methods for bridging the time series could be developed. In 2005, the NCVS interviewed people in a sample of about 68,000 households.

These two sources of crime data, the UCR/NIBRS and the NCVS, provide complementary information to policy makers. The NCVS provides coverage for the large number of crime incidents not reported to the police. Even for crimes that are reported, the NCVS collects information known only to the victim, such as the impact of the crime on his or her life. The Department of Justice says of the two systems that “the information they produce together provides a more comprehensive panorama of the Nation’s crime problem than either could produce alone.”9

Comparison of crime rates from the two crime data-collection systems is inevitable. Some serious crime categories have high reporting rates, and for these, the magnitudes of criminal incidents obtained from the UCR/NIBRS and the NCVS are similar. For other crime categories, the counts can be quite different. These differences have been investigated by many researchers and can be explained by the different methodologies, definitions, and error types of the two collection systems.10

Victims of crime may be reluctant to reveal details of a crime for fear of embarrassment or compromise to their safety. To encourage participation in the NCVS, victims are provided assurances of confidentiality. These promises are enforced through a variety of federal regulations, which provide restrictions on how the information victims reveal can be used (only for statistical purposes) and with whom it can be shared (the data are immune from legal processes). A program allowing researchers outside the federal data-collecting agency access to these data requires that they obtain a privacy certificate verifying the security of their data-management plan.11

The NCVS is a useful and high-quality data-collection system for the geographically dispersed, sensitive, and diverse phenomenon of crime incidents. While this type of system can be costly, policy makers have deemed it valuable enough to justify the expense. It did not achieve its maximum utility immediately at its inception, but rather required adaptation and improvement over time. The NCVS example also illustrates some features that are relevant for NAOMS: (1) data available from a system of self-reports are not necessarily adequate in characterizing the complete picture on crime; (2) when two systems produce different estimates for certain categories of crimes, it does not necessarily invalidate the utility of either; the differences can provide insight into how to improve measurement and how to determine the most useful concepts or definitions being examined; and (3) confidentiality protections can be put in place by regulation so that the respondents can be protected at the same time that the data are serving useful purposes for policy making and research.

8 National Research Council, Surveying Crime, Panel for the Evaluation of Crime Surveys, Bettye K. Eidson Penick, editor, and Maurice E.B. Owens III, associate editor, Committee on National Statistics, Academy of Mathematical and Physical Sciences, National Academy of Sciences, Washington, D.C., 1976.
9 U.S. Department of Justice, The Nation’s Two Crime Measures, U.S. Department of Justice, Washington, D.C., October 2004, p. 1, available at http://www.ojp.usdoj.gov/bjs/pub/pdf/ntcm.pdf, accessed June 10, 2009.
10 Michael R. Rand and Callie M. Rennison, “True Crime Stories? Accounting for Differences in Our National Crime Indicators,” Chance 15 (2002): 47-51, available at http://www.ojp.usdoj.gov/bjs/pub/pdf/tcsadnci.pdf, accessed June 12, 2009; and James P. Lynch and Lynn A. Addington, eds., Understanding Crime Statistics: Revisiting the Divergence of the NCVS and UCR, Cambridge University Press, New York, 2007.
11 See Bureau of Justice Statistics, Protection of Human Subjects and Privacy Certificate Requirements for Applicants for Funding from the Bureau of Justice Statistics, U.S. Department of Justice, available at http://www.ojp.gov/bjs/pub/pdf/bjshs.pdf, accessed March 19, 2009.

In summary, most large-scale surveys evolve over time, and their survey methodologies are refined on the basis of experience before they attain excellence. Often, the changes are gradual, but some surveys have undergone major design changes. Examples include the NCVS as well as the National Assessment of Educational Progress.

Another feature of successful government surveys is that they typically have a research team and resources to support the investigation of issues or problems of particular import. They also have a core staff dedicated to the survey’s ongoing improvement and adaptation to change. Many large-scale survey programs also develop an organizational culture that fosters a professional approach to the development of survey methodology and produces staff in both technical and administrative areas that are very knowledgeable about a particular survey, its history, and its key issues.

Finding: Successful large-scale surveys typically require a substantial commitment of time and resources to develop, refine, and improve the survey methodology and to ensure that the survey provides useful and high-quality data.

3.3
USEFULNESS OF SAMPLE SURVEYS FOR ASSESSING AVIATION SAFETY

When NAOMS was proposed, the available sources of data on aviation safety included the following: (1) accident and incident data from the NTSB and the FAA, (2) data from the FAA’s NASA-operated ASRS, (3) FAA’s Near Midair Collision Database, and (4) FAA’s Operational Error Detection Program. As noted in Section 2.1, the NTSB and FAA accident and incident databases include only incidents that meet certain thresholds and so do not include all potentially unsafe occurrences.
The ASRS database consists largely of self-reports of incidents by pilots. Though it is large and rich in information, it is not a probability sample, so it is impossible to obtain statistically valid estimates from ASRS data. The same limitation holds for the Near Midair Collision Database. The Operational Error Detection Program data cover only aircraft operating in controlled airspace, so planes flying under visual flight rules, which include a high proportion of general aviation flights, would not be covered.

In recent years, the use of onboard data-acquisition systems to collect aircraft operations data has become common. Now, virtually all new commercial airliners and most high-end business jets are equipped with flight data recorders, which provide the basis for FOQA systems that provide detailed information about flight operations. These systems are not affected by the types of measurement errors that are present in surveys of pilots or other personnel. However, as noted in Section 2.1, FOQA data do not provide a complete picture of the entire airspace, as piston-engine and turboprop general aviation aircraft are not typically equipped with these data-collection systems. Even if it were possible to obtain FOQA data for the entire population of aircraft in the U.S. airspace, the resources involved in assembling the data from all the air carriers and in ensuring privacy and confidentiality so that the data could be shared among all the carriers, the government, and the public would be very high. Probability sampling techniques can be useful here, as one could collect and analyze a sample of the database that takes into account privacy and confidentiality considerations in order to obtain timely information at reasonable costs.

The Aviation Safety Information Analysis and Sharing System (discussed in Section 2.1) has made progress and shows promise in allowing the access to, analysis of, and integration of multiple large aviation safety data sets. As it continues to develop and as more data sets are added, it will become more comprehensive. However, even then it is unlikely to cover the entire aviation system, particularly general aviation and small commercial carriers operating in remote locations. In addition, as more databases are added, issues of privacy and confidentiality are likely to take on increasing importance before the data can be shared among all parties and with the public.

Sample surveys can be used to provide new or supplemental information about aviation safety, even in the presence of these other data-collection efforts. The scope and usefulness of NAOMS are explored in detail in the next two chapters, but generally speaking, NAOMS was an attempt to capture the experiences of the frontline personnel (pilots, flight attendants, air traffic controllers, and mechanics) regarding flight operations and aviation safety. In the committee’s view, such information could be potentially useful, particularly in those segments of aviation that are not well covered by the other databases. In addition, carefully planned surveys can provide useful information not only about specific events, but also about the views and perceptions of the frontline personnel on flight operations. However, care must be taken to solicit information from these frontline personnel only when they are in a position to provide accurate and consistent responses.

Finding: A sample survey is a scientifically valid and effective way to collect data and track trends about events that are potentially related to aviation safety. The sample survey has several advantages over other, currently available, data sources:

• Sample surveys can be used to collect reliable information about all segments of civilian aviation. They can be especially useful for characterizing the safety of general aviation flights and the safety of flights of other segments of aviation where the data are more limited.
• Sample surveys have the potential to generate statistically valid information about operations that may or may not result in an accident or incident. This information would provide a useful reference point for studying other event data and for learning why some events lead to accidents while other, similar events do not.
• Government-sponsored sample surveys can produce data that are accessible to the public and can be analyzed regularly and independently.

However, information from any survey should be used in conjunction with other existing data to provide a holistic assessment of aviation safety levels and trends.