CONSIDERATIONS IN IDENTIFYING AND EVALUATING THE LITERATURE
This chapter presents the committee’s approach to identifying the literature for review and its considerations in evaluating the strength of evidence presented in that literature. It provides information about the types of literature the committee identified, how the committee assessed the strength of the evidence, and the categories of association that the committee used to summarize its findings. The committee’s approach was similar to that used in Gulf War and Health, Volume 1: Depleted Uranium, Sarin, Pyridostigmine Bromide, Vaccines and Gulf War and Health, Volume 2: Insecticides and Solvents (IOM 2000b, 2003a). For each agent under consideration, the committee determined, to the extent that available published scientific data permitted, the strength of the evidence of associations between exposure to the agent and adverse health outcomes. The committee reviewed available epidemiologic studies of Gulf War veterans and epidemiologic studies of other populations known to have been exposed to the agents of concern.
As discussed in Chapter 1, the committee was charged with summarizing the strength of the scientific evidence regarding exposure to the putative agents and illnesses suspected to be associated with them. The legislation (PL 105–277 and PL 105–368) that directs the committee’s work did not provide a specific list of diseases or illnesses for study; the diseases and illnesses discussed in the report were those dealt with in the scientific and medical literature reviewed. Because the searches were conducted on the agents of concern, the studies retrieved identified the health outcomes for review.
The committee began its evaluation by presuming neither the existence nor the absence of associations. It has sought to characterize and weigh the strengths and limitations of the available evidence. The committee’s task was not to judge individual cases of particular diseases or conditions or to address questions of causation. Nor did the committee concern itself with policy issues, such as potential cost of compensation, policy regarding compensation, or any broader policy implications of its findings.
IDENTIFICATION OF THE LITERATURE
The committee’s first step was to identify the literature it would review. It began its work by overseeing extensive searches of the peer-reviewed medical and scientific literature (Appendix B). It identified epidemiologic studies of persistent health outcomes associated with exposure to hydrazines, red fuming nitric acid, hydrogen sulfide, oil-fire byproducts, and diesel-heater fumes, as directed by PL 105–277 and PL 105–368. At the request of the Department of Veterans Affairs, the committee also identified epidemiologic studies on persistent health outcomes associated with exposure to fuels (for example, jet fuel and gasoline) used during the Gulf War.
The searches retrieved over 33,000 potentially relevant references. All searches were completed early in 2004; relevant studies published later will be reviewed by future IOM committees. After an assessment of the titles and abstracts in the results of the initial searches, the committee focused on some 800 potentially relevant epidemiologic studies for review and evaluation. The committee reviewed epidemiologic studies of fuels and combustion products rather than the numerous components of those agents. Those studies were assessed for evidence of associations between the agents of interest and persistent health outcomes in humans. The committee used its collective judgment in selecting studies thought to reflect the types of exposures that Gulf War veterans might have experienced. Although Gulf War veterans were exposed to multiple complex mixtures, epidemiologic studies are not typically designed to address such types of exposures.
Because only a few studies were related directly to veterans’ exposures, the committee reviewed primarily occupational studies of populations that had been exposed to the agents of interest. Those studies often included people whose exposures had been over a lifetime (such as exposure to air pollution in their communities) or workers employed in particular industries over many years. In contrast, the exposures of veterans in the Persian Gulf were of relatively short duration with varying degrees of intensity. Therefore, the exposures experienced during the Gulf War might only approximate the exposures described in the occupational literature reviewed in this report. The conclusions as to statistical associations based on occupational and other types of studies are meant to serve as a guide to potential health effects associated with specific agents.
The committee adopted a policy of using only peer-reviewed published literature as the basis of its conclusions. Publications that were not peer-reviewed had no evidentiary value for the committee; that is, they were not used as evidence for arriving at conclusions about the degree of association between exposure to a particular agent and adverse health effects. The process of peer review by fellow professionals, which is one of the hallmarks of modern science, ensures high standards of quality but does not guarantee the validity of a study or the ability to generalize results. Accordingly, committee members read each study critically and considered its relevance and quality. In some instances, non-peer-reviewed publications provided background information for the committee and raised issues that required further literature searches. The committee did not collect original data, nor did it perform any secondary data analysis.
With that orientation to the committee’s task, the following sections provide a brief discussion of the value of epidemiologic studies, the committee’s inclusion criteria for review of those studies, considerations in evaluating the evidence or data provided by the studies, and the categories of association that are used to draw conclusions about the strength of the evidence presented in the studies.
VALUE OF EPIDEMIOLOGIC STUDIES
Epidemiology deals with the study of the determinants, frequency, and distribution of disease in human populations. A focus on populations distinguishes epidemiology from medical disciplines that focus on the individual. Epidemiologic studies examine the relationship between exposures to agents of interest in a studied population and the development of health outcomes, so they can be used to generate hypotheses for study or to test hypotheses posed by investigators.
Epidemiologic studies can establish statistical associations between exposure to specific agents and health effects; associations are generally estimated by using relative risks or odds ratios. To conclude that an association exists, the health outcome must follow exposure to an agent more frequently than would be expected by chance alone. Furthermore, it is almost always necessary to find that the effect occurs consistently in several studies. Epidemiologists seldom consider a single study sufficient to establish an association; rather, it is desirable to replicate the findings in other studies to draw conclusions about the association. Results of separate studies are sometimes conflicting. It is sometimes possible to attribute discordant study results to such characteristics as soundness of study design, quality of execution, and the influence of different forms of bias. Studies that result in a statistically precise measure of association suggest that the observed result was unlikely to be due to chance. When the measure of association does not show a statistically precise effect, it is important to consider the size of the sample and whether the study had the power to detect an effect of a given size.
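As a concrete illustration of the measures described above (not part of the committee’s methods), the sketch below computes a relative risk and an odds ratio from a hypothetical 2x2 table, with Wald-type 95% confidence intervals on the log scale; all counts are invented for the example.

```python
import math

def two_by_two_measures(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Relative risk and odds ratio, with Wald 95% CIs, from a 2x2 table."""
    a, n1 = exposed_cases, exposed_total
    c, n0 = unexposed_cases, unexposed_total
    b, d = n1 - a, n0 - c           # noncases among exposed and unexposed
    rr = (a / n1) / (c / n0)        # relative risk
    odds_ratio = (a * d) / (b * c)  # odds ratio
    se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n0)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    z = 1.96  # normal critical value for a 95% interval
    def ci(est, se):
        return (est * math.exp(-z * se), est * math.exp(z * se))
    return rr, ci(rr, se_log_rr), odds_ratio, ci(odds_ratio, se_log_or)

# Hypothetical study: 30 of 100 exposed and 15 of 100 unexposed develop the outcome.
rr, rr_ci, odds_ratio, or_ci = two_by_two_measures(30, 100, 15, 100)
```

A confidence interval whose lower bound exceeds 1.0, as in this invented example, is what the text calls a "statistically precise" positive result.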
Epidemiologic study designs differ in their ability to provide valid estimates of an association (Ellwood 1998). Randomized controlled trials on comparable populations yield the most robust type of evidence; cohort and case-control studies are more susceptible to bias. Cross-sectional studies generally provide a lower level of evidence than cohort and case-control studies (Appendix C). Determining whether a given statistical association rises to the level of causation requires inference (Hill 1965). As discussed by the International Agency for Research on Cancer in the preamble of its monographs evaluating cancer risks (for example, IARC 2004), the case for causality is strengthened when an association is observed in a number of different studies, when the risk of disease increases with increasing exposure or declines after cessation of exposure, and when the effect is specific. Those characteristics all strengthen the likelihood that an association seen in an epidemiologic study is a causal effect. Inferences from epidemiologic studies, however, are often limited to population or ecologic associations because of a lack of individual exposure information. Exposures might not be controlled in epidemiologic studies, and in some cases there is large uncertainty in the assessment of exposure. To assess explanations other than causality, one must bring together evidence from different studies and apply well-established criteria, which have been refined over more than a century (Evans 1976; Hill 1965; Susser 1973, 1977, 1988, 1991; Wegman et al. 1997). A recent discussion of those criteria is offered in the 2004 report of the US Surgeon General (Office of the Surgeon General-HHS 2004). The strengths and limitations of the various epidemiologic designs, the issues to be considered in assessing epidemiologic studies, and the outcomes measured in the studies are discussed in Appendix C.
By examining numerous epidemiologic studies, the committee addressed the question, “Does the available evidence support a causal relationship or an association between exposure to a specific agent and a health outcome?” An association between a specific agent and a specific health outcome does not mean that exposure to the agent invariably results in the health outcome or that all cases of the outcome result from exposure. Such complete correspondence between agent and disease is the exception in large populations (IOM 1994b). The committee evaluated the data and based its conclusions on the strength and coherence of the data in the selected epidemiologic studies that met its inclusion criteria.
INCLUSION CRITERIA
The committee’s next step, after securing the full text of about 800 epidemiologic studies, was to determine which studies would be included in the review as primary or support studies. For a study to be included in the committee’s review, it had to meet these criteria: methodologic rigor, identification of class or agent, specificity of health outcome, an exposure assessment, and in some cases an exposure-free interval. Studies that met the committee’s criteria are referred to as primary studies. For relevance to the Gulf War veterans, the committee focused on long-term health outcomes that persist after exposure ceases.
Methodologic Rigor
The study had to be published in a peer-reviewed journal, had to include details of its methodology, had to include a control or reference group, had to have the statistical power to detect effects, and had to include reasonable adjustment for confounders. Case studies and case series were generally excluded from the committee’s consideration (see Appendix C).
Identification of Class or Agent
The study had to identify fuels, combustion products, or propellants as specified in the legislation. Because it is more difficult to draw conclusions on specific agents in studies of multiple chemical exposures, studies of that type were not considered primary. If agents were not specifically identified, a study could still be included if it examined an occupation that involved exposure to fuels or combustion products similar to veterans’ presumed exposures in the Persian Gulf.
Specificity of Outcome
The study had to specify a distinct outcome rather than a nonspecific group of health outcomes. Studies of broad disease categories (for example, diseases of the nervous system) were not considered as primary studies. Lack of specificity occurs primarily in mortality studies that examine deaths from broad disease groupings (such as deaths from all nervous system diseases) as opposed to cause-specific mortality (such as deaths from Parkinson’s disease). Such broad-category mortality studies were excluded unless they analyzed specific health outcomes.
Exposure Assessment and Exposure-free Interval for Reversible Effects
The committee preferred studies that had an independent assessment of exposure rather than self-reported exposure. For example, studies that used assessment by an industrial hygienist or a job-exposure matrix (JEM) were weighted more heavily by the committee.
To be relevant to Gulf War veterans, a study had to examine long-term rather than short-term outcomes. For some outcomes (for example, dermatologic, neurologic, and respiratory), long-term effects can be determined only after an exposure-free interval of weeks to months before evaluation of study subjects. The committee required an exposure-free interval specifically for effects that might be reversible (such as headache, light-headedness, poor coordination, rash, or cough) but not for irreversible effects (such as cancer).
The committee gave less weight to ecologic or toxicologic studies. Toxicologic studies had a small role in the committee’s assessment of association between the putative agents and health outcomes. Like previous committees, this one used evidence from toxicologic studies to assess biologic plausibility in support of epidemiologic data rather than as part of the weight of evidence to determine the likelihood that an exposure to a specific agent causes a long-term outcome. That is because toxicologic studies can inform about disease processes (for example, cancer) but are less informative about specific diseases (for example, esophageal cancer).
Studies that the committee might exclude or consider as support (that is, they carry less weight than primary studies) are studies of self-reported exposure, multiple exposure, or exposure to specific agents that cannot be assessed; studies whose outcomes are considered “subclinical” (that is, of altered functioning consistent with later development of a diagnosis but without clear predictive validity); studies with a lack of specificity of outcomes (for example, those with a broad range of International Classification of Disease (ICD) codes that refer to all diseases of the respiratory or nervous system); and studies without an exposure-free interval for reversible effects.
CONSIDERATIONS IN ASSESSING THE STRENGTH OF EVIDENCE
The committee’s process of reaching conclusions about the various agents and their potential for adverse health outcomes was collective and interactive. Once a study was included in this review because it met the committee’s criteria, there were several considerations in assessing the strength of associations. They were patterned after those introduced by Hill (1971) and include strength of the evidence of an association, presence of a dose-response relationship, presence of a temporal relationship, consistency of the association, specificity of the association, and biologic plausibility.
Strength of Evidence of an Association
The strength of an association is usually expressed as the magnitude of the measure of effect, for example, a relative risk or odds ratio. Generally, the higher the relative risk, the greater the likelihood that the exposure-disease association is causal and the lower the likelihood that it is due to undetected error, bias, or confounding (discussed below). Measures of statistical significance, such as p values, are not indicators of the strength of an association. However, small increases in relative risk that are consistent among studies might still be evidence of an association, and some forms of extreme bias or confounding can produce a high relative risk in the absence of causation. The statistical power of a study, that is, its ability to detect effects of a given magnitude, was also important, especially in the interpretation of negative results.
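The point about power and negative results can be made concrete with a back-of-the-envelope calculation. The sketch below uses a simple unpooled normal approximation for comparing two proportions; it is an illustration with invented numbers, not the method of any reviewed study, and the fixed critical value assumes a two-sided alpha of 0.05.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_proportions(p0, p1, n_per_group, z_crit=1.96):
    """Approximate power to detect a difference between two proportions.

    Unpooled normal approximation; z_crit = 1.96 corresponds to a
    two-sided test at alpha = 0.05."""
    se = math.sqrt(p0 * (1 - p0) / n_per_group + p1 * (1 - p1) / n_per_group)
    return norm_cdf(abs(p1 - p0) / se - z_crit)
```

For a hypothetical outcome occurring in 10% of unexposed and 15% of exposed subjects, a study with 300 per group has well under 60% power by this approximation, while 1,000 per group gives substantially more; a null result from the smaller study would therefore be weak evidence against an association.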
Thus, studies were evaluated for their rigor and analyses. Greater weight was given to studies that were conducted in a manner that reduced sources of error, bias, and confounding. More weight was given to studies in which there was independent assessment of exposure, whether based on knowledge of a specific industry, on the association of specific exposures with an occupational title or industry (as when a JEM was used to categorize exposure), or on an assessment made by an industrial hygienist. Studies that had self-reported exposures were considered, at best, support studies.
Dose-Response Relationship
The existence of a dose-response relationship—that is, an increased strength of association with increasing intensity or duration of exposure or other appropriate relation—strengthens an inference that an association is real. However, the lack of an apparent dose-response relationship does not rule out an association, as in the case of a threshold exposure beyond which the relative risk of disease remains constant and high. If the relative degree of exposure among several studies can be determined, indirect evidence of a dose-response relationship may exist. For example, if studies of presumably low-exposure cohorts show only mild increases in risk whereas studies of presumably high-exposure cohorts show larger increases in risk, the pattern would be consistent with a dose-response relationship.
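One common way to quantify a dose-response pattern of this kind is a test for linear trend in proportions across ordered exposure strata. The sketch below implements the Cochran-Armitage trend statistic; the stratum counts and exposure scores are invented for illustration.

```python
import math

def cochran_armitage_trend(cases, totals, scores):
    """Cochran-Armitage statistic for a linear trend in proportions
    across ordered exposure strata; approximately N(0, 1) under no trend."""
    n_total = sum(totals)
    p_bar = sum(cases) / n_total                       # overall proportion of cases
    mean_score = sum(n * s for n, s in zip(totals, scores)) / n_total
    numerator = sum(s * (a - n * p_bar)
                    for s, a, n in zip(scores, cases, totals))
    variance = p_bar * (1 - p_bar) * sum(n * (s - mean_score) ** 2
                                         for n, s in zip(totals, scores))
    return numerator / math.sqrt(variance)

# Hypothetical strata: risk rises from 5% to 10% to 20% with exposure level 0, 1, 2.
z = cochran_armitage_trend([5, 10, 20], [100, 100, 100], [0, 1, 2])
```

A statistic well above 1.96, as in this invented example, would indicate a positive trend unlikely to be due to chance alone.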
Temporal Relationship
If an observed association is real, exposure must precede the onset of disease by at least the duration of disease induction. The committee considered whether a disease occurred within a period after exposure to the putative agent that was consistent with current understanding of the natural history of the disease. The committee interpreted the lack of an appropriate time sequence as evidence against association but recognized that insufficient knowledge about the natural history and pathogenesis of many of the diseases under review limited the utility of this consideration.
Consistency of Association
A consistent association requires that the association be found regularly in a variety of studies, for example, in more than one study population and with different study methods. However, consistency alone is not sufficient evidence of an association. The committee considered findings that were consistent in direction among different categories of studies to be supportive of an association. It did not require exactly the same magnitude of association in different populations to conclude that there was a consistent association. A consistent association could occur when the results of most studies were positive and the differences in measured effects were within the range expected on the basis of sampling error, selection bias, misclassification, confounding, and differences in dose.
Thus, for a health outcome to be considered associated with an agent there had to be corroboration, that is, replication of findings among studies and populations and under relevant conditions. The degree to which an effect could be consistently reproduced gave the committee confidence that it was observing a true effect.
Specificity of Association
Specificity of association is the degree to which exposure to a given agent predicts the frequency or magnitude of a particular outcome. A positive finding seems more strongly supported when the association between the exposure and the health outcome is specific to both than when the association is nonspecific to the exposure or the health outcome. The committee recognized, however, that perfect specificity could not be expected, given the multifactorial etiology of many of the diseases under examination. The committee also recognized the possibility that many of the agents under study were associated with a broad array of diseases.
The committee members did, however, require that specific outcomes be identified. Studies that provided general outcomes (for example, diseases of the nervous system) or outcomes identified by broad ranges of ICD codes (for example, codes referring to all diseases of the respiratory system) were considered, at best, supportive of an outcome.
Biologic Plausibility
Biologic plausibility reflects knowledge of the biologic mechanism by which an agent can lead to a health outcome. That knowledge comes through mechanism-of-action or other studies in pharmacology, toxicology, microbiology, physiology, and other fields—typically in studies of animals. Biologic plausibility is often difficult to establish or may not be known when an association is first documented. The committee considered such factors as evidence from animal and human studies that exposure to an agent is associated with diseases known to have biologic mechanisms similar to that of the disease in question, evidence that some outcomes are commonly associated with occupational or environmental exposures, and knowledge of routes of exposure, storage in the body, and excretion that suggest that a disease is more likely to occur in some organs than in others. Biologic plausibility was required by the committee only in drawing a conclusion of “sufficient evidence of a causal association” (see below); for the other categories of association, it is not necessary to demonstrate a biologically plausible mechanism. The extent to which all the data are consistent with a biologically plausible mechanism influences the weight attached to the results of a study, as does an indication that the mechanism is similar in the animal(s) under study and humans.
The committee carefully considered whether alternative explanations or errors—such as bias and chance—might account for the finding of an association.
Bias
Bias refers to systematic or nonrandom error. Bias causes an observed value to deviate from the true value. It can weaken an association or generate a spurious association. Because all studies are susceptible to bias, a goal is to minimize bias or to adjust the observed value of an association by using special methods to correct for bias. Three kinds of bias may compromise the results of an investigation: selection bias, information bias, and confounding.
Selection bias occurs when the participants in a study are not representative of the general population, that is, when the comparison groups differ in measured or unmeasured baseline characteristics because of how participants were selected or assigned.
Information bias results from the manner in which data are collected and can result in measurement errors, imprecise measurement, and misdiagnosis. Those types of errors may be uniform in an entire study population or may affect some parts of the population more than others. Bias may result from misclassification of study subjects with respect to the outcome variable. Other common sources of information bias are the inability of study subjects to recall accurately the circumstances of their exposure (recall bias) and the likelihood that one group more frequently reports what it remembers than another group (reporting bias). Information bias is especially harmful in interpreting study results when it affects one comparison group more than another.
Confounding occurs when a variable or characteristic otherwise known to be predictive of the outcome can account for part or all of an apparent association. A confounding variable is an uncontrolled variable that influences the outcome of a study to an unknown extent, making precise evaluation of the effects of the independent variable impossible. Carefully applied statistical adjustments can often control for or reduce the influence of a confounder.
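The effect of a confounder, and the way stratified adjustment removes it, can be illustrated with a toy calculation. The sketch below computes the Mantel-Haenszel summary odds ratio across strata of a hypothetical confounder; all counts are invented so that the crude table shows an apparent association that vanishes within strata.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across confounder strata.

    Each stratum is a 2x2 table (a, b, c, d):
      a = exposed cases,   b = exposed noncases,
      c = unexposed cases, d = unexposed noncases."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical data: within each stratum the odds ratio is exactly 1.0,
# but the confounder is associated with both exposure and outcome.
strata = [(80, 120, 20, 30),     # high-risk stratum of the confounder
          (10, 90, 30, 270)]     # low-risk stratum of the confounder

# Crude (unstratified) table obtained by collapsing the strata:
a, b, c, d = (sum(col) for col in zip(*strata))
crude_or = (a * d) / (b * c)               # roughly 2.6: a spurious association
adjusted_or = mantel_haenszel_or(strata)   # 1.0: the association disappears
```

The crude odds ratio suggests a strong exposure effect, yet the stratum-adjusted estimate is exactly null; this is the pattern the text describes, in which a variable predictive of the outcome accounts for all of an apparent association.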
Chance
Chance is a type of error that can lead to an apparent association between an exposure to an agent and a health effect when none is present. An apparent effect of an agent on a health outcome may be the result of random variation due to sampling in assembly of the study population rather than the result of exposure to the agent under study. Standard methods that use confidence intervals, for example, allow one to assess the role of chance variation due to sampling.
Thus, the committee’s final judgment is based on a balance between the strength of support of an association and the degree of exclusion of alternatives. The evaluation of evidence to reach conclusions about statistical associations goes beyond quantitative procedures, and several stages during the review required thoughtful consideration and judgment and could not always be accomplished by adherence to a prescribed formula.
The approach described here evolved throughout the process of review and was determined in important respects by the nature of the evidence, exposures, and health outcomes being examined. Both quantitative and qualitative aspects of the process were important to the overall review. Ultimately, the conclusions expressed in this report about causation are based on the committee’s collective judgment.
CATEGORIES OF ASSOCIATION
The committee classified the evidence of an association between exposure to a specific agent and a specific health outcome in five categories. The categories were developed by previous IOM committees and have also been used to evaluate vaccine safety (IOM 1991, 1994a), herbicides used in Vietnam (IOM 1994b, 1996, 1999, 2001, 2003b), and indoor pollutants related to asthma (IOM 2000a).
Sufficient Evidence of a Causal Association
Evidence is sufficient to conclude that there is a causal association between exposure to a specific agent and a specific health outcome in humans. The evidence is supported by experimental data and fulfills the guidelines for sufficient evidence of an association (below). The evidence must be biologically plausible and satisfy several of the guidelines used to assess causality, such as strength of association, dose-response relationship, consistency of association, and temporal relationship.
Sufficient Evidence of an Association
Evidence is sufficient to conclude that there is an association. That is, a consistent association has been observed between exposure to a specific agent and a specific health outcome in human studies in which chance and bias, including confounding, could be ruled out with reasonable confidence. For example, several high-quality studies report consistent positive associations, and the studies are sufficiently free of bias and have adequate control for confounding.
Limited/Suggestive Evidence of an Association
Evidence is suggestive of an association between exposure to a specific agent and a specific health outcome, but the body of evidence is limited by the inability to rule out chance and bias, including confounding, with confidence. For example, at least one high-quality study reports a positive association, is sufficiently free of bias, and has adequate control for confounding. Other corroborating studies provide support for the association, but they were not sufficiently free of bias, including confounding. Alternatively, several studies of lower quality show consistent positive associations, and the results are probably not due to bias, including confounding.
Inadequate/Insufficient Evidence to Determine Whether an Association Exists
Evidence is of insufficient quantity, quality, or consistency to permit a conclusion regarding the existence of an association between exposure to a specific agent and a specific health outcome in humans.
Limited/Suggestive Evidence of No Association
Evidence is consistent in not showing a positive association between exposure to a specific agent and a specific health outcome after exposure of any magnitude. A conclusion of no association is inevitably limited to the conditions, magnitudes of exposure, and length of observation in the available studies. The possibility of a very small increase in risk after the exposure studied cannot be excluded.
The committee endeavored to express its judgment as clearly and precisely as the available data allowed, and it used the established categories of association from previous IOM studies because they have gained wide acceptance over more than a decade by Congress, government agencies, researchers, and veterans groups. The five categories describe different levels of association and sound a recurring theme: the validity of an association is likely to vary with the extent to which the authors reduced common sources of error—chance variation and bias, including confounding—in drawing inferences. Accordingly, the criteria for each category express a degree of confidence based on the extent to which sources of error were reduced.
REFERENCES
Ellwood JM. 1998. Critical Appraisal of Epidemiological Studies and Clinical Trials. 2nd Edition. Oxford: Oxford University Press.
Evans A. 1976. Causation and disease: The Henle-Koch postulates revisited. Yale Journal of Biology and Medicine 49(2):175–195.
Hill A. 1965. The environment and disease: Association or causation? Proceedings of the Royal Society of Medicine 58:295–300.
Hill A. 1971. Principles of Medical Statistics. New York: Oxford University Press.
IARC (International Agency for Research on Cancer). 2004. Tobacco Smoke and Involuntary Smoking. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans, Volume 83. Lyon, France: International Agency for Research on Cancer.
IOM (Institute of Medicine). 1991. Adverse Effects of Pertussis and Rubella Vaccines. Washington, DC: National Academy Press.
IOM. 1994a. Adverse Events Associated with Childhood Vaccines: Evidence Bearing on Causality. Washington, DC: National Academy Press.
IOM. 1994b. Veterans and Agent Orange: Health Effects of Herbicides Used in Vietnam. Washington, DC: National Academy Press.
IOM. 1996. Veterans and Agent Orange: Update 1996. Washington, DC: National Academy Press.
IOM. 1999. Veterans and Agent Orange: Update 1998. Washington, DC: National Academy Press.
IOM. 2000a. Clearing the Air: Asthma and Indoor Air Exposures. Washington, DC: National Academy Press.
IOM. 2000b. Gulf War and Health, Volume 1: Depleted Uranium, Sarin, Pyridostigmine Bromide, Vaccines. Washington, DC: National Academy Press.
IOM. 2001. Veterans and Agent Orange: Update 2000. Washington, DC: National Academy Press.
IOM. 2003a. Gulf War and Health, Volume 2: Insecticides and Solvents. Washington, DC: The National Academies Press.
IOM. 2003b. Veterans and Agent Orange: Update 2002. Washington, DC: The National Academies Press.
Office of the Surgeon General-HHS. 2004. The Health Consequences of Smoking: A Report of the Surgeon General. [Online]. Available: http://www.surgeongeneral.gov/library/smokingconsequences [accessed October 26, 2004].
Susser M. 1973. Causal Thinking in the Health Sciences: Concepts and Strategies of Epidemiology. New York: Oxford University Press.
Susser M. 1977. Judgment and causal inference: Criteria in epidemiologic studies. American Journal of Epidemiology 105(1):1–15.
Susser M. 1988. Falsification, verification, and causal inference in epidemiology: Reconsideration in the light of Sir Karl Popper’s philosophy. In: Rothman KJ, ed. Causal Inference. Chestnut Hill, MA: Epidemiology Resources. Pp. 33–58.
Susser M. 1991. What is a cause and how do we know one? A grammar for pragmatic epidemiology. American Journal of Epidemiology 133(7):635–648.
Wegman DH, Woods NF, Bailar JC. 1997. Invited commentary: How would we know a Gulf War syndrome if we saw one? American Journal of Epidemiology 146(9):704–711.