This chapter discusses the methodology used by the committee in formulating its conclusions about associations between exposure to biological and chemical agents and adverse health effects. It provides information about the types of evidence the committee reviewed, how the committee assessed the strength of the evidence, and the categories of evidence the committee used to summarize its findings. Further, the chapter includes a discussion of the issues involved in the assessment of Gulf War veterans’ exposures to the agents of concern.
The committee has undertaken a review of the scientific and medical literature on the following agents: depleted uranium (DU), pyridostigmine bromide (PB), sarin and cyclosarin, and the anthrax and botulinum toxoid vaccines, to determine whether they might be associated with adverse health effects. Although many chemical and biological agents were present during and after the Gulf War conflict, the committee chose these agents because they were of particular concern to the veterans (see Chapter 1). For each agent, the committee determined, to the extent that available published scientific data permitted meaningful determinations, the strength of the evidence for associations between exposure to the putative agent and adverse health effects. Because of the general lack of exposure measurements in veterans (with some exceptions), the committee reviewed studies of other populations known to be exposed to the agents of concern. These include uranium processing workers, individuals who may have been exposed to sarin as a result of terrorist activity, healthy volunteers (including military populations), and clinical populations (e.g., patients with myasthenia gravis treated with PB). Studying these groups allowed the committee to address the issue of whether the agents could be associated with adverse health outcomes.
In this chapter, the committee describes its approach to enable the reader to assess and interpret its findings and to assist those who may update the committee’s conclusions as new information becomes available. The details of the analysis related to each agent and conclusions concerning health effects appear in subsequent chapters and in the Executive Summary. The committee’s analyses have both quantitative and qualitative aspects, and reflect the evidence and the approach taken to evaluate that evidence. The methodology described in this chapter draws from the work of previous Institute of Medicine (IOM) committees and their reports on vaccine safety (IOM, 1991, 1994a), herbicides used in Vietnam (IOM, 1994b, 1996, 1999), and indoor pollutants related to asthma (IOM, 2000). However, the conclusions in the current report depart from previous studies by distinguishing between transient and long-term health effects, and by identifying dose-related health outcomes as they are reported in the literature.
METHODS OF GATHERING AND EVALUATING THE EVIDENCE
The committee reviewed and evaluated studies from the scientific and medical literature that were identified by searches of bibliographic databases and other methods (see Appendix C). As noted, the committee did not limit its review to health effects reported by Gulf War veterans but studied all health outcomes reported in populations exposed to the agents of concern. By taking this broad and inclusive approach, the committee intends to provide the Department of Veterans Affairs (VA) with a range of information about potential health outcomes for its consideration as it develops a compensation program for Gulf War veterans. Further, studies of nonveteran populations are also important for understanding those health effects with a long latency period between the time of exposure to the agent and the health effect (e.g., cancer), since long-term effects might not yet be manifest in Gulf War veterans, yet could be important for compensation decisions later in life.
The committee adopted a policy of using only peer-reviewed published literature as the basis for its conclusions. Publications that were not peer-reviewed had no evidentiary value for the committee (i.e., they were not used as evidence for arriving at the committee’s conclusions about the degree of association between exposure to a particular agent and adverse health effects). The process of peer review by fellow professionals, which is one of the hallmarks of modern science, ensures high standards of quality but does not guarantee the validity or generalizability of a study. Accordingly, committee members read each article critically. In some instances, non-peer-reviewed publications provided background information for the committee and raised issues that required further research. The committee, however, did not collect original data, nor did it perform any secondary data analysis.
In its evaluation of the peer-reviewed literature, the committee considered several important issues, including the quality and relevance of the studies; issues of error, bias, and confounding; the diverse nature of the evidence and the research; and the disparate populations being studied. Additionally, for many of the agents being studied (e.g., vaccines, PB, sarin) there were few epidemiologic studies, and much of the evidence was in the form of case studies and case reports, forms of publication that often do not provide sufficient evidence upon which to base conclusions about the statistical associations between illnesses and the agents under consideration.
TYPES OF EVIDENCE
The scientific literature on the putative agents varied from agent to agent in the number and type of published studies. For most agents, the epidemiological evidence was sparse. For only one agent was there a solid base of epidemiological studies from which the committee could draw conclusions. The extensive occupational studies of uranium workers provided a statistical foundation from which the committee could assess the strength of the association between uranium and adverse health effects. For the other agents, the committee had to rely primarily on a variety of human studies that had not been designed specifically to study adverse health effects related to the putative agents. Studies of patients with myasthenia gravis, for example, were designed primarily to examine the treatment effectiveness of PB, and had not been designed as robust epidemiologic studies (i.e., they often examined small populations or did not have control groups). The committee, however, adopted a uniform approach for evaluating the varied types of available evidence as reflected in the literature on each of the putative agents.
Animal and Other Nonhuman Studies
Studies of laboratory animals and other nonhuman systems are essential to understanding mechanisms of action and biologic plausibility and to providing information about possible health effects when experimental research in humans is not ethically or practically possible (Cohrssen and Covello, 1989; NRC, 1991). Such studies permit a potentially toxic agent to be introduced under conditions controlled by the researcher (such as dose, duration, and route of exposure) to probe health effects on many body systems. Nonhuman studies are also a valuable complement to human studies of genetic susceptibility. While nonhuman studies often focus on one agent at a time, they more easily enable the study of chemical mixtures and their potential interactions.
Research on health effects of toxic substances includes animal studies that characterize absorption, distribution, metabolism, elimination, and excretion. Animal studies may examine acute (short-term) exposures or chronic (long-term) exposures. Animal research may focus on the mechanism of action (i.e., how the toxin exerts its deleterious effects at the cellular and molecular levels). Mechanism-of-action (or mechanistic) studies encompass a range of laboratory approaches with whole animals and in vitro systems using tissues or cells from humans or animals. Also, structure–activity relationships, in which comparisons are made between the molecular structure and chemical and physical properties of a potential toxin versus a known toxin, are an important source of hypotheses about mechanism of action.
In carrying out its charge, the committee used animal and other nonhuman studies in several ways, particularly as a marker for health effects that might be important for humans. If an agent, for example, was absorbed and deposited in specific tissues or organs (e.g., uranium deposition in bone and kidney), the committee looked especially closely for possible abnormalities at these sites in human studies.
One of the problems with animal studies, however, is the difficulty of finding animal models to study symptoms that relate to uniquely human attributes, such as cognition, purposive behavior, and the perception of pain. With the exception of fatigue, many symptoms reported by veterans (e.g., headache, muscle or joint pain) are difficult to study in standard neurotoxicological tests in animals (OTA, 1990).
For its evaluation and categorization of the degree of association between each exposure and a human health effect, however, the committee used only evidence from human studies. Nevertheless, the committee did use nonhuman studies as the basis for judgments about biologic plausibility, which is one of the criteria for establishing causation (see below).
Epidemiologic Studies
Epidemiology concerns itself with the relationship of various factors and conditions that determine the frequency and distribution of an infectious process, a disease, or a physiological state in human populations (Lilienfeld, 1978). Its focus on populations distinguishes it from other medical disciplines. Epidemiologic studies characterize the relationship between the agent, the environment, and the host and are useful for generating and testing hypotheses with respect to the association between exposure to an agent and health or disease. The following section describes the major types of epidemiologic studies considered by the committee.
Cohort studies. The cohort, or longitudinal, study is an epidemiologic study that follows a defined group, or cohort, over time. It can test hypotheses about whether an exposure to a specific agent is related to the development of disease and can examine multiple disease outcomes that may be associated with exposure to a given agent. A cohort study starts with people who are free of a disease (or other outcome) and classifies them according to whether or not they have been exposed to the agent under study. A cohort study compares health outcomes in individuals who have been exposed to the agent in question with those without the exposure. Such a comparison can be used to estimate a risk difference or a relative risk, two statistics that measure association. The risk difference is the rate of disease in exposed persons minus the rate in unexposed subjects. It represents the absolute number of extra cases of disease associated with the exposure. The relative risk or risk ratio is determined by dividing the rate of developing the disease in the exposed group by the rate in the nonexposed group. A relative risk greater than 1 suggests a positive association between exposure and disease onset. The higher the relative risk, the stronger is the association.
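The two measures of association described above can be illustrated with a short calculation. The following is a minimal sketch, not a method used by the committee: the function name and the cohort counts are hypothetical, and simple proportions (cases divided by group size) stand in for the disease rates.

```python
def risk_difference_and_relative_risk(exposed_cases, exposed_total,
                                      unexposed_cases, unexposed_total):
    """Compare disease risk between exposed and unexposed cohort members.

    Returns a tuple (risk_difference, relative_risk).
    """
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("each group must contain at least one subject")
    if unexposed_cases <= 0:
        raise ValueError("relative risk is undefined with no unexposed cases")
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    # Risk difference: absolute excess risk associated with exposure.
    risk_difference = risk_exposed - risk_unexposed
    # Relative risk: ratio of the two risks; above 1 suggests a positive association.
    relative_risk = risk_exposed / risk_unexposed
    return risk_difference, relative_risk


# Hypothetical cohort: 30 cases among 1,000 exposed, 10 among 1,000 unexposed.
rd, rr = risk_difference_and_relative_risk(30, 1000, 10, 1000)
print(f"risk difference = {rd:.3f}, relative risk = {rr:.2f}")
# prints "risk difference = 0.020, relative risk = 3.00"
```

In this invented example the exposure is associated with 20 extra cases per 1,000 persons and a threefold relative risk.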
One major advantage of a cohort study is the ability of the investigator to control the classification of subjects at the beginning of the study. This classification in prospective cohort studies is not influenced by the presence of disease because the disease has yet to occur, which reduces an important source of potential bias known as selection bias (see later discussion). A cohort study design also gives the investigator the advantage of measuring and correcting another potential source of bias: confounding. As explained in the next section, when it is possible to measure a confounding factor, the investigator can apply statistical methods to minimize its influence on the results. Another advantage of a cohort study is that it is possible to calculate absolute rates of disease incidence. A final advantage, especially over cross-sectional studies (discussed below), is that it may be possible to adjust each subject’s follow-up health status for baseline health status so that the person acts as his or her own control, which may reduce a source of variation and increase the power to detect effects. The disadvantages of cohort studies are high costs as a result of a large study population and prolonged periods of follow-up (especially if the disease is rare), attrition of study subjects, and delay in obtaining results.
A prospective cohort study selects subjects on the basis of exposure (or lack of it) and follows the cohort into the future to determine the rate at which the disease (or other health outcome) develops. A retrospective (or historical) cohort study differs from a prospective study in terms of temporal direction; the investigator traces back in time to classify past exposures in the cohort and then tracks the cohort forward in time to ascertain the rate of disease. Retrospective cohort studies are commonly performed in occupational health. They often focus on disease mortality rates because of the relative ease of determining vital status of individuals and the availability of death certificates to determine the cause of death.
For comparison purposes, cohort studies often use general population mortality rates (age, sex, race, time, and cause specific) because it may be difficult to identify a suitable control group of unexposed workers. The observed number of deaths among workers (from a specific cause such as lung cancer) is compared with the expected number of deaths. The expected number is calculated by taking the mortality rate in the general population and multiplying it by the number of person-years of follow-up for the workers. The ratio of observed to expected deaths (which, by convention, is often multiplied by 100) produces a standardized mortality ratio (SMR). An SMR greater than 100 generally suggests an elevated risk of dying in the exposed group. Further, as discussed below, many cohort studies refine their measures of health outcomes by using an internal comparison group, which may differ in exposure level but may otherwise be more similar to the cohort than the general population. Many of the studies of uranium workers are retrospective cohort studies (see Chapter 4).
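The SMR calculation described above can be sketched in a few lines. This is an illustrative example under stated assumptions: the function name, the two strata, and all rates and counts are hypothetical, and the expected deaths are summed over strata on the assumption that each stratum pairs a general population mortality rate with the workers' person-years in that stratum.

```python
def standardized_mortality_ratio(observed_deaths, strata):
    """Return the SMR (observed / expected deaths, multiplied by 100).

    strata: iterable of (reference_rate_per_person_year, person_years) pairs,
    one pair per stratum (for example, per age and sex group).
    """
    # Expected deaths: reference mortality rate times worker person-years,
    # summed over all strata.
    expected_deaths = sum(rate * person_years for rate, person_years in strata)
    if expected_deaths <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * observed_deaths / expected_deaths


# Hypothetical worker cohort with two age strata.
strata = [
    (0.001, 10_000),  # younger workers: 1 death per 1,000 person-years
    (0.004, 5_000),   # older workers: 4 deaths per 1,000 person-years
]
smr = standardized_mortality_ratio(observed_deaths=36, strata=strata)
print(f"SMR = {smr:.0f}")  # prints "SMR = 120"
```

Here 30 deaths would be expected from the reference rates, so 36 observed deaths give an SMR of 120, which would suggest an elevated risk of dying in the exposed group.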
The major problem with using general population rates for comparison with occupational cohorts is the “healthy-worker effect” (see Chapter 2; Monson, 1990), which arises when an employed population experiences a lower mortality rate than the general population, which consists of a mix of healthy and unhealthy people. The healthy-worker effect is usually due to lower cardiovascular and trauma deaths. A population with elevated external traumatic causes of death (e.g., Gulf War veterans), however, may be different from many occupational populations.
In calculating the SMR, the denominator (expected deaths) is derived from general population figures rather than from an otherwise comparable group of unexposed workers (which may be unavailable). The “artificially” higher denominator for expected deaths in the general population lowers the SMR, thereby underestimating the strength of the association between exposure to the agent and the cause of death. In other words, the healthy-worker effect introduces a bias that diminishes the true disease–exposure relationship.
To counter the influence of the healthy-worker effect, some studies divide the worker population into different groups, based on their levels of exposure to the agent being studied. Searching for dose–response relationships within the worker population itself is a way of reducing the potential bias introduced by the use of population controls. The problem, of course, is that measurements of dose may be imprecise or unavailable, particularly if the exposures occurred decades ago. Consequently, epidemiologists often rely on job classification as a surrogate means of documenting dose. Reliance on job classification introduces the possibility of misclassification bias because the classification may not be a good proxy for the actual exposure or dose. Another problem is incompleteness of records, not only in determining job classification but especially in determining whether potential confounding exposures, such as cigarette smoking by individual workers, are present. Bias, introduced by misclassification and confounding, can systematically alter study results by diluting or enhancing associations (see discussion later in this chapter).
Case-control studies. The case-control study is useful for testing hypotheses about the relationships between exposure to specific agents and disease. It is especially useful for studying the etiology of rare diseases. When health outcomes are infrequent or rare, longitudinal or cross-sectional studies must be large enough and of sufficiently long duration to accumulate enough adverse events to accurately estimate the risk of a particular agent. In case-control studies, subjects (or cases) are selected on the basis of having a disease; controls are selected on the basis of not having the disease. Cases and controls are then asked about their past exposures to specific agents. Cases and controls are matched with regard to characteristics such as age, gender, and socioeconomic status, so as to eliminate these characteristics as the cause of observed differences in past exposure. The odds of exposure to the agent among the cases are then compared with the odds of exposure among controls. The comparison generates an odds ratio, a statistic that depicts the odds of having a disease among those exposed to the agent of concern relative to the odds of the disease for an unexposed comparison group. An odds ratio greater than 1 indicates that there is a potential association between exposure to the agent and the disease. The greater the odds ratio, the greater is the association. Thus, in a case-control study, subjects are selected on the basis of disease presence; prior exposure is then ascertained.
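The odds ratio can be illustrated from a simple two-by-two table of exposure by case status. The sketch below is a hypothetical example with invented counts and an assumed function name. It is the crude, unmatched calculation; a matched design would normally call for a matched analysis, which this sketch does not attempt.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Return the crude odds ratio from a 2x2 case-control table."""
    counts = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if any(count <= 0 for count in counts):
        raise ValueError("all four cells must be positive for this simple estimate")
    # Odds of exposure among cases and among controls.
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls


# Hypothetical study: 100 cases (40 exposed) and 100 controls (20 exposed).
estimate = odds_ratio(exposed_cases=40, unexposed_cases=60,
                      exposed_controls=20, unexposed_controls=80)
print(f"odds ratio = {estimate:.2f}")  # prints "odds ratio = 2.67"
```

An estimate above 1, as in this invented example, would indicate a potential association between the exposure and the disease.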
Case-control studies have the advantages of ease, speed, and relatively low cost. They are also advantageous for their ability to probe multiple exposures or risk factors. However, case-control studies are vulnerable to several types of bias, including recall bias. Other problems are identifying representative groups of cases, choosing suitable controls, and collecting comparable information about exposures on both cases and controls. These problems may lead to unidentified confounding variables that differentially influence the selection of cases or control subjects or the detection of exposure. For these reasons case-control studies are often the first, yet not the definitive, approach to testing a hypothesis.
Cross-sectional studies. In a cross-sectional study, the population of interest is surveyed at one point in time. Information is collected simultaneously about their health conditions and exposures to various agents, either present or past. The selection of individuals into the study, unlike that for cohort and case-control studies, is independent of both the exposure to the agent under study and disease characteristics. Cross-sectional studies seek to uncover potential associations between exposure to specific agents and development of disease. They may compare disease or symptom rates between groups with and without the exposure to the specific agent or may compare exposure to the specific agent between groups with and without the disease. Although cross-sectional studies need not have control groups, studies with control groups are more methodologically sound. Several health studies of Gulf War veterans are controlled cross-sectional surveys that compare a sample of veterans previously deployed to the Gulf War with a sample of veterans who served during the same period but were not deployed to the Gulf War (see Chapter 2).
Cross-sectional surveys are easy to perform and inexpensive to implement relative to cohort studies. Cross-sectional surveys can identify the prevalence of diseases and exposures in a defined population. They are useful for generating hypotheses; however, they are much less useful for determining cause-and-effect relationships, because disease and exposure data are collected simultaneously and may be self-reported (Monson, 1990). It may also be difficult to determine the temporal sequence of exposure and symptoms or disease.
Experimental Studies
Experimental studies in humans are the foremost means of establishing causal associations between exposures and human health outcomes. Experimental studies are used most frequently in the evaluation of the safety and efficacy of medications, surgical practices, biological products, vaccines, and preventive interventions. In an experiment, the investigator has control over assigning the agent to be studied and recording the outcome. Two key features of experimental studies are prospective design and use of a control group. Randomized controlled trials are considered the gold standard in experimental studies.
In randomized controlled trials, each subject has a known, often equal, probability of assignment to either the treatment or the control group. Large randomized controlled trials are designed to have all possible confounding variables occur with equal frequency in the intervention and control groups. Blinding may be another aspect of randomized controlled trials. Blinding refers to shielding subjects or investigators from knowledge of whether the subjects were assigned to the treatment or the control group. Blinding is most readily accomplished when subjects in the control group receive a placebo. When both subjects and investigators are unaware of patient assignment, the study is said to be “double-blind.” The objective of blinding is to reduce bias introduced by patients’ and investigators’ attitudes and expectations for study outcomes. In a study of the anthrax vaccine by Brachman and colleagues (1962), workers in four goat hair processing mills were randomized to receive the vaccine or a placebo and then followed to assess the vaccine’s safety and efficacy (see Chapter 7).
The value of randomized controlled trials has been so convincingly demonstrated that they are required for ensuring the safety and efficacy of all new medications introduced into the market in the United States (FDA, 1998). Estimates are that 300,000 randomized controlled trials have been carried out over the past 50 years (Randal, 1999). The main drawbacks of randomized controlled trials are their expense, the time needed for completion, and the common practice of systematically excluding many groups of patients so that the results apply to only a small fraction of potentially eligible patients.
Experimental studies are most often performed for therapeutic agents, where the only expected result is a good outcome or no effect for the subject; rarely are adverse health effects expected. Ethical considerations limit experimental studies of toxic compounds and adverse health outcomes, and guidelines for informed consent and protection of human subjects are strictly implemented (NIH, 1991).
Case Reports and Case Series
A case report is generally a detailed description of a patient’s illness reported by a clinician who may suspect that the illness is the result of exposure to a specific biological or chemical agent. A case series refers to a group of patients with the same or similar disease who experienced identical or similar exposures to a specific agent. Neither case reports nor case series are formal epidemiologic studies, but both are means for generating hypotheses about exposure and disease relationships. For Gulf War veterans, registry programs established by the VA and the Department of Defense (DoD) represent a type of voluntary case series. Any veteran may come forward to receive a clinical examination and a referral for treatment (see Chapter 2). Through documentation of veterans’ symptoms and diagnoses, these registries have been valuable in generating hypotheses, yet they are not designed for hypothesis testing or for establishing the prevalence of disease or specific exposures among Gulf War veterans.
The value of case reports and case series is that they can document possible associations between an environmental exposure and a particular health outcome. In some situations, they may be useful in suggesting causal relationships if the disease is rare and has a close temporal relationship to the exposure (Kramer and Lane, 1992). However, case reports and case series do not have control groups. Because case series are not population based, many cases caused by an exposure go unreported, and the prevalence of cases may be lower than in the population at large. Further, the cases may not have been caused by exposure to the specific agent (false-positive results).
CONSIDERATIONS IN ASSESSING THE STRENGTH OF THE EVIDENCE
The committee’s process of reaching conclusions about the various agents and their potential for adverse health outcomes was collective and interactive. As the committee reviewed the literature on the agents under study, it took into consideration a variety of criteria (discussed below) to help evaluate the strength of the evidence for or against an association between exposure to the agent under study and adverse health outcomes. The committee assessed the evidence by considering the six general criteria (strength of association, dose–response relationship, consistency of association, temporal relationship, specificity of association, and biological plausibility) patterned after those introduced by Hill (1971). The committee also assessed the extent of potential errors in the study due to a number of factors including chance and bias (discussed later in the chapter).
Strength of Association
The strength of association is usually expressed as the magnitude of the measure of effect, for example, relative risk or odds ratio. Generally, the higher the relative risk, the stronger the association between the agent and the health effect, and the greater the likelihood that the agent–health effect association is causal (i.e., the less likely it is to be due to undetected error, bias, or confounding). Small increases in relative risk that are consistent across a number of studies may provide evidence of an association (IOM, 1994b).
Dose–Response Relationship
A dose–response relationship refers to the finding of a greater health effect (response) with higher doses of an agent. A steep dose–response relationship strengthens the inference that an association is real. Generally, in a strong dose–response relationship, cohorts exposed to presumably low doses show only mild elevations in risk, whereas cohorts with exposure to presumably high doses show more extreme elevations in risk. However, the absence of such a relationship does not discount the possibility of an association. For example, a dose–response relationship would go undetected if the doses were all below a threshold level of exposure, beyond which the relative risk of disease increased steeply.
Many physiologic and pharmacologic actions have thresholds that result in a dose–response curve that is curvilinear rather than linear in shape. Furthermore, a particular agent may produce an effect after a brief latency or after decades (e.g., asbestos-induced mesothelioma). Other mechanisms that may alter the strictly linear dose–response curve are chemical interactions involving synergism and antagonism (e.g., a worker with significant asbestos exposure has a risk about five times greater for developing lung cancer than a person without asbestos exposure); reversibility (e.g., some toxic events are reversible as the body has the potential for self-repair); and susceptibility or resistance to a particular agent (Brooks et al., 1995).
Consistency of Association
A consistent association is similar in magnitude and direction across several studies representing different populations, locales, and times (Hill, 1965). The greater the number of studies with the same results, the more consistent is the association and the greater is the likelihood of a true association. However, consistency alone is not sufficient evidence of an association. The committee considered findings that were consistent in direction across different categories of studies to be supportive of an association. The committee did not require exactly the same magnitude of association in different populations to conclude that there was a consistent association. A consistent positive association could occur when the results of most studies were positive and the differences in measured effects were within the range expected on the basis of sampling error, selection bias, misclassification, confounding, and difference in actual dose levels (IOM, 1994b).
Temporal Relationship
The finding of an agent–disease association begins the process of trying to decide whether the agent is a cause, correlate, or consequence of the disease. Determining causality requires that exposure to the agent precede the onset of the health outcome by at least the duration of disease induction. If, in a cohort study, exposure to the agent occurs after the appearance of the health outcome, the agent could not have caused that outcome. Establishing a temporal relationship is often difficult, especially with health outcomes that have long induction periods, such as cancer. The committee interpreted the lack of an appropriate time sequence as evidence against association, but recognized that insufficient knowledge of the natural history and pathogenesis of many of the health outcomes under review limited the utility of this criterion (IOM, 1994b).
Specificity of Association
Specificity refers to the unique association between exposure to a particular agent and a health outcome (i.e., the health outcome never occurs in the absence of the agent). Two examples of highly specific associations are the pathologically distinctive tumors mesothelioma of the lung and angiosarcoma of the liver in workers exposed to asbestos and vinyl chloride, respectively. The committee recognized, however, that perfect specificity is unlikely given the multifactorial etiology of many of the health outcomes noted in this study. Additionally, the committee recognized that the agents under review might be associated with a broad spectrum of health outcomes.
Biological Plausibility
Biological plausibility reflects knowledge of the biological mechanism by which an agent can lead to a health outcome. This knowledge comes through mechanism-of-action or other studies from pharmacology, toxicology, microbiology, and physiology, among other fields, typically in studies of animals. Biological plausibility is often difficult to establish or may not be known at the time an association is first documented. The committee considered factors such as evidence in animals and humans that exposure to the agent is associated with diseases known to have similar biological mechanisms as the disease in question; evidence that certain outcomes are commonly associated with occupational or environmental exposures; and knowledge of routes of exposure, storage in the body, and excretion that would suggest the disease is more likely to occur in some organs rather than others (IOM, 1994b).
It is also important to consider whether alternative explanations might account for the finding of an association. The types of studies described earlier in this chapter are often used to demonstrate associations between exposures to particular agents and health outcomes. The validity of an association, however, can be challenged by error due to chance, bias, and confounding in assembling the study populations (which are more or less representative samples from the entire relevant populations). Since these sources of error may represent alternative explanations for an observed association, they must be ruled out to the extent possible. These sources of error are important for interpreting the strength and limitations of any given study and for understanding the criteria used by the committee to evaluate the strength of the evidence for or against associations.
Chance is a type of error that can lead to an apparent association between an exposure to an agent and a health effect when none is actually present. An apparent effect of an agent on a health outcome may be the result of random variation due to sampling when assembling the study populations, rather than to the agent under study. Standard methods using confidence intervals or tests of statistical significance allow one to assess the role of chance variation due to sampling. A statistically significant finding is one in which there is little chance (usually less than 5 percent) of observing an apparent association when none really exists. A confidence interval (for a relative risk, odds ratio, or other measure of association) is centered at the estimate of the measure of interest and its range depends on the amount of variability in the sample. Although it is possible to calculate a confidence interval for any coverage probability, a 95 percent confidence interval is commonly used. If 95 percent confidence intervals were constructed for repetitions of the experiment (i.e., many different samples were drawn from the population of interest under the same circumstances), 95 percent of these intervals would contain the true value.
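As an illustration of the confidence interval described above, the following is a minimal sketch of a 95 percent confidence interval for a relative risk. It assumes the common log-based (Wald) approximation, which the committee's text does not specify; the function name and the cohort counts are hypothetical and chosen only for illustration.

```python
import math

def relative_risk_ci(exposed_cases, exposed_total,
                     unexposed_cases, unexposed_total, z=1.96):
    """Return (relative risk, lower bound, upper bound).

    Uses the log-based Wald approximation (an assumption of this sketch);
    z=1.96 corresponds to 95 percent coverage.
    """
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Approximate standard error of log(RR)
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical cohort: 30 cases among 1,000 exposed, 20 among 1,000 unexposed
rr, lo, hi = relative_risk_ci(30, 1000, 20, 1000)
print(f"RR = {rr:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

With these hypothetical counts the interval spans 1.0, so the apparent 50 percent excess risk would not be statistically significant at the 5 percent level. Under this approximation the interval is symmetric around the estimate on the logarithmic scale rather than on the original scale.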
Bias refers to systematic or nonrandom error. Bias causes the observed value to deviate from the true value. It can weaken the association or generate a spurious association. Because all studies are susceptible to bias, a key goal is to minimize bias or to adjust the observed value of the association using special methods to correct for bias. Three general sources of error may compromise the results of an investigation: selection bias, confounding, and information bias.
Selection bias can occur in the recruitment of study subjects to a cohort when the study and control groups differ from each other by a factor that is likely to affect the results. Thus, the observed cohort differs from the population at large by some unmeasured variable that could predict the outcome. Non-population-based cross-sectional studies are particularly vulnerable to selection bias.
Confounding occurs when a variable or characteristic can account for part or all of an apparent association. For example, if inhaled uranium particles appear to be associated with the development of lung cancer, cigarette smoking may confound this association if the cohort exposed to uranium had more members who smoked than the unexposed cohort. Confounding variables can be either measured or unmeasured. With measured confounders, carefully applied statistical adjustments can control for or reduce their influence. With unmeasured confounders, no adjustment is possible. With studies of uranium miners, for example, it is usually not possible to adjust for the role of cigarette smoking by individuals, since the employee records of decades ago seldom contained information about smoking.
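The effect of adjusting for a measured confounder can be sketched with a stratified analysis. The example below assumes the Mantel-Haenszel pooled risk ratio as the adjustment method, which is one standard choice and not necessarily the one used in any study the committee reviewed. All counts are invented so that smoking, not exposure, drives the crude association.

```python
def crude_risk_ratio(strata):
    """Risk ratio ignoring the stratifying variable (e.g., smoking)."""
    a = sum(s["exposed_cases"] for s in strata)
    n1 = sum(s["exposed_total"] for s in strata)
    c = sum(s["unexposed_cases"] for s in strata)
    n0 = sum(s["unexposed_total"] for s in strata)
    return (a / n1) / (c / n0)

def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel risk ratio pooled across strata of the confounder."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        total = s["exposed_total"] + s["unexposed_total"]
        numerator += s["exposed_cases"] * s["unexposed_total"] / total
        denominator += s["unexposed_cases"] * s["exposed_total"] / total
    return numerator / denominator

# Hypothetical data: within each smoking stratum the risk is identical in
# exposed and unexposed workers, but exposed workers smoke far more often.
strata = [
    {"name": "smokers", "exposed_cases": 80, "exposed_total": 800,
     "unexposed_cases": 20, "unexposed_total": 200},
    {"name": "nonsmokers", "exposed_cases": 4, "exposed_total": 200,
     "unexposed_cases": 16, "unexposed_total": 800},
]

crude = crude_risk_ratio(strata)
adjusted = mantel_haenszel_risk_ratio(strata)
print(f"crude RR = {crude:.2f}, smoking-adjusted RR = {adjusted:.2f}")
```

In this invented data set the crude risk ratio is well above 1, yet the smoking-adjusted ratio is 1, showing how an unevenly distributed confounder can create an apparent association. The adjustment is possible only because smoking status was recorded for every subject.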
Information bias results from the way in which the data are collected, for example, from measurement errors, imprecise measurement, and misdiagnosis. These types of errors may be uniform across the entire study population or may affect some parts of the population more than others. Bias may result from misclassification of study subjects with respect to the outcome variable. Other common sources of information bias are due to the inability of study subjects to accurately recall the circumstances of the exposure (recall bias) or to the likelihood that one group more frequently reports what it remembers than another group (reporting bias). Information bias is especially pernicious when it affects one comparison group more than another.
SUMMARY OF THE EVIDENCE
As seen below in the discussion of categories of association, the committee distinguishes between “sufficient evidence of a causal relationship” and “sufficient evidence of an association.” Thus, before describing the categories used to summarize its findings, the committee provides a brief discussion of the concepts of causation and association.
Understanding Causation and Association
A principal objective of epidemiology is to understand whether exposures to specific agents are associated with disease or other health outcomes and, with additional available information, to decide whether such associations are causal. Although they are frequently used synonymously, the terms “association” and “causation” have distinct meanings.
Epidemiologic studies can establish statistical associations between exposure to specific agents and health effects. In the types of epidemiologic studies described earlier in this chapter, the degree of an association is often measured by relative risks, odds ratios, and SMRs. Epidemiologic studies find different degrees of association, depending on the magnitude of the relative risk, odds ratio, or SMR, and its variability and on the ability to exclude or reduce sources of error. To conclude that an association exists, it is necessary for an agent to occur together with the health outcome more frequently than expected by chance alone. Further, it is almost always necessary to find that the effect occurs consistently in several studies. Epidemiologists seldom consider one study, taken alone, sufficient to establish an association; rather, it is necessary to replicate the findings in other studies in order to draw conclusions about the association. Results from separate studies sometimes conflict with one another. It is sometimes possible to attribute discordant study results to characteristics such as the soundness of study design, the quality of execution, and the influence of different forms of error and bias. Studies that result in a statistically significant measure of association account for the role of chance in producing the observed result. When the measure of association does not show a statistically significant effect, it is important to consider the size of the sample and whether the study had the power to detect a rare but important effect.
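Two of the measures of association named above, the odds ratio and the standardized mortality ratio (SMR), can be sketched as follows. The function names, counts, person-years, and reference rates are all hypothetical; the formulas are the textbook definitions rather than anything specific to the studies reviewed by the committee.

```python
def odds_ratio(exposed_cases, exposed_controls,
               unexposed_cases, unexposed_controls):
    """Odds ratio from a 2x2 case-control table (cross-product ratio)."""
    return (exposed_cases * unexposed_controls) / (
        exposed_controls * unexposed_cases
    )

def standardized_mortality_ratio(observed_deaths, person_years, reference_rates):
    """SMR = observed deaths / deaths expected at reference-population rates.

    person_years and reference_rates are parallel lists, one entry per
    stratum (for example, per age group).
    """
    expected_deaths = sum(
        py * rate for py, rate in zip(person_years, reference_rates)
    )
    return observed_deaths / expected_deaths

# Hypothetical case-control study: 40 of 60 cases and 20 of 100 controls exposed
or_estimate = odds_ratio(40, 60, 20, 80)

# Hypothetical worker cohort: 30 deaths observed; two age strata
smr_estimate = standardized_mortality_ratio(
    observed_deaths=30,
    person_years=[10_000, 5_000],
    reference_rates=[0.001, 0.003],  # deaths per person-year in the reference population
)
print(f"OR = {or_estimate:.2f}, SMR = {smr_estimate:.2f}")
```

A value above 1 for either measure indicates more disease or death in the exposed group than in the comparison group, but whether the excess exceeds what chance alone would produce still has to be judged from its confidence interval or significance test.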
Study designs differ in their ability to provide a valid estimate of an association (Ellwood, 1998). Randomized controlled trials are the most robust type of evidence, whereas cohort or case-control studies are more susceptible to chance, bias, and confounding. Case series and case reports carry the least weight, but may be the only information available, especially for an extremely rare event (e.g., a hypersensitivity reaction). For most of the agents reviewed in this report, the committee had to rely on case series and case reports because more robust epidemiologic studies were not available.
Determining whether a given statistical association rises to the level of causation requires inference (Hill, 1971). In order to infer a causal association, one must bring together evidence from different studies and apply well-established criteria that have been refined over more than a century (Hill, 1971; Evans, 1976; Wegman et al., 1997). The criteria for inferring a causal relationship are strength of association, dose–response relationship, consistency of association, temporal relationship, specificity of association, and biological plausibility (as discussed above). Strictly speaking, assessing causality was not within the charge of this committee, but the criteria for causality were helpful as the committee evaluated the strength of the evidence for or against associations between health effects and exposure to the agents being studied.
Categories of Association
The committee used five previously established categories to classify the evidence for association between exposure to a specific agent and a health outcome. The categories closely resemble those used by several IOM committees that evaluated vaccine safety (IOM, 1991, 1994a), herbicides used in Vietnam (IOM, 1994b, 1996, 1999), and indoor pollutants related to asthma (IOM, 2000). Although the categories imply a statistical association, the committee had sufficient epidemiologic evidence to examine statistical associations for only one of the agents under study (i.e., depleted uranium); there was very limited epidemiologic evidence for the other agents examined (i.e., sarin, pyridostigmine bromide, and anthrax and botulinum toxoid vaccines). Thus, the committee based its conclusions on the strength and coherence of the data in the available studies. In many cases, these data distinguished between transient and long-term health outcomes related to the dose of the agent. Based on the literature, it became incumbent on the committee to similarly specify the differences between dose levels and the nature of the health outcomes. This approach led the committee to reach conclusions about long- and short-term health effects, as well as health outcomes related to the dose of the putative agents. The final conclusions expressed in Chapters 4–7 represent the committee’s collective judgment. The committee endeavored to express its judgments as clearly and precisely as the available data allowed. The committee used the established categories of association from previous IOM studies because they have been widely accepted for more than a decade by Congress, government agencies, researchers, and veteran groups.
Sufficient Evidence of a Causal Relationship. Evidence is sufficient to conclude that a causal relationship exists between the exposure to a specific agent and a health outcome in humans. The evidence fulfills the criteria for sufficient evidence of an association (below) and satisfies several of the criteria used to assess causality: strength of association, dose–response relationship, consistency of association, temporal relationship, specificity of association, and biological plausibility.
Sufficient Evidence of an Association. Evidence is sufficient to conclude that there is a positive association. That is, a positive association has been observed between an exposure to a specific agent and a health outcome in human studies in which chance, bias, and confounding could be ruled out with reasonable confidence.
Limited/Suggestive Evidence of an Association. Evidence is suggestive of an association between exposure to a specific agent and a health outcome in humans, but is limited because chance, bias, and confounding could not be ruled out with confidence.
Inadequate/Insufficient Evidence to Determine Whether an Association Does or Does Not Exist. The available studies are of insufficient quality, consistency, or statistical power to permit a conclusion regarding the presence or absence of an association between an exposure to a specific agent and a health outcome in humans.
Limited/Suggestive Evidence of No Association. There are several adequate studies, covering the full range of levels of exposure that humans are known to encounter, that are mutually consistent in not showing a positive association between exposure to a specific agent and a health outcome at any level of exposure. A conclusion of no association is inevitably limited to the conditions, levels of exposure, and length of observation covered by the available studies. In addition, the possibility of a very small elevation in risk at the levels of exposure studied can never be excluded.
These five categories cover different degrees or levels of association, with the highest level being sufficient evidence of a causal relationship between exposure to a specific agent and a health outcome. The criteria for each category incorporate key points discussed earlier in this chapter. A recurring theme is that an association is more likely to be valid if it is possible to reduce or eliminate common sources of error in making inferences: chance, bias, and confounding. Accordingly, the criteria for each category express varying degrees of confidence based upon the extent to which it has been possible to exclude these sources of error. To infer a causal relationship from a body of evidence, the committee relied on long-standing criteria for assessing causation in epidemiology (Hill, 1971; Evans, 1976).
COMMENTS ON INCREASED RISK OF ADVERSE HEALTH OUTCOMES AMONG GULF WAR VETERANS
As discussed in the beginning of this chapter, the committee reviewed the available scientific evidence in the peer-reviewed literature in order to draw conclusions about associations between the agents of interest and adverse health effects in all populations. The committee placed its conclusions in categories that reflect the strength of the evidence for an association between exposure to the agent and health outcomes. The committee could not measure the likelihood that Gulf War veterans’ health problems are associated with or caused by these agents. To address this issue, the committee would need to compare the rates of health effects in Gulf War veterans exposed to the putative agents with the rates of those who were not exposed, which would require information about the agents to which individual veterans were exposed and their doses. However, as discussed throughout this report, there is a paucity of data regarding the actual agents and doses to which individual Gulf War veterans were exposed. Further, to answer questions about increased risk of illnesses in Gulf War veterans, it would also be important to know the degree to which any other differences between exposed and unexposed veterans could influence the rates of health outcomes. This information is also lacking for the Gulf War veteran population. Indeed, most of the evidence that the committee used to form its conclusions about the association of the putative agents and health effects comes from studies of populations exposed to these agents in occupational and clinical settings, rather than from studies of Gulf War veterans. Due to the lack of exposure data on veterans, the committee could not extrapolate from the level of exposure in the studies that it reviewed to the level of exposure in Gulf War veterans. Thus, the committee could not determine the likelihood of increased risk of adverse health outcomes among Gulf War veterans due to exposure to the agents examined in this report.
REFERENCES
Ballantyne B. 1992. Exposure–dose–response relationships. In: Sullivan JB Jr, Krieger GR, eds. Hazardous Materials Toxicology: Clinical Principles of Environmental Health. Baltimore, MD: Williams & Wilkins. Pp. 24–30.
Brachman PS, Gold H, Plotkin S, Fekety FR, Werrin M, Ingraham NR. 1962. Field evaluation of a human anthrax vaccine. Am J Public Health 52:632–645.
Brooks SM, Gochfeld M, Herzstein J, Jackson RJ, Schenker MB. 1995. Environmental Medicine. New York: Mosby-Year Book, Inc.
Cohrssen JJ, Covello VT. 1989. Risk Analysis: A Guide to Principles and Methods for Analyzing Health and Environmental Risks. Washington, DC: Council on Environmental Quality, Executive Office of the President.
Ellwood JM. 1998. Critical Appraisal of Epidemiological Studies and Clinical Trials. 2nd edition. Oxford: Oxford University Press.
Evans AS. 1976. Causation and disease: The Henle-Koch postulates revisited. Yale J Biol Med 49(2):175–195.
FDA (Food and Drug Administration). 1998. Center for Drug Evaluation and Research Handbook. [Online]. Available: http://www.fda.gov/cder/handbook/index.htm (accessed April 2000).
Hill AB. 1965. The environment and disease: Association or causation? Proc R Soc Med 58:295–300.
Hill AB. 1971. Principles of Medical Statistics. New York: Oxford University Press.
IOM (Institute of Medicine). 1991. Adverse Effects of Pertussis and Rubella Vaccines. Washington, DC: National Academy Press.
IOM (Institute of Medicine). 1994a. Adverse Events Associated with Childhood Vaccines: Evidence Bearing on Causality. Washington, DC: National Academy Press.
IOM (Institute of Medicine). 1994b. Veterans and Agent Orange: Health Effects of Herbicides Used in Vietnam. Washington, DC: National Academy Press.
IOM (Institute of Medicine). 1996. Veterans and Agent Orange: Update 1996. Washington, DC: National Academy Press.
IOM (Institute of Medicine). 1999. Veterans and Agent Orange: Update 1998. Washington, DC: National Academy Press.
IOM (Institute of Medicine). 2000. Clearing the Air: Asthma and Indoor Air Exposures. Washington, DC: National Academy Press.
Kramer MS, Lane DA. 1992. Causal propositions in clinical research and practice. J Clin Epidemiol 45(6):639–649.
Lilienfeld DE. 1978. Definitions of epidemiology. Am J Epidemiol 107(2):87–90.
Monson RR. 1990. Occupational Epidemiology. 2nd edition. Boca Raton, FL: CRC Press, Inc.
NIH (National Institutes of Health). 1991. Code of Federal Regulations. Title 45 Public Welfare. Part 46 Protection of Human Subjects. [Online]. Available: http://grants.nih.gov/grants/oprr/humansubjects/45cfr46.htm (accessed April 2000).
NRC (National Research Council). 1991. Animals as Sentinels of Environmental Health Hazards. Washington, DC: National Academy Press.
OTA (Office of Technology Assessment). 1990. Neurotoxicity: Identifying and Controlling Poisons of the Nervous System. Washington, DC: U.S. Government Printing Office. OTA-BA-436.
Randal J. 1999. Randomized controlled trials mark a golden anniversary. J Natl Cancer Inst 91(1):10–12.
Wegman DH, Woods NF, Bailar JC. 1997. Invited commentary: How would we know a Gulf War syndrome if we saw one? Am J Epidemiol 146(9):704–711, 712.