CONSIDERATIONS IN IDENTIFYING AND EVALUATING THE LITERATURE
This chapter presents the approach that the committee used to identify and evaluate the health and epidemiologic literature on Gulf War veterans. It provides information on how the committee searched the literature and discusses the major types of studies considered. The chapter also includes a discussion of the committee’s evaluation criteria, the limitations of the studies reviewed, and the categories of association that the committee used in drawing conclusions about the possible health effects that might result from being deployed in the Gulf War.
Because the committee was tasked with determining the prevalence of diseases and symptoms in Gulf War veterans, the committee reviewed primarily observational studies that compared health outcomes seen in or reported by veterans deployed to the Gulf War with their nondeployed counterparts. The committee was not asked to associate diseases or health outcomes with exposures to specific biologic or chemical agents such as pesticides, nerve agents, or combustion products. The committee also did not concern itself with any policy issues, such as potential costs of compensation or policies regarding compensation. In Volume 4 of the Gulf War and Health series, Health Effects of Serving in the Gulf War, that committee identified numerous cohort and case-control studies that it objectively reviewed without preconceived ideas about health outcomes. To assist it in its work, the committee developed criteria to determine which studies to include in its review. The Update committee reviewed and used those criteria to evaluate the studies that have been published since Volume 4 but also used categories of association to determine the strength of the association between deployment to the Gulf War and health outcomes. The categories of association have been used by the other reports in the Gulf War and Health series, with the exception of Volume 4.
IDENTIFICATION OF THE LITERATURE
The committee began its work by overseeing extensive searches of the scientific literature, including published articles, other reports, and government documents that had been published after the last literature search for Volume 4, conducted in July 2005. The updated search retrieved more than 1,000 studies of potential pertinence to these analyses, and the titles and abstracts of those studies were reviewed. Studies that did not appear to have immediate relevance for this committee, based on an assessment of the title and abstract, were deleted from the search. Deleted studies included, but were not limited to, case studies, studies of civilians in the Persian Gulf area, treatments, studies of short-term health outcomes only, rehabilitation, social outcomes (for example, employment), impacts on families, or studies of long-term outcomes from known physical events, such as gunshot wounds. After the removal of these studies, approximately 400 potentially relevant epidemiologic studies were obtained for review and evaluation. The titles and abstracts of studies that had not been obtained as full text were available to the committee for review. The 400 studies that were obtained as full text were objectively evaluated by the committee members without preconceived ideas about what health outcomes might be seen and what, if any, associations might be found between being deployed to the Gulf War and any health outcomes.
The committee adopted a policy of using only published or unpublished literature that had undergone rigorous peer review as the basis of its conclusions. An exception to this policy was the inclusion of a few government reports. While the process of peer review by fellow professionals increases the likelihood that high-quality studies will appear in the literature, it does not guarantee the validity of any particular study or the ability to generalize its findings. Accordingly, committee members read each study critically and considered its relevance and quality. The committee did not collect original data, nor did it perform any secondary data analyses.
After securing the full text of the relevant studies, the committee determined which health conditions it would focus upon in the report. Initially, the health conditions listed in Volume 4 were used, but after reviewing numerous studies, the committee added new health outcomes, such as diseases of the blood and blood-forming organs and endocrine disorders. For each health outcome, one or more committee members with expertise or knowledge of that outcome volunteered to screen all 400 studies in the database to identify all the epidemiologic studies that appeared to include information on that health outcome. The responsible committee member then conducted a preliminary review of the studies, including those cited in Volume 4, to determine what, if any, information the study had on the health outcome of interest and whether an individual study met the inclusion criteria for a primary or secondary study (see below). The responsible committee member(s) then presented the information from the initial study screening and categorization to the full committee for discussion. Typically, the information presented included the populations used in the study, the methods for selecting and evaluating the populations, the study results, and the committee member’s assessment of the strengths and limitations of the study. Each primary and secondary paper was discussed for each health outcome. Because of the variability in the description and diagnosis of the health conditions considered in this report, it was impossible for the committee to make a priori assumptions about the utility of any paper for a health outcome; each paper was discussed individually for each health outcome.
After the studies had been discussed in plenary session, the responsible committee member drafted the text for that health outcome; the draft text was reviewed and discussed in further plenary sessions until all committee members reached a consensus on the description of the studies and the summary and conclusions. After this language was agreed upon, the full committee assigned a category of association based on the number and quality of the primary and secondary studies and expert judgment. It should be noted that the committee did not use a formulaic approach as to the number of primary and secondary studies that would be necessary to assign a specific category of association. Rather, the committee found that each health outcome required a more considered and nuanced approach, as described in the summary and conclusion section.
The following section briefly discusses types of evidence and the value of epidemiologic or clinical studies in determining whether an association exists. It is followed by a discussion of the committee’s specific inclusion criteria that were developed to help decide whether a particular study would be included and evaluated for this report. The committee also notes the numerous factors that it considered in evaluating the evidence in a study and, finally, presents the categories of association used in drawing conclusions about the strength of associations.
TYPES OF EVIDENCE
The committee relied entirely on clinical and human epidemiologic studies to draw its conclusions about the strength of evidence regarding associations between deployment to the Gulf War and health outcomes seen in Gulf War veterans. The committee acknowledges, however, that animal studies might prove helpful in providing biologic understanding of many of the effects seen in humans from specific exposures, such as pesticides, solvents, and nerve agents, which have been reported by troops deployed in the Gulf War. Furthermore, information from molecular and cellular biology, neuroimaging, and other types of human studies can be used to understand biological mechanisms and to identify biomarkers for clinical outcomes. Such studies, however, are not, in general, included in this review.
In epidemiological research, analytical studies are designed to permit the examination of the association between two or more variables. Predictor variable or independent variable is a term used for an exposure to an agent of interest in a human population. Outcome variable or dependent variable is a term used to define a health or health-related event seen in a human population. Outcomes can also include a number of nonhealth results, such as use of medical services, social changes, and employment changes. One important goal of epidemiological research is to generate information that will help to understand whether exposure to a specific agent is associated with disease occurrence or other health outcomes. This goal is accomplished most straightforwardly in experimental studies in which the investigator controls the exposure (generally through random assignment) and the association between exposure and the subsequent occurrence of an outcome can be measured. Experimental studies are clearly not possible for studying the health effects of deployment to the Gulf War. Therefore, the studies included in this review are observational, not experimental, and compare health outcomes in those deployed to the Gulf with health outcomes seen in those who were in the military during the Gulf War but were not deployed to the Persian Gulf region. What is then assessed is the presence or absence of an association between the exposure and the outcomes.
Associations in Epidemiologic Studies
Association is primarily a statistical concept referring to the quantification of the relationship (positive, negative, or none) between two variables (e.g., independent and dependent). In the presence of association, additional considerations are required for causality to be judged as the reason behind the observed association. Apart from a causal relationship between exposure and outcome, associations in observational studies may arise for other reasons, including random error (chance), systematic error (bias), and reverse causality. Random error and systematic error can also be responsible for not observing an association when one truly does exist. It is essential to consider these alternative explanations in judging the findings of an epidemiological study.
Random error, sometimes referred to as “chance,” is the statistical variation in a measurement of exposure, outcome, or both that arises from the fact that one cannot include an entire population in any study nor measure exposure or outcomes perfectly. The impact of random error can be mitigated through careful measurement and the inclusion of large samples, and it is quantified using statistical approaches including confidence intervals. For the most part, random error tends to result in an inability to find an association when one truly exists, that is, one is unable to separate out the signal from the noise. Systematic error or bias is the result of limitations in how the study was designed or conducted. Systematic error can cause an observed value to deviate from its true value and can falsely strengthen or weaken an association or generate a spurious association. Selection bias is one form of systematic error that occurs when the method of recruiting a study sample results in a sample that differs in some systematic way from the target population of the study. The findings of such a study are then potentially “biased” and may over- or underestimate the true association with the direction being dependent upon the form of selection bias. In addition, selection bias can also occur in a prospective study when there are losses to follow-up that differ between the exposed and unexposed group. Information bias relates to the way exposure or outcome factors are measured. If measurements are collected differently in groups that are to be compared then observed associations may be the result of these measurement differences rather than a true association.
Confounding bias occurs when a third variable, termed a confounding variable (or confounder), is associated with both the exposure and the outcome and mistakenly leads to the conclusion that the exposure is associated with the outcome. If the potential confounding variable is identified then statistical methods can be used to adjust for this form of bias.
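As an illustration only, the following Python sketch shows how a confounding variable can produce an apparent association and how one adjustment method removes it. The Mantel-Haenszel pooled risk ratio is used here simply as one example of such a method; the text above does not name a specific technique. All counts are hypothetical and are not drawn from any study reviewed by the committee.

```python
# Hypothetical counts only; the Mantel-Haenszel estimator is an illustrative
# choice of adjustment method, not one named in the report.

def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

def mantel_haenszel_rr(strata):
    """Pooled risk ratio across strata of the confounder.

    Each stratum is (cases_exp, n_exp, cases_unexp, n_unexp).
    """
    numerator = 0.0
    denominator = 0.0
    for cases_exp, n_exp, cases_unexp, n_unexp in strata:
        total = n_exp + n_unexp
        numerator += cases_exp * n_unexp / total
        denominator += cases_unexp * n_exp / total
    return numerator / denominator

# Two strata of a hypothetical confounder (absent, present).
# The confounder is more common among the exposed and raises the outcome risk.
strata = [
    (10, 100, 40, 400),   # confounder absent: risk 0.10 in both groups
    (120, 400, 30, 100),  # confounder present: risk 0.30 in both groups
]

# Crude analysis ignores the confounder by collapsing the strata.
crude_rr = risk_ratio(
    sum(s[0] for s in strata), sum(s[1] for s in strata),
    sum(s[2] for s in strata), sum(s[3] for s in strata),
)
stratum_rrs = [risk_ratio(*s) for s in strata]
adjusted_rr = mantel_haenszel_rr(strata)

print(f"crude RR = {crude_rr:.2f}")
print("stratum-specific RRs =", [round(rr, 2) for rr in stratum_rrs])
print(f"adjusted RR = {adjusted_rr:.2f}")
```

In this constructed example the crude ratio suggests an association, whereas the stratum-specific and pooled ratios do not, because the exposure and the outcome are linked only through the third variable.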
Reverse causality bias may occur when the outcome actually precedes the exposure; for example, a study might suggest that a particular psychiatric outcome is a result of a traumatic brain injury. However, in reality, the psychiatric condition actually preceded the injury and the presence of the psychiatric condition placed individuals at increased risk of being injured.
Thus, the interpretation of the results of observational studies is complex. The committee reviewed the studies in this report with a view to considering the level of random error, the potential for bias, as well as the authors’ strategies for examining and/or limiting the impact of each on the study findings.
To conclude that an association exists, it is necessary for the exposure to be followed by the outcome more often (or less often in the case of a protective exposure) than would be expected to occur by chance alone (that is, if no association actually existed). The strength of an association is typically expressed as a ratio of the frequency of an outcome in a group of participants who have a particular exposure to the frequency in a group without that same exposure. The strength of an association between exposure and outcome is generally estimated quantitatively by using prevalence ratios, relative risks (RRs, also called risk ratios), odds ratios (ORs), correlation coefficients, or hazard ratios (HRs), depending on the epidemiologic design used. A ratio greater than 1.0 indicates that the outcome variable has occurred more frequently in the exposed group, and a ratio less than 1.0 indicates that it has occurred less frequently. Ratios are typically reported with confidence intervals (CIs) to quantify random error. If a CI (for example, 95% CI) for a ratio measure includes 1.0, the observed association is said to be consistent with the null value (that is, no association). If the computed confidence interval does not include 1.0, the association is said to be consistent with a positive (or negative) association.
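The interpretation of a ratio and its confidence interval can be sketched in Python as follows. This is a minimal example that assumes the common log-based (Wald-type) approximation for the CI of a risk ratio; the report does not specify how any particular study computed its intervals, and the counts are hypothetical.

```python
import math

# Hypothetical counts; the log-based CI below is one standard approximation,
# assumed here for illustration.

def risk_ratio_with_ci(cases_exp, n_exp, cases_unexp, n_unexp, z=1.96):
    """Return (risk ratio, lower bound, upper bound) for an approximate 95% CI."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    # Standard error of log(RR) for cumulative-incidence data.
    se_log_rr = math.sqrt(
        1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

def includes_null(lower, upper, null_value=1.0):
    """True if the interval contains the null value of a ratio measure."""
    return lower <= null_value <= upper

# Same underlying risks (3% vs. 2%), two different sample sizes.
small = risk_ratio_with_ci(30, 1_000, 20, 1_000)
large = risk_ratio_with_ci(300, 10_000, 200, 10_000)

for label, (rr, lo, hi) in (("small study", small), ("large study", large)):
    verdict = "includes 1.0" if includes_null(lo, hi) else "excludes 1.0"
    print(f"{label}: RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f} ({verdict})")
```

Both hypothetical studies yield the same point estimate, but only the larger one has an interval that excludes 1.0, which illustrates how sample size affects the role of random error.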
Determining whether an observed statistical association is causal requires additional considerations that must be examined carefully in the context of the particular relationship under study. Causality cannot be established directly through observational epidemiological studies for the reasons outlined above. The issue of causality is a major concern in epidemiology. In 1965, following the Surgeon General’s report on the relationship between smoking and lung cancer, Sir Austin Bradford Hill, a British epidemiologist and statistician, described nine aspects that should be carefully considered when trying to come to a decision about whether an observed association might be causal (Hill, 1965). Although all nine aspects are relevant in making inferences about causality, only one of them, temporality, is truly necessary. The remaining eight aspects are neither necessary nor sufficient requirements for causation but do present a framework for consideration. While the committee was mindful of the Bradford Hill aspects when assigning the categories of association discussed later in the chapter, it did not use them as rigid criteria but rather as guidelines to inform its conclusions about the association between deployment to the Gulf War and a particular health outcome. Aspects such as consistency, plausibility, and strength of association were discussed for each health outcome as the committee reached consensus on assigning a category of association, but there was no requirement that all the aspects be met. The nine aspects are summarized below.
Strength of association. Hill argues that a strong association is an important consideration and, in the absence of other explanations, would be a marker of causation. However, he also points out that the absence of a strong association does not preclude a causal relationship.
Consistency. If an association is observed in different studies, using different designs and in different settings, then this would be supportive of a causal association.
Specificity. If the association is specific to a particular exposure-disease outcome combination and there is no association between the exposure and other outcomes then such a finding would favor a causal association.
Temporality. For an association to be causal it is essential that there be evidence that the exposure in question precedes the outcome of interest.
Biologic gradient. Evidence of a biological gradient (also called a dose-response relationship) between increasing levels of the exposure and increasing frequency of the outcome supports a causal association.
Plausibility. Hill suggested that if the observed association was biologically plausible this would add evidence for causality. He further noted, however, that if the observed association was new then biological plausibility might not be expected.
Coherence. Following from biological plausibility, Hill suggested that at least the observed association should not contradict known facts.
Experiment. An association would be judged more likely to be causal if evidence is based on randomized experiments.
Analogy. Hill’s final aspect for consideration was analogy: “In some circumstances it would be fair to judge by analogy.” By this, Hill referred to the effect already having been shown for another similar exposure.
A strong association as measured by a high (or low) risk or ratio, an association that is found in a number of studies, and an increased risk of disease with increasing exposure or a decline in risk after cessation of exposure all strengthen the likelihood that an association seen in epidemiologic studies is causal. With deployment to a war-zone, there can be substantial uncertainty in the assessment of possible exposures in theater. To assess whether explanations other than causality (such as random or systematic error) are responsible for an observed association, one must bring together evidence from different studies and take into account the considerations presented by Hill and others (Evans, 1976; Hill, 1965; Susser, 1973, 1977, 1988, 1991; Wegman et al., 1997). For a recent review of those criteria, see the 2004 report of the US Surgeon General (Office of the Surgeon General, 2004).
TYPES OF EPIDEMIOLOGIC STUDIES
The committee focused on epidemiologic studies because epidemiology deals with the determinants, frequency, and distribution of disease in human populations. A focus on populations distinguishes epidemiology from medical disciplines that focus on the individual. Epidemiologic studies examine the relationship between exposures to agents of interest and health outcomes in a population (in this review, deployment is the exposure). Such studies can be used to generate hypotheses for study or to test hypotheses posed by investigators. This section describes the major types of epidemiologic studies considered by the committee.
A cohort study is an epidemiologic design that follows a defined group, or cohort, over a period of time. Using data from a cohort study, investigators can test hypotheses about whether exposure to a specific agent is related to the development of disease and can examine multiple health outcomes that might be associated with exposure to a given agent (for example, to deployment). A cohort study starts with people who are free of a disease (or other outcome) and classifies them according to whether they have been exposed to the agent of interest. It compares health outcomes in people who have been exposed to the agent in question with those who have not.
Cohort studies can collect data prospectively (such as in repeated follow-ups) or retrospectively (when exposure and outcome records exist). Generally, investigators select a group of subjects free of the health outcome at baseline (start of study follow-up) and determine who is exposed and not exposed to a given agent (independent variable) during follow-up while also determining the occurrence of the health outcome in both exposed and unexposed cohort members over time. In a retrospective (or historical) cohort study, investigators usually rely on records to determine past exposures for the cohort and another record system to ascertain the rate of disease. In a prospective cohort study, both the exposure and disease assessment methods can be designed by the investigator rather than having to rely on existing records, as is necessary for a retrospective cohort study. However, a prospective cohort study will not be able to provide sufficient data on chronic disease risk factors until a number of years, if not decades, of follow-up time have accrued. That is, many diseases have a lengthy latency period; for some health outcomes, such as some cancers and cardiovascular disease, this can be 20 years or more.
Cohort studies can be used to estimate a risk difference or a relative risk, two statistics that measure association between the exposure groups. The risk difference, or attributable risk, is the rate of disease in exposed persons minus the rate in unexposed persons, representing the excess risk of disease possibly attributable to the exposure. The relative risk is determined by dividing the rate of disease in the exposed group (for example, the deployed group) by the rate of disease in the nonexposed group (for example, the nondeployed group). A relative risk greater than 1.0 suggests an association between exposure and disease onset; the higher the relative risk, the stronger the association. A relative risk of less than 1.0, on the other hand, suggests a protective role for the exposure under study.
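The two cohort statistics defined above can be written out as a short Python sketch. The function names and the counts are hypothetical and serve only to make the arithmetic concrete.

```python
# Hypothetical cohort counts, used only to illustrate the two definitions.

def risk_difference(cases_exp, n_exp, cases_unexp, n_unexp):
    """Rate in the exposed group minus rate in the unexposed group."""
    return cases_exp / n_exp - cases_unexp / n_unexp

def relative_risk(cases_exp, n_exp, cases_unexp, n_unexp):
    """Rate in the exposed group divided by rate in the unexposed group."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

# Example: 50 cases among 1,000 deployed, 25 cases among 1,000 nondeployed.
deployed_cases, deployed_n = 50, 1_000
nondeployed_cases, nondeployed_n = 25, 1_000

rd = risk_difference(deployed_cases, deployed_n, nondeployed_cases, nondeployed_n)
rr = relative_risk(deployed_cases, deployed_n, nondeployed_cases, nondeployed_n)

print(f"risk difference = {rd:.3f} (excess cases per person)")
print(f"relative risk   = {rr:.1f}")
```

With these counts the risk difference corresponds to 25 excess cases per 1,000 persons, and the relative risk indicates that the outcome is twice as frequent in the exposed group.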
Cohort studies have several advantages and disadvantages as described in detail in Chapter 3. Generally, the advantages outweigh the disadvantages if the study is well designed and conducted. The advantages of cohort studies include the following:
The investigator knows that the predictor (exposure) variable preceded the outcome (disease) variable.
Exposure can be defined and classified at the beginning of the study, and subjects can be selected based on exposure definition.
Information on potential confounding variables can be collected in a prospective cohort study so that they may be taken into account in the analysis.
Rare or unique exposures (such as Gulf War exposures) can be studied, and the investigators can study multiple health outcomes.
Absolute rates or risk of disease incidence and prevalence can be estimated.1
Disadvantages of cohort studies include the following:
They are often expensive because of the long periods of follow-up required to accrue sufficient number of disease outcomes for analysis.
Long follow-up periods result in attrition of study subjects and delay in obtaining results.
They are inefficient for the study of rare diseases or diseases of long latency.
There is a possibility of the “healthy-worker effect”2 (Monson, 1990), which might introduce bias and can diminish the true exposure-disease relationship.
In a case-control study, subjects (cases) are selected on the basis of having a disease; controls are selected on the basis of not having the disease. Cases and controls are asked about their exposures to specific agents. Cases and controls can be matched on characteristics such as age, sex, and socioeconomic status as a method of increasing efficiency and controlling for confounders. The odds of exposure to the agent among the cases are then compared with the odds of exposure among controls. The comparison generates an OR, which is a statistic that depicts the odds of having a disease among those exposed to the agent of concern relative to the odds of having the disease among an unexposed comparison group. An OR greater than 1.0 indicates that there is a potential association between exposure to the agent and the disease; the further the OR is from 1.0, the stronger the association. The OR is a measure of association that is interpreted in the same way as a relative risk or a risk ratio.
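The paragraph above describes the OR both as a comparison of exposure odds (cases versus controls) and as a comparison of disease odds (exposed versus unexposed). The following Python sketch, using hypothetical counts, shows that the two calculations give the same number, which is why a case-control design can estimate it.

```python
# Hypothetical 2x2 table from a case-control study.
cases_exposed, cases_unexposed = 40, 60
controls_exposed, controls_unexposed = 20, 80

def odds(count_with, count_without):
    """Odds = number with the characteristic / number without it."""
    return count_with / count_without

# Odds of exposure among cases compared with odds of exposure among controls.
exposure_or = odds(cases_exposed, cases_unexposed) / odds(
    controls_exposed, controls_unexposed
)

# Odds of disease among the exposed compared with odds among the unexposed.
disease_or = odds(cases_exposed, controls_exposed) / odds(
    cases_unexposed, controls_unexposed
)

# Both reduce to the same cross-product of the 2x2 table.
cross_product_or = (cases_exposed * controls_unexposed) / (
    cases_unexposed * controls_exposed
)

print(f"exposure-odds ratio = {exposure_or:.2f}")
print(f"disease-odds ratio  = {disease_or:.2f}")
print(f"cross-product ratio = {cross_product_or:.2f}")
```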
Case-control studies are especially useful and efficient for studying the etiology of rare diseases, having the advantages of ease, speed, and relatively low cost. They are also valuable for probing multiple exposures or risk factors. However, case-control studies are vulnerable to several types of bias, such as recall bias, which can enhance (or dilute) apparent associations between disease and exposure. Other problems include identifying representative groups of cases, choosing suitable controls, and collecting comparable information about exposures on both cases and controls. Those problems might lead to unidentified confounding variables that differentially influence the selection of cases or control subjects or the detection of exposure. Case-control studies are often the first approach to testing a hypothesis, especially one related to a rare outcome.
A nested case-control study draws cases and controls from a previously defined cohort. Thus, it is said to be “nested” inside a cohort study. Baseline data are collected at the time that the cohort is identified, and this ensures more uniform data collection on both cases and controls. Within the cohort, individuals identified with disease serve as cases, and a sample of those who are disease-free serve as controls. Using baseline data, exposure in cases and controls is compared, as in a regular case-control study. Using particular statistical approaches, changes in exposures over time can also be incorporated in the analysis. Nested case-control studies are efficient in terms of time and cost in reconstructing exposure histories on cases and on only a sample of controls rather than the entire cohort. Additionally, because the cases and controls come from the same previously established cohort, concerns about unmeasured confounders and selection bias are decreased.
The main differentiating feature of a cross-sectional study is that exposure and disease information is collected at the same point (period) of time. The selection of people for the study—unlike selection for cohort and case-control studies—is independent of both the exposure to the agent under study and disease characteristics. In a cross-sectional study, effect size is measured as prevalence ratio, or prevalence OR. In such studies disease or symptom prevalence between groups with and without exposure to the specific agent are compared. Several health studies of Gulf War veterans are cross-sectional studies that compare a sample of veterans who were deployed to the Gulf War with a sample of veterans who served during the same period but were not deployed to the Gulf War.
Cross-sectional studies are easier and less expensive to perform than cohort studies and can identify the prevalence of diseases and exposures in a defined population. They are useful for generating hypotheses, but they are much less useful for determining cause and effect relationships, because disease and exposure data are collected simultaneously (Monson, 1990); for this reason it may be difficult to determine the temporal sequence of exposures and symptoms or disease.
Standardized Mortality Studies
For comparison purposes, some cohort studies use mortality or morbidity rates in the general population rather than from within the same cohort since in some cases it might be difficult to identify a suitable group of unexposed people. One statistic that is used in such a comparison is the standardized mortality ratio (SMR), which is the ratio of the observed number of deaths in a cohort (from a specific cause, such as traumatic brain injury [TBI], for example) to the number of deaths from TBI expected in a carefully chosen reference population. An SMR greater than 1.0 generally suggests an increased risk of death in the exposed group. The standardization refers to the methodology used to ensure that any differences in observed and expected numbers of deaths are not due to differences in the age (or sex) distribution of the study cohort and the comparison cohort. Such measures can also be used to examine morbidity, such as cancer.
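A minimal Python sketch of the SMR calculation follows. It assumes the usual indirect standardization, in which expected deaths are obtained by applying age-specific reference rates to the cohort's person-years in each age band; the age bands, person-years, rates, and observed count are all hypothetical.

```python
# Hypothetical inputs: (age band, cohort person-years, reference death rate
# per person-year for the cause of interest).
age_strata = [
    ("20-29", 10_000, 0.001),
    ("30-39", 5_000, 0.002),
    ("40-49", 2_000, 0.005),
]
observed_deaths = 24  # hypothetical deaths observed in the cohort

def expected_deaths(strata):
    """Deaths expected if the cohort experienced the reference rates."""
    return sum(person_years * rate for _, person_years, rate in strata)

def standardized_mortality_ratio(observed, strata):
    """Observed deaths divided by age-standardized expected deaths."""
    return observed / expected_deaths(strata)

expected = expected_deaths(age_strata)
smr = standardized_mortality_ratio(observed_deaths, age_strata)

print(f"expected deaths = {expected:.1f}")
print(f"SMR = {smr:.2f}")
```

Because the expected count is built up band by band, a cohort that is younger or older than the reference population is compared against what its own age structure would predict.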
The major problem in comparing rates in the general population with rates in military cohorts is the so-called healthy-warrior effect described earlier. That effect arises when a military population experiences a lower mortality or morbidity rate than the general population. Inasmuch as military personnel must meet physical-health criteria when they enter the military and while they are on active duty, the group’s health status is usually better than that of the general population of the same age and sex. Since military personnel are at overall lower risk of adverse health outcomes compared to the general population, any excess risk associated with an exposure they experience must be large enough to overcome their inherent advantage in order to be detectable by such methods as SMR.
The Update committee included studies that would answer the question, “What does the literature tell us about the health status of Gulf War veterans?” To that end, the committee searched the literature and included descriptive epidemiologic studies of health outcomes in military personnel who served in the Gulf War theater. The studies were not restricted to US personnel. The Volume 4 committee developed inclusion criteria for studies; the Update committee reviewed those criteria and found them to be appropriate for this update. Primary studies provide the basis for the committee’s findings. For a study to be included in the committee’s review as a primary study, it had to meet specified criteria. A study needed to demonstrate rigorous methods (for example, was published in a peer-reviewed journal, included details of methods, had a control or reference group, and included adjustments for confounders when needed), include information regarding a persistent health outcome, have a medical evaluation conducted by a health professional, and use appropriate laboratory testing. Those types of studies constituted the committee’s primary literature. The committee did not evaluate studies of acute trauma, rehabilitation, medical treatment, or transient illness, nor did the committee consider health outcomes seen in veterans of conflicts other than the Gulf War unless those veterans formed an appropriate control group (for example, veterans who had served in Bosnia). Although the responsible committee member initially presented his or her determination of whether a study met the criteria, the committee discussed the study’s methods and results using the inclusion criteria at some length before agreeing as to whether the study should be classified as primary.
Studies reviewed by the committee that did not necessarily meet all the criteria of a primary study were considered secondary studies. Secondary studies are typically not as methodologically rigorous as primary studies and might present subclinical findings, that is, studies of altered functioning consistent with later development of a diagnosis but without clear predictive value.
Another step that the committee took in organizing its literature was to determine how all the study cohorts were related to one another. Numerous Gulf War cohorts have been assembled from several different countries, and it is from those original cohorts that many derivative studies have been conducted and published. The committee organized the literature into the major cohorts and derivative studies because it did not want to interpret the findings of the same cohorts as though they were results from unique groups (Chapter 3). The report excludes studies of participants in Gulf War registries established by the Department of Veterans Affairs (VA) or the Department of Defense (DoD), such as the DoD’s Comprehensive Clinical Evaluation Program. Registry participants cannot be considered representative of all Gulf War veterans in that they are self-selected subjects, many of whom have joined the registries because they believe that they have symptoms of a new medical syndrome; they were not randomly selected from all Gulf War military personnel, and there is no nondeployed control group.
Finally, in assessing the descriptive studies, the committee was especially attentive to potential sources of bias, confounding, chance, and multiple comparisons, as discussed in the following sections.
A study had to be published in a peer-reviewed journal or other rigorously peer-reviewed publication, such as a government report, dissertation, or monograph; include sufficient methodologic details to allow the committee to judge whether it met inclusion criteria; include an unexposed control or reference group; and use reasonable methods to control for confounders.
Chapter 3 of Volume 4 describes the possible exposures Gulf War military personnel might have experienced. It also details the exposure modeling and biological monitoring that were conducted by the DoD and others to estimate troop exposures to some chemical agents such as depleted uranium, sarin and cyclosarin, and oil-well fire smoke. As noted in that chapter, there is poor agreement between subjective and objective measurements of exposures to depleted uranium and oil-well fire smoke. Some studies also show evidence of reporting bias regarding vaccinations and ingestion of pyridostigmine bromide tablets. The modeling of the possible exposures to sarin and cyclosarin from the demolition of the Khamisiyah complex has also been criticized. The committee did consider studies that compared health outcomes seen in deployed veterans who may or may not have been exposed to nerve agents as a result of the Khamisiyah detonation and to oil-well fire smoke; some of these studies also included nondeployed control groups.
Health Outcome Assessment
For medical conditions that have no morphological features, the use of validated symptom criteria, such as those of the Rome Foundation for irritable bowel syndrome, is preferred over reports of medical symptoms or groups of symptoms. For a study to be considered primary, the committee preferred that it include an independent assessment of an outcome rather than self-reports of an outcome or reports by family members. It was preferable to have the health effect diagnosed or confirmed by a clinical evaluation, imaging, hospital record, or other medical record. For psychiatric outcomes, standardized interviews were preferred, such as the Structured Clinical Interview for the DSM-IV-TR (Diagnostic and Statistical Manual of Mental Disorders-IV-TR), the Diagnostic Interview Schedule, and the Composite International Diagnostic Interview. Similarly, for neurocognitive outcomes, standardized and validated tests were preferred. Additionally, the outcome had to be diagnosed after deployment. However, as self-reports of health outcomes and exposures account for the bulk of the Gulf War and health literature, the committee decided that it would not exclude such studies but rather considered them to be secondary. The committee recognized the potential for misclassification of a health outcome due to inaccurate recall in such studies.
CONSIDERATIONS IN ASSESSING THE STRENGTH OF EVIDENCE
The committee’s process for reaching conclusions about deployment during the Gulf War and its potential for adverse health outcomes was collective and interactive. Once a study was included in the review because it met the committee’s criteria, the committee weighed several considerations in assessing causality, including strength of the association, presence of a dose-response relationship, presence of a temporal relationship, consistency of the association, and biologic plausibility. The committee as a group reviewed the primary and secondary studies identified by the committee member responsible for each health outcome. The strengths and limitations of each study and its categorization as primary or secondary were discussed in plenary session, and all committee members agreed on its contribution to the evidence base for each category of association for each health outcome. Because many of the studies were cited for more than one health outcome, committee members evaluated each study with equal rigor for every health outcome. The evidence tables were refined to include study limitations for the primary studies and to present the pertinent results; secondary studies were not included in the evidence tables. It should be noted that some of the larger cohort studies used a variety of methods and instruments to assess the health status of Gulf War veterans, and it is for this reason that the committee discussed at some length the diagnostic approaches and use of self-reports for each paper. The assignment of a category of association was reached by committee consensus based on the weight of the evidence, including the studies cited in Volume 4 as well as any new studies. Those aspects of the committee’s review required thoughtful consideration of all the studies as well as expert judgment and could not be accomplished by adherence to a narrowly prescribed formula of what data would be required for each category of association or for a particular health outcome.
Categories of Association
The committee attempted to express its judgment of the available data clearly and precisely in the Summary and Conclusion section for each health outcome. It agreed to use the categories of association that have been established and used by previous Committees on Gulf War and Health and other Institute of Medicine committees that have evaluated vaccine safety, effects of herbicides used in Vietnam, and indoor pollutants related to asthma (IOM, 2000, 2003, 2005, 2006, 2007). Those categories of association have gained wide acceptance for more than a decade by Congress, government agencies (particularly the VA), researchers, and veterans groups.
The five categories below describe different levels of association and present a common message: the validity of an association is likely to vary to the extent to which common sources of spurious associations could be ruled out as the reason for the observed association. Accordingly, the criteria for each category express a degree of confidence based on the extent to which sources of error were reduced. The committee discussed the evidence and reached consensus on the categorization of the evidence for each health outcome in Chapter 4.
Sufficient Evidence of a Causal Relationship
Evidence is sufficient to conclude that a causal relationship exists between being deployed to the Gulf War and a health outcome. The evidence fulfills the criteria for sufficient evidence of a causal association in which chance, bias, and confounding can be ruled out with reasonable confidence. The association is supported by several of the other considerations used to assess causality: strength of association, dose-response relationship, consistency of association, temporal relationship, specificity of association, and biologic plausibility.
Sufficient Evidence of an Association
Evidence suggests an association, in that a positive association has been observed between deployment to the Gulf War and a health outcome in humans; however, there is some doubt as to the influence of chance, bias, and confounding.
Limited/Suggestive Evidence of an Association
Some evidence of an association between deployment to the Gulf War and a health outcome in humans exists, but this is limited by the presence of substantial doubt regarding chance, bias, and confounding.
Inadequate/Insufficient Evidence to Determine Whether an Association Exists
The available studies are of insufficient quality, validity, consistency, or statistical power to permit a conclusion regarding the presence or absence of an association between deployment to the Gulf War and a health outcome in humans.
Limited/Suggestive Evidence of No Association
There are several adequate studies, covering the full range of levels of exposure that humans are known to encounter, that are consistent in not showing an association between deployment to the Gulf War and a health outcome. A conclusion of no association is inevitably limited to the conditions, levels of exposure, and length of observation covered by the available studies. In addition, the possibility of a very small increase in risk at the levels of exposure studied can never be excluded.
The Bradford Hill aspects to consider when evaluating evidence to assess whether an association is causal have important exceptions and qualifications; therefore, the aspects, however useful, are neither criteria nor hard and fast rules for assessing causality (Rothman and Greenland, 2005). The validity of data and individual studies that may contribute evidence as to whether an association is causal must also be considered. Although strict rules are not available, design flaws that threaten the validity of a study often fall into one of three major categories: selection bias, confounding, and misclassification or information bias. Of particular relevance here is the healthy warrior effect described earlier. Failure to account for such differences can lead to biased estimates of an effect. These factors were all considered by the committee in evaluating the quality of data and individual studies, in determining the primary and secondary literature that would be used to draw conclusions, and in evaluating how those studies contribute to the body of evidence concerning health effects seen in Gulf War veterans.
Bias refers to systematic, or nonrandom, error. Bias causes an observed value to deviate from the true value and can weaken an association, strengthen an association, or generate a spurious association. Because all studies are susceptible to bias, a primary goal of the research design is to minimize bias or to adjust the observed value of an association by correcting for bias if the sources are known. There are different types of bias, one of which is selection bias. Selection bias refers to a systematic error in the way subjects are identified, recruited, included, or excluded, or in the way they participate in the study, that leads to a distortion of the true association.
Information bias results from the manner in which data are collected and can result in measurement errors, imprecise measurement, and misdiagnosis. Those types of errors might be uniform in an entire study population or might affect some parts of the population more than others. Information bias might result from misclassification of study subjects with respect to the outcome variable or from misclassification of exposure. Other common sources of information bias are the inability of study subjects to recall the circumstances of their exposure accurately (recall bias) and the likelihood that one group more frequently reports what it remembers than another group (reporting bias). Information bias is especially harmful in interpreting study results when it affects one comparison group more than another.
Confounding occurs when a variable or characteristic otherwise known to be predictive of an outcome and associated with the exposure (and not on the causal pathway under consideration) can account for part or all of an apparent association. A confounding variable is an uncontrolled variable that influences the outcome of a study to an unknown extent, and makes precise evaluation of its effects impossible. Carefully applied statistical adjustments can often control for or reduce the influence of a confounder.
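The effect of statistical adjustment for a confounder can be illustrated with a minimal sketch. The example below compares a crude odds ratio with a Mantel-Haenszel odds ratio stratified on a single confounder. The Mantel-Haenszel estimator is only one common adjustment method and is not identified in this chapter as the method used in any particular study; the counts, the stratum labels, and the function names are hypothetical and chosen solely for illustration.

```python
# Hypothetical counts for illustration only; not taken from any study.
# Each stratum is (exposed cases, exposed non-cases,
#                  unexposed cases, unexposed non-cases).
strata = [
    (40, 60, 20, 30),    # stratum 1 of the confounder (e.g., smokers)
    (10, 90, 20, 180),   # stratum 2 of the confounder (e.g., nonsmokers)
]


def odds_ratio(a, b, c, d):
    """Odds ratio for a single 2x2 table."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Collapsing the strata ignores the confounder and gives the crude table.
crude_table = tuple(sum(cell) for cell in zip(*strata))
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata)

print(f"Crude OR: {crude_or:.2f}")
print(f"Mantel-Haenszel OR: {adjusted_or:.2f}")
```

With these invented counts, the crude odds ratio is 1.75 even though the odds ratio within each stratum, and therefore the adjusted summary, is 1.0. The apparent association arises only because the confounder is more common among the exposed and independently raises the risk of the outcome.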
Sampling error (sometimes referred to as chance or random error) is a type of error that can lead to an apparent association between an exposure to an agent and a health effect when no association is present or to a finding of no association when in fact one exists. An apparent effect of deployment on a health outcome might be the result of random variation due to sampling of the study population rather than the result of exposure to the agent. Standard methods that use confidence intervals, for example, allow one to assess the role of chance variation due to sampling.
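As a minimal sketch of how a confidence interval conveys the role of chance, the example below computes an odds ratio and an approximate 95% confidence interval from a single 2x2 table using the standard log (Woolf) method. The counts and the function name are hypothetical, and the method is shown only as one widely used approach, not as the approach taken in any study reviewed here.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate confidence interval (log method).

    a, b: deployed with and without the outcome
    c, d: nondeployed with and without the outcome
    z:    normal quantile; 1.96 corresponds to a 95% interval
    """
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper


# Hypothetical counts for illustration only.
estimate, lower, upper = odds_ratio_ci(30, 70, 20, 80)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")

# An interval that includes 1.0 means that chance variation due to
# sampling cannot be ruled out as an explanation for the elevated estimate.
compatible_with_no_association = lower <= 1.0 <= upper
```

With these invented counts, the point estimate is about 1.7, but the interval runs from roughly 0.9 to 3.3 and therefore includes 1.0, so the data are also compatible with no association.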
When an investigator initiates a large number of analyses simultaneously on the same dataset, multiple comparisons pose a problem. When looking at so many different comparisons, the investigator is bound to find something of note by chance alone. For example, in many Gulf War veteran studies, the investigators are comparing multiple outcomes and multiple exposures. There are, however, ways to correct for multiple comparisons in studies. One way is to use a Bonferroni correction, a statistical adjustment for multiple comparisons. It effectively raises the standard of proof needed when an investigator looks at a wide array of hypotheses simultaneously.
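The Bonferroni correction can be sketched as follows: with m comparisons and an overall significance level alpha, each individual comparison is judged against alpha divided by m. The p-values and outcome labels below are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical p-values from five comparisons on the same dataset.
p_values = {
    "outcome_A": 0.003,
    "outcome_B": 0.020,
    "outcome_C": 0.040,
    "outcome_D": 0.300,
    "outcome_E": 0.750,
}

alpha = 0.05                    # overall (family-wise) significance level
m = len(p_values)               # number of comparisons
bonferroni_threshold = alpha / m

# Comparisons that appear significant when each is judged at alpha alone.
unadjusted = [name for name, p in p_values.items() if p < alpha]

# Comparisons that remain significant at the stricter per-test threshold.
adjusted = [name for name, p in p_values.items() if p < bonferroni_threshold]

print(f"Per-test threshold: {bonferroni_threshold:.3f}")
print(f"Significant without correction: {unadjusted}")
print(f"Significant with Bonferroni correction: {adjusted}")
```

In this invented example, three of the five comparisons fall below 0.05 when considered individually, but only one falls below the corrected threshold of 0.01.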
LIMITATIONS OF GULF WAR VETERAN STUDIES
The epidemiologic and clinical studies to date have provided valuable information regarding the health of Gulf War veterans; however, many of the studies have limitations that hinder accurate assessment of the veterans’ health status. The limitations include the possibility that study samples do not represent the entire Gulf War population, the relatively young age of the exposed population, low rates of participation in studies, reinforcement of self-reporting of symptoms and exposures, insensitivity of instruments for detecting abnormalities in deployed veterans, and a period of investigation that is too brief to detect health outcomes that have long latency, such as cancer. In addition, many of the US studies are cross-sectional, and this limits the opportunity to learn about symptom duration and chronicity, latency of onset, and prognosis. Finally, the problem of multiple comparisons that is common in many of the Gulf War studies results in confusion over whether an effect is real or occurring by chance. Those limitations make it difficult to interpret the findings, particularly when several well-conducted studies produce inconsistent results.
REFERENCES
Evans, A. S. 1976. Causation and disease: The Henle-Koch postulates revisited. Yale Journal of Biology and Medicine 49(2):175-195.
Hill, A. B. 1965. The environment and disease: Association or causation? Proceedings of the Royal Society of Medicine 58:295-300.
IOM (Institute of Medicine). 2000. Gulf War and Health, Volume 1: Depleted Uranium, Sarin, Pyridostigmine Bromide, Vaccines. Washington, DC: National Academy Press.
IOM. 2003. Gulf War and Health, Volume 2: Insecticides and Solvents. Washington, DC: The National Academies Press.
IOM. 2005. Gulf War and Health, Volume 3: Fuels, Combustion Products, and Propellants. Washington, DC: The National Academies Press.
IOM. 2006. Gulf War and Health, Volume 4: Health Effects of Serving in the Gulf War. Washington, DC: The National Academies Press.
IOM. 2007. Gulf War and Health, Volume 5: Infectious Diseases. Washington, DC: The National Academies Press.
Monson, R. 1990. Occupational Epidemiology. 2nd ed. Boca Raton, FL: CRC Press.
Office of the Surgeon General. 2004. The Health Consequences of Smoking: A Report of the Surgeon General. http://www.surgeongeneral.gov/library/smokingconsequences (accessed July 31, 2008).
Rothman, K. J., and S. Greenland. 2005. Causality and causal inference in epidemiology. American Journal of Public Health 95:S144-S150.
Susser, M. 1973. Causal Thinking in the Health Sciences: Concepts and Strategies of Epidemiology. New York: Oxford University Press.
Susser, M. 1977. Judgment and causal inference: Criteria in epidemiologic studies. American Journal of Epidemiology 105(1):1-15.
Susser, M. 1988. Falsification, verification, and causal inference in epidemiology: Reconsideration in the light of Sir Karl Popper's philosophy. In Causal Inference, edited by K. J. Rothman. Chestnut Hill, MA: Epidemiology Resources. Pp. 33-58.
Susser, M. 1991. What is a cause and how do we know one? A grammar for pragmatic epidemiology. American Journal of Epidemiology 133(7):635-648.
Wegman, D. H., N. F. Woods, and J. C. Bailar. 1997. Invited commentary: How would we know a Gulf War syndrome if we saw one? American Journal of Epidemiology 146(9):704-711.