2 CONSIDERATIONS IN IDENTIFYING AND EVALUATING THE LITERATURE

This chapter reviews the approach that the committee took to identify and evaluate the health studies of Gulf War and other veterans. It provides information about the types of evidence the committee reviewed, how the committee assessed the strength of the evidence, and the categories of evidence the committee used to summarize its findings. The committee began its evaluation by presuming neither the existence nor the absence of associations between deployment-related stress and health effects. It sought to characterize and weigh the strength and limitations of the available evidence. The committee reviewed primarily epidemiologic studies of Gulf War and other veterans to determine the prevalence of diseases and symptoms in deployed vs nondeployed veteran populations, using deployment as a surrogate for deployment-related stress. In this chapter, the committee describes its approach to enable the reader to assess and interpret its findings and to assist those who may update the committee's conclusions as new information becomes available. It is possible that in the future the literature may suggest other health effects that have an association with deployment-related stress and that merit evaluation of the evidence, or new information may become available that will necessitate revisiting the conclusions presented in this report.

IDENTIFICATION OF THE LITERATURE

The committee's first step was to identify the literature it would review. It began its work by overseeing extensive searches of the English-language, peer-reviewed medical and scientific literature. It identified epidemiologic studies of persistent health effects associated with veterans deployed during World War II, the Korean War, the Vietnam War, the 1991 Gulf War, and Operation Enduring Freedom (OEF) in Afghanistan and Operation Iraqi Freedom (OIF) in Iraq. Although primary consideration was given to studies of U.S.
military personnel, studies of veterans of these wars from other countries, such as Australia and the United Kingdom, were also included. The searches retrieved over 3000 potentially relevant references. January 2007 was the cutoff date for all literature searches. The committee reviewed epidemiologic studies for evidence of an association between deployment and persistent health outcomes in veterans. The committee used its collective judgment in selecting studies thought to reflect the types of stressors that Gulf War and other veterans might have experienced during deployment. Although
veterans are exposed to numerous stressors (see Chapter 3), epidemiologic studies are not typically designed to address such types of exposures or the effects associated with specific stressors.

The committee adopted a policy of using only peer-reviewed published literature as the basis of its conclusions. Publications that were not peer-reviewed had no evidentiary value for the committee; that is, they were not used as evidence for arriving at conclusions about the degree of association between deployment to a war zone and adverse health effects. The process of peer review by fellow professionals, which is one of the hallmarks of modern science, ensures high standards of quality but does not guarantee the validity of a study or that its results can be generalized, particularly with respect to questions that were not the objective of the original researchers. Accordingly, committee members read each study critically and considered its relevance and quality. In some instances, non-peer-reviewed publications provided background information for the committee and raised issues that required further literature searches. The committee did not collect original data, nor did it perform any secondary data analysis.

The committee chose not to adopt a formal meta-analysis approach because of practical concerns. First, there is a striking amount of heterogeneity in the epidemiologic studies in this report. They vary with respect to the nature, level, and measurement of exposure; the definition and measurement of outcomes; and study design. Further, many studies include multiple odds ratios or relative risks, corresponding to different study groups, control groups, outcomes, exposure measures, statistical models, and so on. It is often not possible to choose a single measure from each study in a consistent way, and the meta-analysis would be subject to arbitrary decisions about data extraction.
For many health effects the committee found only one or two primary papers and secondary papers. Although the committee used deployment as the exposure of interest for this report, the type, duration, and the nature of the deployment experience varied widely between the studies, again reducing the utility of a meta-analysis. The committee believed that a descriptive analysis would be more comprehensive and accurate given the varying quality and quantity of the studies for each health effect.

With that orientation to the committee's task, the following sections provide a brief discussion of factors influencing the value of epidemiologic studies, the committee's criteria for inclusion of studies in its review, considerations in evaluating the evidence or data provided by the studies, and the categories of association for the conclusions about the strength of the evidence presented in the studies.

TYPES OF EVIDENCE

The committee relied entirely on epidemiologic studies to draw its conclusions about the strength of the evidence for an association between deployment to a war zone (a stressor) and health effects (see Chapter 6). However, animal studies play a critical role in elucidating the mechanisms of the stress response (see Chapter 4) and provide a biological platform for many of the effects seen in humans, including that for posttraumatic stress disorder (PTSD) (see Chapter 5).

Animal Studies

Studies of laboratory animals are essential to understanding mechanisms of action, biologic plausibility, and providing response information about possible health effects when
experimental research in humans is not ethically or practically possible (NRC 1991). Such studies permit a stressor to be introduced under conditions controlled by the researcher, such as its intensity and duration, to probe health effects on many body systems. Nonhuman studies are also a valuable complement to human studies of genetic susceptibility. Animal studies may determine the degree of response to acute (short-term) or chronic (long-term) exposures to stressors. Animal research may focus on the mechanism of action (that is, how a stressor exerts its deleterious effects at the cellular and molecular levels). Mechanism-of-action (or mechanistic) studies encompass a range of laboratory approaches with whole animals and in vitro systems using tissues or cells from humans or animals.

In carrying out its charge, the committee used animal studies to provide a basis for a mechanism of action for the impact of stress on biological functions, including changes in brain structure, hormone concentrations, and neurological activity (see Chapter 4), to help establish biologic plausibility. One of the problems with animal studies, however, is the difficulty of finding animal models to study symptoms that relate to uniquely human attributes, such as cognition, purposive behavior, and the perception of pain. Many symptoms reported by veterans (for example, headache and muscle or joint pain) are difficult to study in standard neurotoxicological tests in animals (OTA 1990).

For its evaluation and categorization of the degree of association between each exposure and a human health effect, however, the committee used only evidence from human studies. Nevertheless, the committee did use nonhuman studies as the basis for judgments about biologic plausibility, which is one of the criteria for establishing causation (see below).
Epidemiologic Studies

Epidemiologic studies examine the relationship between exposures to agents of interest in a human population and the health effects seen in that population. The challenge of epidemiologic studies is to isolate the risk factors that contribute to health effects in populations that are inherently uncontrollable in the experimental sense; therefore, statistical techniques are used to take into account factors such as bias and confounding. Such studies can be used to generate hypotheses for future study or to test hypotheses posed in advance by investigators.

A principal objective of epidemiology is to understand whether exposures to specific agents are associated with disease or other health effects and, with additional available information, to decide whether such associations are causal. Although they are frequently used synonymously by the general public, the terms "association" and "causation" have distinct meanings (Alpert and Goldberg 2007).

Epidemiologic studies can establish statistical associations between exposures and health effects; associations are generally estimated by using relative risks or odds ratios. To conclude that an association exists, it is necessary for the exposure to be followed by the health effect more frequently than it would be expected to by chance alone. Furthermore, it is almost always necessary to find that the effect occurs consistently in several studies. Epidemiologists seldom consider a single study sufficient to establish an association; rather, it is desirable to replicate the findings in other studies to draw conclusions about the association. Results of separate studies are sometimes conflicting. It is sometimes possible to attribute discordant study results to differences in such characteristics as soundness of study design, quality of execution, and the influence of different forms of bias.
A tight confidence interval around a statistically significant relative risk suggests that the observed result was unlikely to be due to chance. When the measure of association does not show a statistically significant
effect, it is important to consider the size of the sample and whether the study had the power to detect an effect of a given size.

Epidemiologic study designs differ in their ability to provide valid estimates of an association (Ellwood 1998). An important issue is that the studies reviewed by the committee were seldom designed to answer the question in the committee's charge, that is, does exposure to deployment-related stress result in long-term adverse health and psychosocial effects? Cross-sectional studies generally provide a lower level of evidence than cohort and case-control studies.

Determining whether a given statistical association rises to the level of causation requires inference (Hill 1965). As discussed by the International Agency for Research on Cancer in the preamble of its monographs evaluating cancer risks (for example, IARC 2004), a strong association is demonstrated by repeated observations in a number of different studies, specificity of effects, and an increased risk of disease with increasing exposure or a decline in risk after cessation of exposure. Those characteristics all strengthen the likelihood that an association seen in epidemiologic studies is a causal effect. Inferences from epidemiologic studies, however, are often limited to population or ecologic associations because of a lack of individual exposure information. Exposures are rarely, if ever, controlled in epidemiologic studies, and in most cases there is large uncertainty in the assessment of exposure. To assess whether explanations other than causality are responsible for an observed association, one must bring together evidence from different studies and apply well-established criteria, which have been refined over more than a century (Evans 1976; Hill 1965; Susser 1973, 1977, 1988, 1991; Wegman et al. 1997). For a review of those criteria, see the 2004 report of the U.S. Surgeon General (Office of the Surgeon General-HHS 2004).
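The earlier point about sample size and statistical power can be made concrete with a small calculation. The sketch below is illustrative only and is not a method described in the report: it uses a standard normal approximation for comparing two proportions, and the function name, the assumed rates (10% in the nondeployed group and 15% in the deployed group), and the group sizes are all hypothetical.

```python
from statistics import NormalDist


def approximate_power(p_unexposed, p_exposed, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Normal approximation with equal group sizes; the negligible chance of
    rejecting in the wrong direction is ignored.
    """
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    p_bar = (p_unexposed + p_exposed) / 2
    # Standard error under the null hypothesis (no difference in rates).
    se_null = (2 * p_bar * (1 - p_bar) / n_per_group) ** 0.5
    # Standard error under the alternative (the assumed true rates).
    se_alt = (
        p_unexposed * (1 - p_unexposed) / n_per_group
        + p_exposed * (1 - p_exposed) / n_per_group
    ) ** 0.5
    effect = abs(p_exposed - p_unexposed)
    return normal.cdf((effect - z_alpha * se_null) / se_alt)


# Hypothetical rates: 10% of nondeployed and 15% of deployed veterans affected.
power_small = approximate_power(0.10, 0.15, n_per_group=100)
power_large = approximate_power(0.10, 0.15, n_per_group=1000)
print(f"n = 100 per group:  power is about {power_small:.2f}")
print(f"n = 1000 per group: power is about {power_large:.2f}")
```

Under these assumed rates, the smaller study would miss the difference most of the time, so a nonsignificant result from it says little about whether an association exists.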
When examining the available epidemiologic studies, the committee addressed the question, "Does the available evidence support a causal relationship or an association between exposure (deployment to a war zone) and a health effect?" Even a causal relationship between deployment and a specific health effect would not mean that deployment invariably results in the health effect or that all cases of the effect are the result of deployment. Such complete correspondence between exposure and disease is the exception in large populations (IOM 1994). The committee evaluated the data and based its conclusions on the strength and coherence of the data in the selected epidemiologic studies that met its inclusion criteria.

The major types of epidemiologic studies discussed in this chapter are cohort, case-control, and cross-sectional studies.

Cohort Studies

A cohort, or longitudinal, study follows a defined group, or cohort, over time. It can test hypotheses about whether an exposure to a specific stressor is related to the development of a health effect and can examine multiple health effects that may be associated with exposure to a given stressor. A cohort study starts by classifying study participants according to whether or not they have been exposed to the stressor under study, in this case deployment to a war zone. A cohort study compares health effects in individuals who have been exposed to the stressor in question with those without the exposure. Such a comparison can be used to estimate a risk difference or a relative risk, two statistics that measure association. The risk difference is the rate of disease or health effect in exposed persons minus the rate in unexposed persons. A value greater than zero (the null value, H0, is 0.0) implies that extra cases of disease or health effect are associated with the exposure. The relative risk or risk ratio is determined by dividing the rate of developing the disease in the exposed group by the rate in the nonexposed group.
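The two measures just defined can be computed directly from cohort counts. The following is a minimal sketch with hypothetical numbers, not data from any study in this report; the function name is invented for illustration, and the 95% confidence interval uses a common log-based approximation that the text does not itself specify.

```python
import math


def risk_measures(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return the risk difference, the relative risk, and an approximate 95% CI for the relative risk."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    risk_difference = risk_exposed - risk_unexposed  # null value is 0
    relative_risk = risk_exposed / risk_unexposed    # null value is 1
    # Standard error of ln(relative risk), log-based approximation.
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    z = 1.96  # two-sided 95% interval
    ci_low = math.exp(math.log(relative_risk) - z * se_log_rr)
    ci_high = math.exp(math.log(relative_risk) + z * se_log_rr)
    return risk_difference, relative_risk, (ci_low, ci_high)


# Hypothetical cohort: 150 of 1,000 deployed and 100 of 1,000 nondeployed
# veterans develop the health effect during followup.
rd, rr, ci = risk_measures(150, 1000, 100, 1000)
print(f"risk difference = {rd:.3f}")
print(f"relative risk   = {rr:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

In this made-up cohort the risk difference is above zero and the relative risk is above 1, and the interval excludes 1, so the pattern would be read as a positive association that is unlikely to be due to chance alone.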
A relative risk greater than 1 (the null value, H0, is 1.0) suggests a positive association between the stressor and the health effect. The higher the relative risk, the stronger is the association.

One major advantage of a cohort study is the ability of the investigator to define the exposure classification of subjects at the beginning of the study. This classification in prospective cohort studies is not influenced by the presence of a health effect because the health effect has yet to occur, which reduces an important source of potential bias known as selection bias (see later discussion). As explained in the section on case-control studies, when it is possible to measure a confounding factor,[1] the investigator can apply statistical methods to minimize its influence on the results. An advantage of a cohort study is that it is possible to calculate absolute rates of disease incidence.[2] A final advantage, especially over cross-sectional studies (discussed below), is that it may be possible to adjust each subject's followup health status for baseline health status so that the person acts as his or her own control rather than defining a group as "disease-free"; that may reduce a source of variation and increase the power to detect effects. The disadvantages of cohort studies are high costs as a result of a large study population and prolonged periods of followup (especially if the health effect is rare), attrition of study subjects, and delay in obtaining results.

A prospective cohort study selects subjects on the basis of exposure (or lack of it) and follows the cohort into the future to determine the rate at which the health effect develops. A retrospective (or historical) cohort study differs from a prospective study in terms of temporal direction; the investigator traces back in time to classify past exposures in the cohort and then tracks the cohort forward in time to ascertain the rate of the health effect.
Such studies often focus on disease mortality rates because of the relative ease of determining vital status of individuals and the availability of death certificates to determine the cause of death.

For comparison purposes, cohort studies often use general population mortality or morbidity rates (age-, sex-, race-, time-, and cause-specific) because it may be difficult to identify a suitable control group of unexposed people. The observed number of deaths or illnesses among a group (from a specific cause such as lung cancer) is compared with the expected number of deaths or illnesses. The ratio of observed to expected deaths produces a standardized mortality ratio. However, for several of the studies cited in Chapter 6, a standardized morbidity ratio (SMR) for illness is used instead, as death is not the effect of interest. An SMR greater than 1.0 generally suggests an elevated risk of illness in the exposed group.

The major problem with using general population rates for comparison with military cohorts is the "healthy-warrior effect," which arises when a military population experiences a lower mortality or morbidity rate than the general population, which consists of a mix of healthy and unhealthy people. The military has physical health criteria that personnel must meet when they enter the military and while they are on active duty.

[1] A potential confounding factor is a variable that is associated with the health effect and may affect the results of the study because it is distributed differently in the exposed and nonexposed groups.
[2] Incidence is the rate of occurrence of new cases of an illness or disease in a given population during a specified period. Prevalence is the number of cases of an illness or disease existing in a given population at a specific point or period.

Case-Control Studies

In a case-control study, subjects (cases) are selected on the basis of having a health effect; controls are selected on the basis of not having the health effect. Cases and controls are asked about their exposures to specific agents. Cases and controls can be matched with regard to such characteristics as age, sex, and socioeconomic status to eliminate those characteristics as causes of observed differences, or those variables can be controlled in the analysis. The odds of exposure to the agent among the cases are then compared with the odds of exposure among controls. The comparison generates an odds ratio, which is a statistic that depicts the odds of having a health effect among those exposed to the stressor relative to the odds of having the health effect among an unexposed comparison group. An odds ratio greater than 1 indicates that there is a potential association between exposure to the stressor and the health effect; the greater the odds ratio, the greater the association.

Case-control studies are useful for testing hypotheses about the relationships between exposure to specific stressors and a health effect. They are especially useful and efficient for studying the etiology of rare effects. Case-control studies have the advantages of ease, speed, and relatively low cost. They are also valuable for their ability to probe multiple exposures or risk factors. However, case-control studies are vulnerable to several types of bias, such as recall bias, which can dilute or enhance associations between a health effect and exposure. Other problems include identifying representative groups of cases, choosing suitable controls, and collecting comparable information about exposures on both cases and controls. Those problems might lead to unidentified confounding variables that differentially influence the selection of cases or control subjects or the detection of exposure. For the reasons discussed above, case-control studies are often the first approach to testing a hypothesis about factors contributing to a specific health effect, especially a rare one.

A nested case-control study draws cases and controls from a previously defined cohort. Thus, it is said to be "nested" inside a cohort study.
Baseline data are collected at the time that the cohort is identified, which ensures a more uniform set of data on cases and controls. Within the cohort, individuals identified with a health effect serve as cases, and a sample of those who are effect-free serve as controls. Using baseline data, exposure in cases and controls is compared, as in a regular case-control study. Nested case-control studies are efficient in terms of time and cost in reconstructing exposure histories on cases and on only a sample of controls rather than the entire cohort. Additionally, because the cases and controls come from the same previously established cohort, concerns about unmeasured confounders and selection bias are decreased.

Cross-Sectional Studies

The main differentiating feature of a cross-sectional study is that exposure and health effect information is collected at the same time. The selection of people for the study, unlike selection for cohort and case-control studies, is independent of both the exposure to the stressor and health effect characteristics. Cross-sectional studies seek to uncover potential associations between exposure to a specific stressor and development of a health effect. In a cross-sectional study, effect size is measured as relative risk, prevalence ratio, or prevalence odds ratio. It might compare health effect or symptom rates between groups with and without exposure to the specific stressor. Many health studies of Gulf War veterans are cross-sectional studies that compare a sample of veterans who were deployed to the Gulf War with a sample of veterans who served during the same period but were not deployed to the Gulf War. Cross-sectional studies are easier and less expensive to perform than cohort studies and can identify the prevalence of health effects and exposures in a defined population.
They are useful for generating hypotheses, but they are much less useful for determining cause-effect relationships, because effect and exposure data are collected at the same time (Monson 1990). It might also be difficult to determine the temporal sequence of exposures and symptoms or effect.
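The prevalence ratio and prevalence odds ratio mentioned in this section can both be read off a single 2x2 table. The sketch below uses hypothetical survey counts and an invented function name; it is meant only to show how the two measures are formed and that they need not agree when the health effect is common.

```python
def prevalence_measures(exposed_with, exposed_without, unexposed_with, unexposed_without):
    """Return the prevalence ratio and the prevalence odds ratio from a 2x2 table."""
    prevalence_exposed = exposed_with / (exposed_with + exposed_without)
    prevalence_unexposed = unexposed_with / (unexposed_with + unexposed_without)
    prevalence_ratio = prevalence_exposed / prevalence_unexposed
    # Odds of the effect among the exposed divided by odds among the unexposed.
    prevalence_odds_ratio = (exposed_with * unexposed_without) / (
        exposed_without * unexposed_with
    )
    return prevalence_ratio, prevalence_odds_ratio


# Hypothetical survey: 300 of 1,000 deployed and 150 of 1,000 nondeployed
# veterans report a symptom at the time of the survey.
pr, por = prevalence_measures(300, 700, 150, 850)
print(f"prevalence ratio      = {pr:.2f}")
print(f"prevalence odds ratio = {por:.2f}")
```

With a symptom this common, the odds ratio is noticeably larger than the prevalence ratio, which is one reason to check which measure a study reports before comparing results across studies.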
INCLUSION CRITERIA

The committee's next step, after securing the full text of about 3000 studies, was to determine which studies would be included in the review as primary or support studies. For a study to be included in the committee's review, it had to meet these criteria: methodologic rigor, use of an appropriate control population, specificity of health effect, and an indicator of exposure to the stressor (deployment to a war zone). Studies that met the committee's criteria are referred to as primary studies. The committee focused on long-term health effects that persisted after deployment to a war zone ended.

Studies reviewed by the committee that did not necessarily meet all the criteria of a primary study are considered secondary studies. Secondary studies are typically less rigorous in their methods; for example, a study might have a small sample, not include a physician's examination or other appropriate evaluation method, or rely only on veterans' self-reports of symptoms or health effect using a mailed questionnaire. The committee used those types of studies to support its findings based on primary studies.

Methodologic Rigor

The study had to be published in a peer-reviewed journal, had to include details of its methodology, had to include a control or reference group, had to have a sample size of at least 100, had to have statistical power to detect effects, and had to include reasonable adjustment for confounders. Case studies and case series were generally excluded from the committee's consideration.

Health-Effect Assessment

For a study to be considered primary, it had to have information regarding a specific health effect and exposure information. The committee preferred studies that had an independent assessment of a health effect rather than self-reports of a health effect or self-report of a physician's diagnosis.
The health effect must have been diagnosed or confirmed by a clinical evaluation, a specific laboratory test, hospital record, or other medical record; for psychiatric outcomes, standardized interviews were necessary, such as the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders (DSM)-IV-TR, the Diagnostic Interview Schedule, and the Composite International Diagnostic Interview; for psychosocial effects, the effect needed to be obtained through the use of a validated or well-recognized instrument or the data obtained from databases maintained by a government agency or other appropriate organization. Primary studies are studies of veterans deployed to a war zone compared with their nondeployed counterparts or studies that evaluated health effects of veterans with deployment-related or combat-related PTSD. Furthermore, a study had to examine long-term rather than immediate or transitory outcomes. For example, the committee considered veterans with PTSD but not with acute stress disorder for which symptoms of disturbance last no longer than a month (APA 2000).
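The inclusion criteria above can be summarized as a simple screening checklist. The sketch below is a deliberately simplified illustration, not the committee's actual procedure, which relied on collective judgment: the class, field names, and category labels are hypothetical, and only the sample-size threshold of 100 comes directly from the text.

```python
from dataclasses import dataclass

MIN_SAMPLE_SIZE = 100  # threshold stated in the text


@dataclass
class Study:
    """Hypothetical record of the screening attributes for one study."""
    peer_reviewed: bool
    describes_methods: bool
    has_control_group: bool
    sample_size: int
    adequately_powered: bool
    adjusts_for_confounders: bool
    independent_outcome_assessment: bool
    long_term_outcome: bool


def classify(study: Study) -> str:
    """Assign a study to a screening category (simplified assumption)."""
    if not study.peer_reviewed:
        # Non-peer-reviewed work had no evidentiary value, only background use.
        return "background only"
    meets_all_criteria = (
        study.describes_methods
        and study.has_control_group
        and study.sample_size >= MIN_SAMPLE_SIZE
        and study.adequately_powered
        and study.adjusts_for_confounders
        and study.independent_outcome_assessment
        and study.long_term_outcome
    )
    return "primary" if meets_all_criteria else "secondary"


rigorous = Study(True, True, True, 2500, True, True, True, True)
self_report_survey = Study(True, True, True, 2500, True, True, False, True)
unpublished_report = Study(False, True, True, 2500, True, True, True, True)

print(classify(rigorous))
print(classify(self_report_survey))
print(classify(unpublished_report))
```

A checklist like this captures only the mechanical part of screening; whether a control group is appropriate or an adjustment is reasonable still requires a reader's judgment.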
ADDITIONAL CONSIDERATIONS

In addition to determining the primary and secondary literature that would be used to draw conclusions, the committee considered other characteristics of the studies related to the methods used by researchers in their design and conduct.

Bias

Bias refers to systematic, or nonrandom, error. Bias causes an observed value to deviate from the true value, and could weaken an association or generate a spurious association. Because all studies are susceptible to bias, a goal of the research design is to minimize bias or to adjust the observed value of an association by correcting for bias. There are different types of bias, such as selection bias and information bias. Selection bias occurs when systematic error in obtaining study participants results in a potential distortion of the true association between exposure and outcome. Information bias results from the manner in which data are collected and can result from measurement errors, imprecise measurement, and misdiagnosis. Those types of errors might be uniform in an entire study population or might affect some parts of the population more than others. Information bias might result from misclassification of study subjects with respect to the outcome variable or from misclassification of exposure. Other common sources of information bias are the inability of study subjects to recall the circumstances of their exposure accurately (recall bias) and the likelihood that one group more frequently reports what it remembers than another group (reporting bias). Information bias is especially harmful in interpreting study results when it affects one comparison group more than another.

Confounding

Confounding occurs when a variable or characteristic otherwise known to be predictive of an effect and associated with the exposure (and not on the causal pathway) can account for part or all of an apparent association.
A confounding variable is an uncontrolled variable that influences the outcome of a study to an unknown extent, and makes precise evaluation of its effects impossible. Examples of confounders are age, sex, smoking, and pre-existing illness. Carefully applied statistical adjustments can often control for or reduce the influence of a confounder.

Random Error

A false positive (type I error) occurs when routine statistical variation leads to an apparent association between an exposure to a stressor and a health effect when no association is present. This happens when the observed result of a study falls in the tail of the probability distribution hypothesized to describe the process being studied. Standard statistical methods, such as p-values and confidence intervals, allow one to assess the likelihood that random error due to sampling is responsible for positive findings. Replication of a positive finding in additional studies demonstrates that it is not simply a false positive, but does not guard against the same biases and confounders distorting the results if the studies' designs are the same. Consistent results in multiple studies with different designs, and hence vulnerabilities to different confounders and sources of bias, increase confidence that the observed relationship is real and
help to rule out the possibility that the positive results are due to random error, bias, or confounders.

CONSIDERATIONS IN ASSESSING THE STRENGTH OF EVIDENCE

The committee's process of reaching conclusions about deployment to a war zone and its potential for adverse health effects was collective and interactive. Once a study was included in this review because it met the committee's criteria, there were several considerations in assessing the strength of associations. They were patterned after those introduced by Hill (1971) and include presence of a temporal relationship, strength of the estimated association, presence of a dose-response relationship, consistency of the association, and biologic plausibility.

Temporal Relationship

If an observed association is real, exposure must precede the onset of disease by at least the duration of health effect induction. The committee considered whether a health effect occurred within a period after deployment that was consistent with current understanding of the natural history of the health effect. The committee interpreted the lack of an appropriate time sequence as evidence against association but recognized that insufficient knowledge about the natural history and pathogenesis of many of the health effects under review limited the utility of this consideration. Without a temporal relationship being established between exposure and outcome, other evidence to support the association becomes useless.

Strength of an Association

The strength of an association is usually expressed as the magnitude of the measure of effect, for example, relative risk or odds ratio. Generally, the higher the relative risk, the greater the likelihood that the exposure-effect association is causal and the lower the likelihood that it is due to undetected error, bias, or confounding (discussed above).
Measures of statistical significance, such as p-values, are not indicators of the strength of an association. Small increases in relative risks that are consistent among studies, however, might be evidence of an association, whereas some forms of extreme bias or confounding can produce a high relative risk. The statistical power of a study was also important, because a study had to be able to detect an effect if one existed; this is especially important when interpreting negative results. This consideration explains the committee's inclusion criterion regarding statistical power.

Dose-Response Relationship

The existence of a dose-response relationship (that is, an increased strength of association with increasing intensity or duration of exposure or other appropriate relation) strengthens an inference that an association is real. However, the lack of an apparent dose-response relationship does not rule out an association. If the relative degree of exposure among several studies can be determined, indirect evidence of a dose-response relationship may exist. For example, if studies of presumably low-exposure cohorts show only mild increases in risk whereas studies of presumably high-exposure cohorts show larger increases in risk, the pattern would be consistent with a dose-response relationship.
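A dose-response pattern of the kind described above can be checked by computing a relative risk for each exposure level against a common reference group. The sketch below uses hypothetical exposure categories and counts, and its function names are invented for illustration; a simple monotonicity check stands in for a formal trend test, which a real analysis would normally use.

```python
def relative_risks_by_level(levels):
    """Compute the relative risk of each exposure level against the first (reference) level.

    `levels` is a list of (label, cases, total) tuples ordered from lowest to
    highest exposure, with the unexposed reference group first.
    """
    _, reference_cases, reference_total = levels[0]
    reference_risk = reference_cases / reference_total
    return [
        (label, (cases / total) / reference_risk)
        for label, cases, total in levels
    ]


def is_monotonic_increasing(values):
    """True when each value is at least as large as the one before it."""
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


# Hypothetical counts, ordered by presumed intensity of exposure.
exposure_levels = [
    ("nondeployed (reference)", 50, 1000),
    ("deployed, low exposure", 60, 1000),
    ("deployed, moderate exposure", 80, 1000),
    ("deployed, high exposure", 120, 1000),
]

risks = relative_risks_by_level(exposure_levels)
trend = is_monotonic_increasing([rr for _, rr in risks])

for label, rr in risks:
    print(f"{label}: relative risk = {rr:.2f}")
print("consistent with a dose-response relationship:", trend)
```

As the text notes, the absence of such a gradient would not by itself rule out an association; the check only shows whether the observed pattern is consistent with one.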
Consistency of Association

A consistent association requires that the association be found regularly in a variety of studies, for example, in more than one study population and with different study methods. However, consistency alone is not sufficient evidence of an association. The committee considered findings that were consistent in direction among studies of different designs to be supportive of an association. It did not require exactly the same magnitude of association in different populations to conclude that there was a consistent association. A consistent association could occur when the results of most studies were positive and the variations in measured effects were within the range expected on the basis of sampling error, selection bias, and confounding. Thus, for a health effect to be considered associated with deployment there had to be corroboration, that is, replication of findings among studies and populations. The degree to which an effect could be consistently reproduced gave the committee confidence that it was observing a true effect.

Specificity of Association

Specificity of association is the degree to which exposure to a given stressor predicts a particular outcome. A positive finding is more convincing of causality when the association between the exposure and the health effect is specific to one or both than when it is nonspecific to both the exposure and the health effect. The committee recognized, however, that one-to-one specificity is not to be expected, given the multifactorial etiology of many of the health effects under examination.

Biologic Plausibility

Biologic plausibility reflects knowledge of the biologic mechanism(s) by which an agent could lead to a health outcome. That knowledge comes through mechanism-of-action or other studies in pharmacology, physiology, and other fields, typically in studies of animals.
A biologically plausible mechanism may not be known when an association is first documented. Biologic plausibility was required by the committee only in drawing a conclusion of "sufficient evidence of a causal association" (see below); for the other categories of association, it was not necessary to demonstrate a biologically plausible mechanism.

CATEGORIES OF ASSOCIATION

The committee attempted to express its judgment about the available data as clearly and precisely as possible. The committee agreed to use the categories of association that have been established and used by previous Committees on Gulf War and Health and other Institute of Medicine (IOM) committees that have evaluated vaccine safety, effects of herbicides used in Vietnam, and indoor pollutants related to asthma (IOM 2000, 2003, 2005, 2006, 2007). These categories of association have gained wide acceptance over more than a decade by Congress, government agencies (particularly the Department of Veterans Affairs [VA]), researchers, and veterans groups.

The five categories below describe different levels of association and sound a recurring theme: the validity of an association is likely to vary to the extent to which common sources of error (chance variation and bias, including confounding) could be ruled out as the reason for
the observed association. Accordingly, the criteria for each category express a degree of confidence based on the extent to which sources of error were reduced. The committee discussed the evidence and reached consensus on the categorization of that evidence for each health and psychosocial effect in Chapters 6 and 7, respectively. The committee was conservative in its judgment of the evidence because the quantity and quality of studies used to determine the association for each health effect varied considerably, but in each case the minimum requirements for the specific category of association were met.

Sufficient Evidence of a Causal Relationship

Evidence is sufficient to conclude that there is a causal relationship between exposure to deployment-related stress and a specific health effect in humans, and the evidence is supported by experimental data on humans or animals. The evidence fulfills the guidelines for sufficient evidence of an association (below) and satisfies several of the guidelines used to assess causality: strength of association, dose-response relationship, consistency of association, and a temporal relationship.

Sufficient Evidence of an Association

Evidence is sufficient to conclude that there is an association; that is, a consistent association has been observed between exposure to deployment-related stress and a specific health effect in human studies in which chance and bias, including confounding, could be ruled out with reasonable confidence as an explanation for the observed association. For example, several high-quality studies report consistent associations, and the studies are sufficiently free of bias, including adequate control for confounding.
Limited but Suggestive Evidence of an Association

Evidence is suggestive of an association between exposure to deployment-related stress and a specific health effect in human studies, but the body of evidence is limited by the inability to rule out chance and bias, including confounding, with confidence. At least one high-quality [3] study reports a positive association that is sufficiently free of bias, including adequate control for confounding, and corroborating studies provide support for the association but are not sufficiently free of bias, including confounding. Alternatively, several studies of lower quality might show a consistent association with results that are probably not due to bias, including confounding. [4]

Inadequate/Insufficient Evidence to Determine Whether an Association Exists

Evidence is of insufficient quantity, quality, or consistency to permit a conclusion regarding the existence of an association between exposure to a specific agent and a specific health effect in humans.

[3] Factors used to characterize high-quality studies include the statistical stability of the association, whether a dose-response relationship or other trend was demonstrated, and the quality of the assessments of exposure and effect.
[4] Factors used to make this judgment include the data on the relationship between potential confounders and related health effects in a given study, information on subject selection, and classification of exposure.
Limited/Suggestive Evidence of No Association

Evidence from well-conducted studies is consistent in not showing a positive association between exposure to a specific agent and a specific health effect after exposure of any magnitude. A conclusion of no association is inevitably limited to the conditions, magnitudes of exposure, and length of observation in the available studies. The possibility of a very small increase in risk after the exposure studied cannot be excluded.

LIMITATIONS OF VETERAN STUDIES

A major limitation of the studies reviewed by the committee is that few of them measured combat exposure. Even in the studies that did assess combat exposure with questionnaires or scales, the researchers usually asked only whether the exposure occurred rather than attempting to determine the degree to which the veteran may have found the experience stressful. Furthermore, few studies attempted to determine the effects of repeated or combined exposures, for example, exposure to extreme heat, wearing of chemical protective gear, and shooting at the enemy. Another limitation of many of the studies was their retrospective design, which made it impossible to distinguish health effects that existed before deployment from those that were consequences of it.

Some studies used self-reports from questionnaires to assess health effects, exposure, and the presence of risk or protective factors. Such questionnaires can lead to recall bias with regard to exposures or inaccuracies in reporting health effects. For those reasons, the committee weighted more heavily studies that included an examination by a health professional or other appropriate evaluation method. Similarly, for psychiatric disorders, such as PTSD, those studies that relied on symptom checklists to indicate the presence of disorders were weighted less heavily than those that involved a diagnostic interview by a health professional.
Many studies had a selection bias in that health effects were assessed in veterans who were in treatment groups, such as inpatients or outpatients at PTSD clinics, or were selected from registries of veterans established by VA. In addition, sufficient time might not have passed since deployment to detect the development of some health outcomes, for example, cancer or heart disease, particularly in Gulf War, OEF, and OIF veterans.

SUMMARY

The committee reviewed and evaluated studies from the scientific and medical literature that were identified with searches of bibliographic databases and other methods. The committee adopted a policy of using only peer-reviewed published literature as the basis of its conclusions. Publications that were not peer-reviewed were given no evidentiary value by the committee. The committee came to its conclusions using the categories of association used by previous IOM committees and widely accepted by Congress, the VA, and veterans' service organizations. Committee members read each article critically. In some instances, non-peer-reviewed publications provided background information for the committee and raised issues that required further research. The committee, however, did not collect original data, nor did it perform any secondary data analysis. In its evaluation of the peer-reviewed literature, the committee
considered several important issues, including quality and relevance; error, bias, and confounding; and the diverse nature of the evidence and the research.

REFERENCES

Alpert JS, Goldberg RJ. 2007. Dear patient: Association is not synonymous with causality. American Journal of Medicine 120(8):649-650.
APA (American Psychiatric Association). 2000. Diagnostic and Statistical Manual of Mental Disorders, 4th Edition, Text Revision. Washington, DC: American Psychiatric Publishing Association.
Elwood JM. 1998. Critical Appraisal of Epidemiological Studies and Clinical Trials, 2nd Edition. Oxford, UK: Oxford University Press.
Evans AS. 1976. Causation and disease: The Henle-Koch postulates revisited. Yale Journal of Biology and Medicine 49(2):175-195.
Hill AB. 1965. The environment and disease: Association or causation? Proceedings of the Royal Society of Medicine 58:295-300.
Hill AB. 1971. Principles of Medical Statistics. New York: Oxford University Press.
IARC (International Agency for Research on Cancer). 2004. Tobacco Smoke and Involuntary Smoking. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans. Lyon, France: International Agency for Research on Cancer.
IOM (Institute of Medicine). 1994. Veterans and Agent Orange: Health Effects of Herbicides Used in Vietnam. Washington, DC: National Academy Press.
IOM. 2000. Gulf War and Health, Volume 1: Depleted Uranium, Pyridostigmine Bromide, Sarin, Vaccines. Washington, DC: National Academy Press.
IOM. 2003. Gulf War and Health, Volume 2: Insecticides and Solvents. Washington, DC: The National Academies Press.
IOM. 2005. Gulf War and Health, Volume 3: Fuels, Combustion Products, and Propellants. Washington, DC: The National Academies Press.
IOM. 2006. Gulf War and Health, Volume 4: Health Effects of Serving in the Gulf War. Washington, DC: The National Academies Press.
IOM. 2007. Gulf War and Health, Volume 5: Infectious Diseases. Washington, DC: The National Academies Press.
Monson R. 1990. Occupational Epidemiology, 2nd Edition. Boca Raton, FL: CRC Press.
NRC (National Research Council). 1991. Animals as Sentinels of Environmental Health Hazards. Washington, DC: National Academy Press.
Office of the Surgeon General-HHS. 2004. The Health Consequences of Smoking: A Report of the Surgeon General. [Online]. Available: http://www.surgeongeneral.gov/library/smokingconsequences [accessed October 26, 2004].
OTA (Office of Technology Assessment). 1990. Neurotoxicity: Identifying and Controlling Poisons of the Nervous System. Washington, DC: U.S. Government Printing Office.
Susser M. 1973. Causal Thinking in the Health Sciences: Concepts and Strategies of Epidemiology. New York: Oxford University Press.
Susser M. 1977. Judgment and causal inference: Criteria in epidemiologic studies. American Journal of Epidemiology 105(1):1-15.
Susser M. 1988. Falsification, verification, and causal inference in epidemiology: Reconsideration in the light of Sir Karl Popper's philosophy. Rothman KJ, editor. Causal Inference. Chestnut Hill, MA: Epidemiology Resources. Pp. 33-58.
Susser M. 1991. What is a cause and how do we know one? A grammar for pragmatic epidemiology. American Journal of Epidemiology 133(7):635-648.
Wegman DH, Woods NF, Bailar JC. 1997. Invited commentary: How would we know a Gulf War syndrome if we saw one? American Journal of Epidemiology 146(9):704-711.