TYPES OF EPIDEMIOLOGIC STUDIES
This appendix briefly describes the types of studies considered in the body of this report. Some studies enabled the committee to form judgments about the strength of an association between a putative agent and a health outcome and were used as primary studies (for example, cohort and case-control studies); others were not considered useful for that assessment and were considered support studies (for example, cross-sectional studies, case reports, and case series).
EXPERIMENTAL STUDIES IN ANIMALS: ANIMAL MODELS
Studies of laboratory animals and other nonhuman systems are essential for understanding mechanisms of action, biologic plausibility, and possible health effects when experimental research in humans is not ethically or practically possible (Cohrssen and Covello 1989; NRC 1991). Such studies permit a potentially toxic agent to be introduced under conditions controlled by the researcher, such as dose, duration, and route of exposure. Nonhuman studies are also a valuable complement to human studies of genetic susceptibility. Although nonhuman studies often focus on one agent at a time, they more easily enable the study of chemical mixtures and their potential interactions.
Research on health effects of toxic substances includes animal studies that characterize absorption, distribution, metabolism, elimination, and excretion. Animal studies may examine acute (short-term) exposures or chronic (long-term) exposures. They may focus on the mechanism of action (how a toxicant exerts its deleterious effects at the cellular and molecular levels). Mechanism-of-action (or mechanistic) studies encompass a range of laboratory approaches with whole animals and in vitro systems that use tissues or cells from humans or animals. Structure-activity relationships, in which a potential toxicant and a known toxicant are compared with respect to molecular structure and chemical and physical properties, are an important source of hypotheses about mechanisms of action.
In carrying out its charge, the committee used the results of animal and other nonhuman studies in several ways, particularly as markers of health effects that might be important for humans. If an agent, for example, is absorbed and deposited in specific tissues or organs, the committee looked closely for possible abnormalities at those sites in human studies.
One problem with animal studies is the difficulty of finding animal models that permit the study of symptoms rather than disease end points. That is particularly true when one is trying to find an animal model for attributes that we consider peculiar to humans, such as cognition, behavior, and the perception of pain. Many symptoms reported by veterans, such as headache and muscle or joint pain, are difficult to study in standard neurotoxicologic tests in animals (OTA 1990). Another problem is that for some outcomes (for example, cancer and birth defects) animal studies may implicate a chemical as being able to cause such outcomes, but the specific outcome in animals may differ from the outcome in humans. Given the task of this committee, the results of such studies could be considered supportive but not primary evidence of an association with a specific outcome in humans.
EXPERIMENTAL STUDIES IN HUMANS: RANDOMIZED CONTROLLED TRIALS
Experimental studies in humans are the foremost means of establishing causal associations between exposure to an agent and human health outcomes. Experimental studies are used most often in the evaluation of the safety and efficacy of medications, surgical practices, biologic products, vaccines, and preventive interventions. In an experiment, the investigator assigns the agent to be studied and records the outcome. Two key features of experimental studies are prospective design and use of a control group. Randomized controlled trials are considered the gold standard in experimental studies.
In randomized controlled trials, each subject has a known probability of assignment to the test group or the control group, and the various subjects’ probabilities are often equal. Large randomized controlled trials are designed to have all possible confounding variables occur with equal frequency in the test and control groups. Blinding (shielding test subjects and controls from knowledge of their assignment) may be another aspect of randomized controlled trials. It is most readily accomplished when subjects in the control group receive a placebo. When both subjects and investigators are unaware of assignment, a study is said to be double-blind. The objective of blinding is to reduce bias introduced by subjects’ and investigators’ attitudes and expectations for study outcomes.
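The balancing effect of randomization described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the subjects, the 30% prevalence of a hypothetical confounder (`smoker`), the seeds, and the function names are invented for illustration and do not come from any study discussed in this report.

```python
import random


def randomize(subjects, p_test=0.5, seed=None):
    """Assign each subject to 'test' or 'control' with known probability p_test."""
    rng = random.Random(seed)
    arms = {"test": [], "control": []}
    for subject in subjects:
        arm = "test" if rng.random() < p_test else "control"
        arms[arm].append(subject)
    return arms


def smoker_fraction(group):
    """Fraction of a group carrying the hypothetical confounder."""
    return sum(s["smoker"] for s in group) / len(group)


# Hypothetical population: about 30% carry the confounder (assumed value).
population_rng = random.Random(0)
subjects = [{"id": i, "smoker": population_rng.random() < 0.3} for i in range(10_000)]

arms = randomize(subjects, p_test=0.5, seed=1)
print(len(arms["test"]), len(arms["control"]))
print(round(smoker_fraction(arms["test"]), 3), round(smoker_fraction(arms["control"]), 3))
```

In a large trial the two printed fractions should be close to each other, because assignment does not depend on the confounder; in a small trial, chance imbalance is more likely.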
The value of randomized controlled trials has been so convincingly demonstrated that they are required for ensuring the safety and efficacy of all new medications introduced into the market in the United States. The main drawbacks of randomized controlled trials are their expense, the time needed for completion, and the common practice of systematically excluding many groups of people, which makes the results harder to generalize.
CONTROLLED EPIDEMIOLOGIC STUDIES (OBSERVATIONAL)
In contrast with randomized controlled trials and other experimental studies in humans, most epidemiologic investigations are “observational”. That means simply that the occurrences of exposure to the putative agent and of the particular diseases or outcomes are studied as they arise in the usual course of life and not under the conditions of a planned experiment. However, through various strategies of formal comparative investigations, observational studies in populations are often “controlled”. We discuss below the different types of controlled, observational studies considered by the committee.
The cohort, or longitudinal, study is an epidemiologic study that follows a defined group, or cohort, over time. It can test hypotheses about whether exposure to a specific agent is related to the development of disease and can examine multiple disease outcomes that may be associated with exposure to a given agent. A cohort study starts with people who are free of the disease in question and classifies them according to whether they have been exposed to the agent of concern. It compares health outcomes in people who have been exposed with outcomes in those who have not. Such a comparison can be used to estimate a risk difference or a relative risk, two statistics that measure association. The risk difference is the rate of a disease in exposed persons minus the rate in nonexposed persons. It represents the absolute excess of disease associated with the exposure (for example, extra cases per 1,000 persons). The relative risk, or risk ratio, is determined by dividing the rate of the disease in the exposed group by the rate in the nonexposed group. A relative risk greater than 1 suggests a positive association between exposure and disease onset; the higher the relative risk, the stronger the association.
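The two measures of association defined above can be sketched in a few lines. The counts are hypothetical (30 cases among 1,000 exposed persons and 10 cases among 1,000 nonexposed persons), and the function names are illustrative only.

```python
def risk(cases, persons):
    """Proportion of a group that develops the disease during followup."""
    return cases / persons


def risk_difference(exp_cases, exp_persons, unexp_cases, unexp_persons):
    """Rate in exposed persons minus rate in nonexposed persons."""
    return risk(exp_cases, exp_persons) - risk(unexp_cases, unexp_persons)


def relative_risk(exp_cases, exp_persons, unexp_cases, unexp_persons):
    """Rate in the exposed group divided by rate in the nonexposed group."""
    return risk(exp_cases, exp_persons) / risk(unexp_cases, unexp_persons)


# Hypothetical cohort counts (assumed for illustration).
rd = risk_difference(30, 1000, 10, 1000)  # about 0.02, or 20 extra cases per 1,000
rr = relative_risk(30, 1000, 10, 1000)    # about 3
print(f"risk difference = {rd:.3f}, relative risk = {rr:.2f}")
```

With these assumed counts, the exposed group has about 20 extra cases per 1,000 persons and roughly three times the risk of the nonexposed group.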
A prospective cohort study selects subjects on the basis of exposure (or lack of it) and follows the cohort to determine the rate at which the disease (or other health outcome) develops. A retrospective (or historical) cohort study differs from a prospective study in terms of temporal direction; the investigator traces backward to classify past exposures in the cohort and then forward to ascertain the rate of disease.
Retrospective cohort studies commonly have been performed in occupational health. They often assess disease-related mortality because of the relative ease of determining vital status of people and the availability of death certificates to determine causes of death. For comparison purposes, cohort studies often use general population mortality (age-, sex-, race-, time-, and cause-specific) because it may be difficult to identify a comparison group of nonexposed workers. The observed number of deaths among workers from a specific cause (such as lung cancer) is compared with the expected number of deaths from that cause. The expected number is calculated by multiplying the annual mortality rate in the general population by the number of person-years of followup of the workers. The ratio of observed to expected deaths (which by convention is often multiplied by 100) is a standardized mortality ratio (SMR). An SMR greater than 100 generally suggests an increased risk of dying from the specified cause in the exposed group. Many cohort studies refine their measures of health outcomes by using an internal comparison group, which may differ from the cohort in exposure magnitude but otherwise be more similar to the cohort than the general population is.
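The SMR calculation described above can be sketched as follows. The age strata, reference rates (expressed per 100,000 person-years), person-years, and observed deaths are hypothetical values chosen for illustration; summing expected deaths over strata is one common way to apply age-specific reference rates.

```python
def expected_deaths(strata):
    """Sum of (reference rate per 100,000 person-years) x (person-years) over strata."""
    return sum(rate_per_100k * person_years / 100_000
               for rate_per_100k, person_years in strata)


def smr(observed, expected):
    """Standardized mortality ratio, multiplied by 100 by convention."""
    return 100 * observed / expected


# Hypothetical worker cohort: (general-population rate per 100,000, person-years).
strata = [
    (50, 20_000),   # assumed younger age band
    (150, 10_000),  # assumed older age band
]
observed = 30  # assumed deaths from the cause of interest among workers

expected = expected_deaths(strata)  # 10 + 15 = 25 expected deaths
print(smr(observed, expected))      # 120.0
```

An SMR of 120 in this hypothetical cohort would mean 20% more deaths from the specified cause than expected from general population rates.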
The major problem in comparing the general population with an occupational cohort is the “healthy-worker effect” (Monson 1990); this effect arises when an employed population of generally healthy people experiences lower mortality than the general population, which consists of both healthy and unhealthy people. The healthy-worker effect is usually due to workers’ lower rates of cardiovascular and traumatic deaths. A military population (such as Gulf War veterans) that has a high rate of death from external traumatic causes may differ from many occupational populations. In calculating the SMR, the denominator (expected deaths) is derived from general population figures rather than from an otherwise comparable group of nonexposed workers (which may be unavailable). The “artificially” higher denominator for expected deaths in the general population lowers the SMR, thereby underestimating the strength of an association between exposure to the agent and the cause of death. In other words, the healthy-worker effect introduces a bias that diminishes the true disease-exposure relationship.
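A small numeric sketch may make the direction of this bias concrete. All numbers are hypothetical: the same 30 observed deaths are compared against expected deaths computed first from an assumed general population rate and then from an assumed (lower) rate in comparable nonexposed workers.

```python
def smr(observed, rate_per_100k, person_years):
    """SMR (x100) using a single reference rate per 100,000 person-years."""
    expected = rate_per_100k * person_years / 100_000
    return 100 * observed / expected


observed = 30          # assumed deaths among exposed workers
person_years = 20_000  # assumed followup

# Assumed reference rates; the general population includes unhealthy people,
# so its rate is higher than that of comparable nonexposed workers.
general_population_rate = 200
comparable_worker_rate = 125

print(smr(observed, general_population_rate, person_years))  # 75.0
print(smr(observed, comparable_worker_rate, person_years))   # 120.0
```

Under these assumptions, the comparison with the general population suggests a deficit of deaths (SMR below 100), whereas the comparison with similar nonexposed workers suggests an excess; the larger denominator masks the association.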
To counter the influence of the healthy-worker effect, some studies divide the worker population into different groups, on the basis of magnitude of exposure to the agent being studied. Searching for dose-response relationships within the worker population itself is a way of reducing the potential bias introduced by the use of population controls. The problem is that measurements of dose may be imprecise or unavailable, particularly if the exposures occurred decades earlier. Consequently, epidemiologists often rely on job classification as a surrogate of dose. Reliance on job classification introduces the possibility of misclassification bias because the classification may not be a good proxy for the actual exposure or dose. Another problem—not only in determining job classification but especially in determining whether potential confounding exposures (see next paragraph), such as cigarette-smoking by individual workers, are present—is incompleteness of records. Bias introduced by misclassification and confounding can systematically alter study results by diluting or strengthening associations. One major advantage of a cohort study is the ability of the investigator to control the classification of subjects at the beginning of the study. Classification in prospective cohort studies is not influenced by the presence of disease; the disease has yet to occur, and this reduces an important source of potential bias known as selection bias.
A cohort study design also gives the investigator the advantage of measuring and correcting potential confounding. When it is possible to measure a confounding factor, the investigator can apply statistical methods to minimize its influence on the results. Another advantage of a cohort study is that it is possible to calculate absolute rates of disease incidence. A final advantage (especially over cross-sectional studies) is that it may be possible to adjust each subject’s followup health status according to his or her baseline health status so that the person acts as his or her own control; this may reduce a source of variation and increase the power to detect effects.
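One widely used statistical method for a measured confounder is stratification with a Mantel-Haenszel summary estimate. The sketch below is an illustration of that general technique, not a description of the method used in any particular study in this report; the strata (smokers and nonsmokers) and all counts are hypothetical.

```python
def crude_risk_ratio(strata):
    """Risk ratio computed after pooling all strata (ignores the confounder)."""
    exp_cases = sum(s["exp_cases"] for s in strata)
    exp_total = sum(s["exp_total"] for s in strata)
    unexp_cases = sum(s["unexp_cases"] for s in strata)
    unexp_total = sum(s["unexp_total"] for s in strata)
    return (exp_cases / exp_total) / (unexp_cases / unexp_total)


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel summary risk ratio across strata of the confounder."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        total = s["exp_total"] + s["unexp_total"]
        numerator += s["exp_cases"] * s["unexp_total"] / total
        denominator += s["unexp_cases"] * s["exp_total"] / total
    return numerator / denominator


# Hypothetical cohort in which smoking is related to both exposure and disease,
# but exposure has no effect within either smoking stratum.
strata = [
    {"exp_cases": 80, "exp_total": 800, "unexp_cases": 20, "unexp_total": 200},  # smokers
    {"exp_cases": 4, "exp_total": 200, "unexp_cases": 16, "unexp_total": 800},   # nonsmokers
]

print(round(crude_risk_ratio(strata), 2))            # about 2.33
print(round(mantel_haenszel_risk_ratio(strata), 2))  # 1.0
```

In this constructed example the crude risk ratio suggests an association that disappears once the comparison is made within levels of the confounder.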
The disadvantages of cohort studies are the high costs associated with a large study population and long periods of followup (especially if the disease is rare), attrition of study subjects, and delay in obtaining results.
The case-control study is useful for testing hypotheses about the relationships between exposure to specific agents and disease. It is especially useful for studying the etiology of rare diseases. When health outcomes are infrequent or rare, longitudinal or cross-sectional studies must be large enough and last long enough to accumulate enough adverse events to support accurate estimation of the risk posed by a particular agent. In case-control studies, subjects (or cases) are selected on the basis of having a disease, and controls are selected on the basis of not having the disease. Cases and controls are then asked about their past exposures to specific agents. Cases and controls might be matched with regard to such characteristics as age, sex, and socioeconomic status to eliminate those characteristics as causes of observed differences in past exposure; alternatively, matching factors can be controlled for in the analysis. The odds of exposure to the agent among the cases are then compared with the odds of exposure among controls. The comparison generates an odds ratio, a statistic that depicts the odds of having a disease among those exposed to the agent of concern relative to the odds among a nonexposed comparison group. An odds ratio greater than 1 indicates a potential association between exposure to the agent and the disease. The greater the odds ratio, the stronger the association. In short, in a case-control study, subjects are selected on the basis of disease presence, and prior exposure is then ascertained.
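The odds ratio can be sketched from a hypothetical 2 x 2 table. The counts and function names below are assumptions for illustration. The example also shows why the odds of exposure among cases relative to controls and the odds of disease among exposed relative to nonexposed persons give the same number: both reduce to the same cross-product ratio.

```python
def odds(yes, no):
    """Odds of an event: count with the event divided by count without it."""
    return yes / no


def exposure_odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return odds(exp_cases, unexp_cases) / odds(exp_controls, unexp_controls)


def disease_odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds of disease among the exposed divided by odds among the nonexposed."""
    return odds(exp_cases, exp_controls) / odds(unexp_cases, unexp_controls)


# Hypothetical case-control counts (assumed for illustration):
# 100 cases (40 exposed, 60 not) and 100 controls (20 exposed, 80 not).
table = (40, 60, 20, 80)

print(round(exposure_odds_ratio(*table), 2))  # about 2.67
print(round(disease_odds_ratio(*table), 2))   # about 2.67
```

Both functions equal (40 x 80) / (60 x 20) for these counts, so an odds ratio greater than 1 has the same interpretation under either description.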
Case-control studies have the advantages of ease, speed, and relatively low cost. They are also advantageous for their ability to probe multiple exposures or risk factors. However, case-control studies are vulnerable to several types of bias, including recall bias. Other disadvantages are the need to identify representative groups of cases, the need to choose suitable controls, and the need to collect comparable information on exposure of both cases and controls. Those disadvantages might lead to unidentified confounding variables that differentially influence the selection of cases or control subjects or the detection of exposure. Case-control studies are often the first, but not the definitive, approach to testing a hypothesis.
In a cross-sectional study, the population of interest is surveyed at one time. Information about health conditions and exposures to various agents, either present or past, is collected simultaneously. The selection of people to enter the study—in contrast with cohort and case-control studies—is independent of the exposure to the agent under study and independent of disease characteristics. Cross-sectional studies seek to uncover potential associations between exposure to specific agents and development of disease. They may compare disease or symptom rates between groups with and without the exposure to the specific agent or compare exposure to the specific agent between groups with and without the disease. Although cross-sectional studies need not have control groups, studies with control groups are methodologically more sound. Several health studies of Gulf War veterans are controlled cross-sectional surveys that compare a sample of veterans previously deployed to the Gulf War with a sample of veterans who served during the same period but were not deployed to the Gulf War.
Cross-sectional surveys are easier to perform and less expensive to implement than cohort studies. Cross-sectional surveys can identify the prevalence of diseases and exposures in a defined population. They are useful for generating hypotheses. However, they are much less useful for determining cause-effect relationships, because disease and exposure data are collected simultaneously and may be self-reported (Monson 1990). It may also be difficult to determine the temporal sequence of exposure and symptoms or disease.
Case Reports and Case Series
A case report is generally a detailed description of a patient’s illness reported by a clinician; the clinician may suspect that the illness is the result of exposure to a specific biologic or chemical agent. Subjects in a case series have the same or a similar disease and experienced identical or similar exposures to a specific agent. Case reports and case series provide information for generating hypotheses about exposure and disease relationships. For Gulf War veterans, registry programs established by the Department of Veterans Affairs and the Department of Defense constitute a type of voluntary case series. Any veteran may come forward to receive a clinical examination and a referral for treatment. Because of their documentation of veterans’ symptoms and diagnoses, the registries have been valuable in generating hypotheses; but they are not designed for hypothesis-testing or for establishing the prevalence of disease or specific exposures among Gulf War veterans.
The value of case reports and case series is that they can document possible associations between environmental exposures and particular health outcomes. In some situations, they may be useful in suggesting causal relationships if a disease is rare and has a close temporal relationship to an exposure (Kramer and Lane 1992). However, case reports and case series do not have control groups. Because case series are not population-based, many cases caused by an exposure go unreported, and the prevalence of cases may be lower than that in the population at large. Furthermore, the cases may not have been caused by exposure to the specific agent.
Information from Death Certificates
Studies that use mortality data obtained from death certificates tend to be conducted among occupational groups. Death certificates can provide information on occupation, as well as cause of death; occupations listed are used by investigators as surrogates of potential exposures. However, using data from death certificates raises several concerns because an occupation might be listed incorrectly or without actual knowledge of the person’s job. That would lead to exposure misclassification and uncertainty regarding the associations reported. In many cases, a biologic or chemical exposure is underestimated if it is based on death-certificate information, and the magnitude of the association under consideration might be overestimated or underestimated. Finally, depending on the disease, studies that use a death-certificate cause of death may underestimate disease prevalence and may misclassify disease outcomes. The information from death certificates, if used by the committee, might provide supportive evidence.
COMMENTS ON THE NATURE OF THE GULF WAR STUDIES
Most studies of Gulf War veterans’ health have been cross-sectional and have been conducted years after the war. Few studies included clinical examinations or laboratory tests to verify outcomes. Almost all used questionnaires to identify a broad array of agents to which the veterans may have been exposed (often 10–20 agents per study), and symptoms (often 25–100) appear in a checklist format. Questionnaire studies—using a host of self-reported symptoms and exposures—have limitations for drawing inferences about symptom-exposure relationships. Exposure questionnaires were often general and rarely asked about duration, degree, or frequency of exposure.
Most of the studies were designed to detect the nature and prevalence of veterans’ symptoms and illnesses and whether they constituted a new syndrome rather than specifically to assess the effects of exposure to agents of interest to the committee.
Because the studies were generally cross-sectional, they limit opportunities to learn about symptom duration and latency of onset (IOM 2000). They were especially subject to recall bias: veterans who develop symptoms might be more likely than asymptomatic veterans to recall exposures to particular agents.
Several approaches were taken to combine reported symptoms with outcome variables. One was to use a statistical method called factor analysis to uncover an underlying structure in reported symptoms (Cherry et al. 2001; Fukuda et al. 1998; Haley and Kurt 1997). A second approach attempted to match symptoms in some way to previously defined syndromes or illnesses (Iowa Persian Gulf Study Group 1997; Nisenbaum et al. 2000; Unwin et al. 1999). In some cases, symptoms were assembled into established syndromes on the basis of criteria devised by the investigators. Other studies did not attempt a synthesis of any sort but searched for associations between exposures to various agents during the Gulf War and individual symptoms.
Another limitation of studies of Gulf War veterans is the problem of multiple comparisons between exposure to numerous agents and health outcomes. When investigators examine a large number of exposure-symptom associations, the chances of reporting a spurious association as statistically significant are increased. Some Gulf War studies took a variety of statistical approaches to adjust for multiple comparisons. However, many did not account for multiple comparisons and reported any association with a p value of 0.05 or less as statistically significant. In some cases, the investigators indicated that they did not adjust for multiple comparisons, because of the exploratory nature of the study and because of their desire to reduce the probability of not finding a true association. Other investigators were more conservative and set a more stringent significance level to reduce the probability of error (Cherry et al. 2001; Haley and Kurt 1997; White et al. 2001).
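The scale of the multiple-comparisons problem can be sketched numerically. The figures below are assumptions: 15 agents and 50 symptoms are picked from within the ranges mentioned earlier in this section, the tests are treated as independent (which real exposure-symptom tests are unlikely to be), and the Bonferroni threshold is shown only as one example of a more stringent significance level, not as the adjustment any cited study used.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of spurious 'significant' results if no association is real."""
    return alpha * n_tests


def familywise_error_rate(alpha, n_tests):
    """Probability of at least one spurious result, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


n_tests = 15 * 50  # hypothetical: 15 agents x 50 symptoms
alpha = 0.05

print(expected_false_positives(alpha, n_tests))
print(familywise_error_rate(alpha, n_tests))
print(bonferroni_threshold(alpha, n_tests))
```

With 750 comparisons at a p value threshold of 0.05, dozens of associations would be expected to appear statistically significant by chance alone, which is why an unadjusted analysis is best read as exploratory.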
Many studies noted that exposures to different agents were associated with the health outcomes they measured. Only one study attempted to examine the association between specific agents and specific health outcomes, and it found them to be strongly correlated (Cherry et al. 2001); the interrelationships might reflect information bias and might constitute an important limitation of the study. Thus, although the committee considered the body of evidence on Gulf War veterans, in many instances the studies supported findings rather than providing primary evidence.
REFERENCES
Cherry N, Creed F, Silman A, Dunn G, Baxter D, Smedley J, Taylor S, Macfarlane GJ. 2001. Health and exposures of United Kingdom Gulf war veterans. Part I: The pattern and extent of ill health. Occupational and Environmental Medicine 58(5):291–298.
Cohrssen J, Covello V. 1989. Risk Analysis: A Guide to Principles and Methods for Analyzing Health and Environmental Risks. Washington, DC: Council on Environmental Quality, Executive Office of the President.
Fukuda K, Nisenbaum R, Stewart G, Thompson WW, Robin L, Washko RM, Noah DL, Barrett DH, Randall B, Herwaldt BL, Mawle AC, Reeves WC. 1998. Chronic multisymptom illness affecting Air Force veterans of the Gulf War. Journal of the American Medical Association 280(11):981–988.
Haley RW, Kurt TL. 1997. Self-reported exposure to neurotoxic chemical combinations in the Gulf War. A cross-sectional epidemiologic study. Journal of the American Medical Association 277(3):231–237.
IOM (Institute of Medicine). 2000. Gulf War and Health, Volume 1: Depleted Uranium, Sarin, Pyridostigmine Bromide, Vaccines. Washington, DC: National Academy Press.
Iowa Persian Gulf Study Group. 1997. Self-reported illness and health status among Gulf War veterans. A population-based study. Journal of the American Medical Association 277(3):238–245.
Kramer M, Lane D. 1992. Causal propositions in clinical research and practice. Journal of Clinical Epidemiology 45(6):639–649.
Monson R. 1990. Occupational Epidemiology. 2nd edition. Boca Raton, FL: CRC Press, Inc.
Nisenbaum R, Barrett DH, Reyes M, Reeves WC. 2000. Deployment stressors and a chronic multisymptom illness among Gulf War veterans. Journal of Nervous and Mental Disease 188(5):259–266.
NRC (National Research Council). 1991. Animals as Sentinels of Environmental Health Hazards. Washington, DC: National Academy Press.
OTA (Office of Technology Assessment). 1990. Neurotoxicity: Identifying and Controlling Poisons of the Nervous System. OTA-BA-436. Washington, DC: US Government Printing Office.
Unwin C, Blatchley N, Coker W, Ferry S, Hotopf M, Hull L, Ismail K, Palmer I, David A, Wessely S. 1999. Health of UK servicemen who served in Persian Gulf War. Lancet 353:169–178.
White RF, Proctor SP, Heeren T, Wolfe J, Krengel M, Vasterling J, Lindem K, Heaton KJ, Sutker P, Ozonoff DM. 2001. Neuropsychological function in Gulf War veterans: Relationships to self-reported toxicant exposures. American Journal of Industrial Medicine 40(1):42–54.