APPENDIX C
Epidemiology Primer
Begin with a question such as "Does exposure X cause disease Y?" The
premise of epidemiology is so deceptively simple that it can be described in two
sentences:
· Scientists compare two groups of people that are alike in all ways except
that one group was exposed to X and the other group was not.
· If more people in the exposed group than in the other group have the disease, Y,
scientists have an epidemiologic clue that exposure X may be harmful.
(Note: We have not proven that X causes Y; we have shown that in this sample
X and Y occur together more often than we would have expected them to by
chance.)
What, however, takes scores of technical textbooks and fuels ongoing de-
bates are the "how to" and "what if," "buts," "on the other hands," and
"howevers" that make all the difference between error-laden, error-tinged, and
accurate study results. In the next few pages, we describe several known pitfalls
and techniques for avoiding them. That should provide a basic background to
enable non-technically oriented readers to dig into this report.
Epidemiology is the study of the distribution and determinants of disease
and its effects (e.g., death) in human populations. While examining data, rather
than people (as in clinical research) or animals or chemicals (as in laboratory
research), epidemiologic analyses seek to understand causation. Epidemiology
attempts to tease out the relationships between factors, be they characteristics
of people (e.g., age, race, sex), or their work (tension-filled or relaxed, indoors
This text is excerpted from Mortality of Veteran Participants in the CROSSROADS
Nuclear Test, Johnson et al. (1996).
THE FIVE SERIES STUDY
or outdoors) or home (sufficient or insufficient food, shelter, and social support)
environments; characteristics of potentially harmful factors (viruses, poverty,
metabolic disturbances, high cholesterol, or radiation) or beneficial factors (in-
cluding new medication, surgery, medical devices, health education, income,
and housing); or measures of health status (mortality rates, cholesterol levels, or
disease incidence). Notice that one factor can be at once a characteristic, risk
factor, and outcome. A key distinction between epidemiologic and experimental
data is that epidemiologic studies usually are not designed experiments with
purebred animal subjects randomized to be exposed or not exposed. Rather, one
makes use of exposure situations that have occurred for various reasons to learn
what one can. This is essential in situations such as the study of CROSSROADS
participation where a randomized design is impossible retrospectively.
It is important to understand that while epidemiology seeks to understand
causal pathways, it cannot prove causation. Epidemiology uses judgment, statis-
tics, and skepticism to reach descriptions and interpretations of relationships
and associations. It is both a practical technique and an intellectual framework
for considering the possibilities of causal relationships. It is the approach we
have taken in this study.
Epidemiologists compare groups. The key to making sound comparisons is
in choosing groups that are alike in all ways except for the matter being studied.
This selection of comparison groups is where the science, mathematics, and art
of good epidemiology are blended. For example, because age and sex are asso-
ciated with health risks and conditions, data regarding age and sex are collected,
making it possible in the analysis to either compare like age distributions and
sexes or statistically adjust the data to account for known differences.
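
The statistical adjustment mentioned above can be illustrated with direct age standardization, one common technique. The sketch below is illustrative only: the age bands, death counts, person-years, and standard weights are hypothetical, and the function name is ours rather than anything used in this study.

```python
def directly_standardized_rate(deaths, person_years, standard_weights):
    """Weighted average of age-specific rates, using one shared standard population."""
    if not (len(deaths) == len(person_years) == len(standard_weights)):
        raise ValueError("all inputs need one entry per age band")
    total_weight = sum(standard_weights)
    rate = 0.0
    for d, py, w in zip(deaths, person_years, standard_weights):
        rate += (d / py) * (w / total_weight)
    return rate


# Hypothetical data for three age bands (young, middle, old).
# Group A is older than group B, but the age-specific rates are identical.
group_a = {"deaths": [1, 10, 60], "person_years": [1000, 2000, 3000]}
group_b = {"deaths": [3, 10, 20], "person_years": [3000, 2000, 1000]}
standard = [1, 1, 1]  # assumed standard population: equal weight per band

crude_a = sum(group_a["deaths"]) / sum(group_a["person_years"])
crude_b = sum(group_b["deaths"]) / sum(group_b["person_years"])
adj_a = directly_standardized_rate(group_a["deaths"], group_a["person_years"], standard)
adj_b = directly_standardized_rate(group_b["deaths"], group_b["person_years"], standard)

print(f"crude rates:    A={crude_a:.4f}  B={crude_b:.4f}")
print(f"adjusted rates: A={adj_a:.4f}  B={adj_b:.4f}")
```

In this made-up example the crude rates differ by more than a factor of two only because group A is older; once both groups are weighted to the same age distribution, the adjusted rates coincide.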
CHOICE OF COMPARISON GROUP
In studying CROSSROADS participants, comparison group options include
the development of a specific control group, internal comparisons by level of
exposure, and use of national statistics. Each carries useful and restrictive elements.
If, for example, one wants to study the effect of something on lung cancer,
knowing what we do about cigarette smoking and lung cancer, we would want
to pick two groups to compare that do not differ in smoking practices, for that
difference could mask the true causal relationship we are looking to explore. In
studies of military participants, it helps to use a reference group that is also
military. After checking age and sex, we rest a bit more comfortably that the two
groups are rather likely to be similar on a host of unmeasured characteristics,
such as smoking behavior. If, however, we chanced to compare the woodwind
section of the Navy band (good breathers) with an average group of smokers, we
could encounter differences attributable to smoking behavior. Closer to the con-
cerns of this study, we would not want to compare a group exposed to nuclear
test radiation with a group drawn from radiation workers. (Although if there
were a few radiation workers in a much greater number of comparison group
members, any possible confounding would be very diluted.)
Study results hinge on differences between the two (or more) groups compared
in the study. So, choice of comparison group(s) is an extremely important
task, one that has both conceptual and practical aspects. Consistent findings over
hundreds of different disease-exposure inquiries demonstrate what we refer to as
a "healthy worker effect." With no hypothesized harmful exposure, a cohort of
workers or soldiers is expected to be healthier, as reflected in mortality and mor-
bidity rates, than a general cohort. To be included in the soldier or worker co-
hort, the individual has to be mentally and physically functioning at or above
whatever level is required for the duties of that cohort. In the extreme, those on
their "deathbeds" are not hired or recruited. Furthermore, individuals are ex-
cluded from military service if they are not "fit," according to clinical and labo-
ratory findings. Numerous studies have confirmed that this healthy worker effect
is most pronounced in measurements taken close to the time of hiring (or entry
into military service) but continues for decades.
Using a military comparison group addresses and avoids the healthy soldier
effect but does carry other drawbacks. While government and other groups rou-
tinely gather statistics (including demographic, health, and employment de-
scriptors) on general populations, such as U.S. males aged 45-65, data are not
readily available for more finely (or even grossly) honed comparison groups in
the military or elsewhere. Using a specifically designed comparison group,
therefore, adds expense and time to a study. Furthermore, it increases the op-
portunity to introduce confounding information that could bias the findings.
Many of these difficulties can be overcome with meticulous attention to tech-
nique, innovative study designs and analytic plans, and a balanced view of what
statistics do and do not say. These options are difficult to weigh for practiced sci-
entists and no less difficult to explain to and discuss with non-technically trained
readers; misunderstanding between scientist and public often occurs.
One option is to compare the group in question (for example, military per-
sonnel who participated in nuclear tests) with more than one comparison group,
aiming to tease out relationships between exposure and outcome by seeing
similarities and differences in those comparisons. The current CROSSROADS
study is structured around a military comparison group, chosen to match on age,
rank, time period, and military occupation (all available characteristics) but
specifically not CROSSROADS test participants. Secondarily, we included sta-
tistical comparisons with the general U.S. male population.
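
A comparison with general-population statistics is often summarized as the ratio of observed deaths to the deaths expected if national rates applied to the cohort. This ratio is commonly called a standardized mortality ratio, a term this primer does not itself use. The sketch below assumes hypothetical person-years, national rates, and an observed death count; none of these figures come from the study.

```python
def expected_deaths(person_years, reference_rates):
    """Deaths expected if the reference (e.g., national) rates applied to the cohort."""
    if len(person_years) != len(reference_rates):
        raise ValueError("need one reference rate per age band")
    return sum(py * rate for py, rate in zip(person_years, reference_rates))


def standardized_mortality_ratio(observed, person_years, reference_rates):
    """Observed deaths divided by expected deaths."""
    return observed / expected_deaths(person_years, reference_rates)


# Hypothetical cohort, split into three age bands.
cohort_person_years = [10000, 20000, 10000]
national_rates = [0.001, 0.004, 0.010]  # deaths per person-year (made up)
observed = 152

smr = standardized_mortality_ratio(observed, cohort_person_years, national_rates)
print(f"expected deaths: {expected_deaths(cohort_person_years, national_rates):.0f}")
print(f"observed/expected: {smr:.2f}")
```

A ratio below 1.0, as in this invented example, is the pattern a healthy worker (or healthy soldier) effect would produce even with no harmful exposure.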
FINE TUNING OF EXPOSED GROUP
Although "participant" vs. "nonparticipant" is an intuitively reasonable
place to start analysis in this study, there are intricate details to consider. Fore-
most, not all "participants" received the same amount of exposure (or potential
exposure, measured exposure, expected exposure, or type of exposure) as all the
other participants.
We look, therefore, for some way(s) of measuring the amount of exposure
and then characterizing individuals in relation to their known (or expected or
hypothesized) dose (amount of exposure). Otherwise, if only a few of the par-
ticipants were exposed, any effect (on cancer mortality, for example) would be
diluted because most of the "exposed" were actually "not exposed" (or mini-
mally exposed) and would not reflect the exposure-disease association. No dif-
ference would be observed and we would not know whether that meant there
was indeed no difference or the comparison groups were identified in ways in
which a real difference could not be observed.
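
The dilution described above can be made concrete with a little arithmetic. The sketch below assumes that the truly unexposed members of the "participant" group die at the same baseline rate as the comparison group; under that assumption the observed rate ratio is 1 + f × (RR − 1), where f is the fraction truly exposed and RR is the true rate ratio among them. The numbers are hypothetical.

```python
def diluted_rate_ratio(true_rate_ratio, fraction_truly_exposed):
    """Rate ratio observed when only a fraction of the 'exposed' group was really exposed.

    Assumes the unexposed members of that group share the comparison group's rate.
    """
    if not 0.0 <= fraction_truly_exposed <= 1.0:
        raise ValueError("fraction_truly_exposed must be between 0 and 1")
    return 1.0 + fraction_truly_exposed * (true_rate_ratio - 1.0)


# A true doubling of risk, observed through increasingly diluted groups.
for fraction in (1.0, 0.5, 0.1):
    observed = diluted_rate_ratio(2.0, fraction)
    print(f"fraction truly exposed = {fraction:.1f} -> observed rate ratio = {observed:.2f}")
```

With only one participant in ten truly exposed, a real doubling of risk would appear as a rate ratio of about 1.1, which is easily lost in sampling variability.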
Because adequate direct exposure measurements are not always available,
researchers attempt to develop surrogate measures of exposure. In this study we
pursued data from actual dosimetry measurements made at the time of the nu-
clear tests, recalculations done to address the known incompleteness of those
measures, self-reports of participants, and coherent assumptions based on
knowledge of radiation physics, troop logistics, on-site reportage, logs, and
documents as well as logic.
CONFOUNDERS
It will come as no surprise that some characteristics, such as age and sex,
are associated with numerous measures of health status. They are also associated
with military experience in general and CROSSROADS participation in
particular. These are likely confounders (things that confuse a straightforward
comparison), because they are characteristics associated with both the outcome
and the putative causative element under study. While a military comparison
group based on broad categories of age, sex, similar unit assignment, and mili-
tary rank provides some assurance of comparability, differences are still likely to
exist. When we know what the confounders are and we can measure them, we
can take them into account in the statistical analysis. Careful choice of compari-
son groups can help to limit the effect of unknown confounders. Chapters 10 and
11 of this report describe the design and analytic steps we took to control for
potential confounding.
Examples of characteristics that frequently confound exposure-disease as-
sociations include age, race, sex, socioeconomic status, occupation, and various
behaviors, such as alcohol and tobacco use. In specific studies investigators may
hypothesize potential confounders such as ethnicity; military service-related
exposures, including sunlight, altitude, and preventive and therapeutic attention
to infectious disease, as well as the diseases themselves; and other risks based on
lifestyle, geography, and postmilitary careers.
DATA COLLECTION
Once researchers have chosen the groups to study, avoiding the pitfalls or,
at least, recognizing and measuring them as best as possible for later adjustment,
they face a new set of problems during the planning and conduct of data collec-
tion. If you plan to get information directly from the subject, you need to do all
you can to find all subjects, regardless of their being in the case/participant or
control/comparison group and regardless of the outcome under study. If you are
getting information from records, you need to get records for all subjects, again
regardless of their being in the case/participant or control/comparison group and
regardless of the outcome under study.
For example, if you are attempting to get information from subjects them-
selves and want to find out mortality rates and gather information by phone, you
will not find anyone to be dead. Conversely, if you look only at death certificates,
you will not find anyone alive. These somewhat tongue-in-cheek extremes are easy
to avoid; the shades of gray around and between them, however, are often stum-
bling blocks in data collection and then analysis and interpretation. The reasons
are that there are biases in record systems: not all records have an equal likelihood
of being retrieved. For example, in looking at hospital records, specific cases in-
volved in lawsuits may be in the general counsel's office and not in the clinic's
file, where they would normally be found. There are also mundane reasons for all
data not being equally available: records can be lost or destroyed, intentionally or
unintentionally, by flood or fire, as in the case of veterans' records at the National
Personnel Records Center in St. Louis (see Chapter 7). Note that bias does not
necessarily mean prejudicial treatment, but would include any process that sys-
tematically treats one set of records differently than another.
To minimize possible biases, a number of general rules and protocols have
evolved to guide researchers regardless of participant or comparison group and
regardless of likely outcome. These protocols include developing an understand-
ing of all data sources and how they may be expected to affect data distributions
and establishing clear decision rules. A summary list of rules could include:
· ensuring that there is an equal likelihood of finding records of people in
each group; if a source of data is available for only one group, do not use it.
· being aware of biases built into record systems. There are potentially
many of these: people with illness are more likely to seek care; veterans with
lower incomes or service-connected disabilities are more likely to seek VA care;
care-seeking behavior varies over time (for example, as VA benefits change);
medical record technologies change; whether patients or family members have
concerns about benefits or suspicions of causation could influence whether they
notify the recordkeeping agency; data may be missing due to circumstances be-
yond human control, such as a fire destroying paper files; and data accuracy is
associated with level of ascertainment, such as completeness of fact-of-death,
date-of-death, or cause-of-death information.
· using a firm cut-off date for the follow-up period. It is necessary to treat
participants and comparisons equally when it comes to data collection, follow-
up, and maintenance. The decisions made should be definable. Researchers
should examine, according to biologic, logistical, and cost implications,
choices involving latency periods, cohort age, or pending compensation
questions. Once cut-offs are chosen, it is best to recognize and honor the choice
(although it may seem arbitrary in practice).
· recognizing that raw numbers offer different information than do rates or
proportions. The latter include a context for interpreting the importance of the
raw number. While reporting the number of people dead is often informative, it
is insufficient to use percentages without first identifying a conceptually accept-
able denominator and then using the entire denominator in any calculation. For
example, when examining constructs such as "average age at death," one should
account for the amount of time available for observations since the average will
change over time as larger proportions of the sample die. For example, let's
follow the mortality experience of a hypothetical sixth-grade class of 25 students
in 1923. Looking at them in 1925, after one 13-year-old died in a motor vehicle
accident, we would see an average age at death of 13 years. If no one else in that
class were to die over the next 15 years, then, in 1940, the average age at death
would still be 13 because all members of the cohort who had died (in this case
one person) did so at age 13. By 1975 (the original children would now be about
61 years old), perhaps another 10 had died; the average age at death would be
higher than 13, but necessarily lower than 61. The average would depend on
when the deaths occurred within that period. The average age of death calcu-
lated at any point in time is the average of the ages at death for all members de-
ceased by that point in time. The average will change over time as more deaths
are added into the calculations. The average does not reflect the total mortality
experience of the group until all members have died. Statistical techniques have
been developed to even out such things, so that numbers can be compared
meaningfully.
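
The sixth-grade class example can be traced in a few lines. In the sketch below, the 1925 death at age 13 comes from the text; the ten later deaths, with their years and ages, are invented purely for illustration.

```python
def average_age_at_death(deaths, as_of_year):
    """Mean age at death among cohort members who have died by as_of_year.

    `deaths` is a list of (year_of_death, age_at_death) pairs.
    Returns None when no one has died yet.
    """
    ages = [age for year, age in deaths if year <= as_of_year]
    if not ages:
        return None
    return sum(ages) / len(ages)


# One death from the text (1925, age 13) plus ten hypothetical later deaths.
class_deaths = [
    (1925, 13),
    (1944, 30), (1950, 36), (1955, 41), (1958, 44), (1962, 48),
    (1965, 51), (1968, 54), (1970, 56), (1972, 58), (1974, 60),
]

for year in (1924, 1925, 1940, 1975):
    print(year, average_age_at_death(class_deaths, year))
```

The average stays at 13 through 1940 and jumps once later deaths enter the calculation, even though nothing about the cohort's underlying risk has changed; only the observation window has.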
These comments show the bridges among data collection, reporting, and
analysis. In the following sections, we continue with analysis issues.
INTERPRETING DATA FINDINGS
Let us say that comparison groups were chosen appropriately, unbiased data
collected, and one group has more disease than the other. Epidemiology pro-
vides for the use of judgment in considering whether a numerical relationship
might reflect a causal one. The criteria of causal judgment, which have been
stated in many contexts, involve two broad considerations: Are the exposure
and the outcome associated? Does that association make sense, based on bio-
logical as well as other physical, historical, and study design factors?
Epidemiology studies are designed to describe numerical associations be-
tween factors (risks, treatments, outcomes). In interpreting the results we look at
characteristics of those associations. Evidence supporting a causal association
mounts if the association is consistent (observed in a variety of studies address-
ing the same type of exposure in different circumstances), strong (e.g., with high
relative risk ratios), and specific. Statistics serve as a tool to quantify the
strength of associations relative to random background fluctuations, which are
more likely to be observed the smaller the sample considered. Through mathe-
matical theory and centuries of data analysis, statisticians have derived (and
continue to derive) methods to deal with multiple comparisons, effects of mis-
classification, inferences from samples, and combining data from diverse (but
not too diverse) studies.
Vital to the epidemiologist's examination of data are the issues of statistical
measures and variability. Starting with a sample of people, we generate statistical
measures (or statistics, for short) that summarize some important information col-
lected on them (e.g., death rates). Variability enters the picture when we take a
particular sample, because the statistics we generate for that particular sample will
be specific to that sample; a different sample would generate different statistics
because the individuals in one sample are not the same as in the other. Yet, if a
sample has been selected essentially at random and something is known or as-
sumed about the distribution of the statistics generated from that particular sample,
then we can make some general statements about the variability of those statistics.
Typically, we characterize a particular statistical measure's variability by
quantifying how much it would vary just by taking different samples and recal-
culating that same statistic. In general, it turns out that the larger the sample, the
smaller the variability. It is customary to calculate two limits, called the lower
and upper 95 percent confidence limits, that have the property that if we repeatedly
drew samples and recalculated the limits, the resulting intervals would
contain the true underlying value 95 times out of 100. The interval
between the upper and lower confidence limits is thus called a 95 percent confi-
dence interval. The wider the confidence interval, the more variability there is in
the statistic.
It is frequently of interest to know what the variability of a statistic is be-
cause it affects its interpretation. If the mortality rates of participants and con-
trols are equal, for example, then the ratio of these two rates (the rate ratio)
should be 1.0. However, there is inherent variability in this rate ratio statistic, so
that we want to calculate its 95 percent confidence interval. If the ratio is only
slightly more than or less than 1.0, for example, by an amount that lies within
the confidence interval, we customarily conclude that this small deviation from
1.0 could be attributed to inherent variability (chance), such as that which comes
from selecting different samples. On the other hand, if the confidence interval
for the rate ratio does not include 1.0, its value is not attributed to chance and it
is considered statistically significant.
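
As a rough illustration of the rate ratio and its confidence interval, the sketch below uses a common large-sample approximation: the interval is computed on the logarithmic scale, with standard error sqrt(1/a + 1/b) for death counts a and b. This choice of method is our assumption and not necessarily the one used in this study; the counts and person-years are hypothetical.

```python
import math


def rate_ratio_ci(deaths_a, person_years_a, deaths_b, person_years_b, z=1.96):
    """Rate ratio (A over B) with an approximate 95 percent confidence interval."""
    ratio = (deaths_a / person_years_a) / (deaths_b / person_years_b)
    se_log = math.sqrt(1.0 / deaths_a + 1.0 / deaths_b)  # SE of log(rate ratio)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Same underlying rates, two different study sizes (hypothetical numbers).
small = rate_ratio_ci(30, 10_000, 20, 10_000)
large = rate_ratio_ci(300, 100_000, 200, 100_000)

for label, (ratio, lower, upper) in (("small study", small), ("large study", large)):
    includes_one = lower <= 1.0 <= upper
    print(f"{label}: ratio={ratio:.2f}, interval=({lower:.2f}, {upper:.2f}), "
          f"includes 1.0: {includes_one}")
```

Both invented studies give a rate ratio of 1.5, but only the larger one has an interval narrow enough to exclude 1.0, which is the point made above about sample size and variability.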
Another way to determine whether a particular statistic (let us stick to rate
ratios) is bigger or smaller than 1.0 is to perform a statistical test. A statistical
test is a more formal statistical procedure that computes a statistic under the as-
sumption that some null hypothesis is true. A typical null hypothesis might be:
there is no difference in mortality rate between group A and group B (in other
words, the rate ratio is equal to 1.0). If the statistic is "unusual," then the null
hypothesis is rejected. The measure of "unusual" is called a p-value. Customarily,
a p-value of less than 0.05 is considered "unusual." For example, take the
above null hypothesis of no difference between mortality rates in groups A and
B; that is, the rate ratio is 1.0. If observed data yield an actual rate ratio of 1.5,
for instance, and an associated test statistic with a p-value less than 0.05, then
we reject the null hypothesis and conclude that such a high rate ratio is unlikely
to be due to chance (chance alone would produce a ratio this extreme fewer than
5 times out of 100).
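
One simple way to obtain a p-value for a rate ratio is sketched below. It assumes a two-sided large-sample z-test on the logarithm of the rate ratio, which is our illustrative choice rather than the test used in this study; the death counts and person-years are hypothetical.

```python
import math


def rate_ratio_p_value(deaths_a, person_years_a, deaths_b, person_years_b):
    """Two-sided p-value for the null hypothesis that the rate ratio equals 1.0."""
    log_ratio = math.log((deaths_a / person_years_a) / (deaths_b / person_years_b))
    se_log = math.sqrt(1.0 / deaths_a + 1.0 / deaths_b)
    z = log_ratio / se_log
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# A rate ratio of 1.5 observed in a small and in a large hypothetical study.
p_small = rate_ratio_p_value(30, 10_000, 20, 10_000)
p_large = rate_ratio_p_value(300, 100_000, 200, 100_000)

for label, p in (("small study", p_small), ("large study", p_large)):
    verdict = "reject" if p < 0.05 else "do not reject"
    print(f"{label}: p = {p:.4g} -> {verdict} the null hypothesis")
```

The same observed ratio of 1.5 leads to different conclusions depending on how many deaths stand behind it.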
Finally, we need to examine a little more what "unlikely to be due to
chance" means in a larger context. By custom, a value is called statistically sig-
nificant if the operation of chance will produce such a value only about 5 times
in 100. However, just as in the case of repeated samples, repeated analyses of
different data (for example, death rates due to cancer, to heart disease, to respi-
ratory disease, etc.), every one involving a statistical test, will carry an individ-
ual 5 percent risk of labeling a statistic significant when its increased or de-
creased value was actually due to chance.
Moreover, if we do many such analyses, that 5 percent risk for each one
mounts up. For example, if one does 20 statistical tests of rate ratios, it is quite
likely that there will be at least one rate ratio labeled statistically significant just
by the operation of chance. This analytic problem is known as the multiple com-
parisons problem.
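
The arithmetic behind "quite likely" can be checked directly. The sketch below assumes the tests are independent and that every null hypothesis is actually true; under those assumptions the chance of at least one falsely "significant" result among n tests is 1 − (1 − 0.05)^n.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests is significant by chance."""
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 5, 20, 50):
    print(f"{n:>2} tests -> {chance_of_any_false_positive(n):.2f}")
```

For 20 tests the probability is roughly 0.64, so a lone significant finding among 20 comparisons is weak evidence on its own.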
Because the greater the number of statistical tests, the more findings are
labeled statistically significant due to chance, efforts are made to limit the num-
ber of statistical tests. This is usually done by specifying in advance a relatively
small number of tests, directed at a limited number of research questions.
Nevertheless, there are also times (for example, when one is interested in
completely describing all the data, say, looking at a complete list of causes of death,
whether or not one suspects that any of these rates are elevated) when many
independent tests are made. In these situations, it is especially important to keep
in mind the possibility that statistically significant rate ratios may be labeled so
merely due to chance.
At the same time, one must consider that a true association may fail to test
as statistically significant by chance or because of lack of statistical power. The
power of a study to detect a real association (if there were one) depends on sam-
ple size, the incidence of the outcome in the absence of exposure, and the
strength of association between the exposure and the outcome.
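
How these three ingredients drive power can be seen with a rough approximation. The sketch below assumes two equal-sized groups, a comparison of simple proportions, and a normal approximation that ignores the small chance of rejecting in the wrong direction. It is an illustrative simplification with hypothetical inputs, not the power calculation used in this study.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approximate_power(n_per_group, baseline_risk, risk_ratio, z_alpha=1.96):
    """Approximate power of a two-sided 5 percent test comparing two proportions."""
    p0 = baseline_risk               # risk in the absence of exposure
    p1 = baseline_risk * risk_ratio  # risk among the exposed
    if not (0.0 < p0 < 1.0 and 0.0 < p1 < 1.0):
        raise ValueError("risks must lie strictly between 0 and 1")
    se = math.sqrt(p0 * (1.0 - p0) / n_per_group + p1 * (1.0 - p1) / n_per_group)
    return normal_cdf(abs(p1 - p0) / se - z_alpha)


base = approximate_power(1_000, 0.01, 1.5)
bigger_sample = approximate_power(10_000, 0.01, 1.5)
commoner_outcome = approximate_power(1_000, 0.05, 1.5)
stronger_effect = approximate_power(1_000, 0.01, 2.0)

print(f"baseline scenario:       {base:.2f}")
print(f"ten times the sample:    {bigger_sample:.2f}")
print(f"more common outcome:     {commoner_outcome:.2f}")
print(f"stronger association:    {stronger_effect:.2f}")
```

Each change (a larger sample, a more common outcome, or a stronger association) raises the chance that a real effect would test as statistically significant.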
In considering whether an observed association makes sense causally, epi-
demiologists consider the temporal relationship between the factors (e.g., if de-
scribed appropriately, an outcome cannot precede a cause), the biologic plausi-
bility of the association, and its coherence with a range of other related
knowledge (radiation biology, for example). No one of these factors is necessarily
sufficient to prove causation. In fact, causation cannot actually be proven; it
can only be supported (weakly or strongly) or contradicted (weakly or strongly).
Epidemiology uses numbers, going to extreme lengths at times to "split
hairs" and "search under rocks," yet relies on judgment for interpretation. It is
hoped that the considered judgments of epidemiologists will be useful to the
judgment of clinicians in making treatment decisions and of policymakers in
making legislation and regulatory and procedural decisions.
EPIDEMIOLOGY SUMMARY RELATED TO THIS STUDY
This is a report of a retrospective cohort study comparing military partici-
pants in CROSSROADS with military nonparticipants who are similar in age,
rank-rating, military occupation, time frame of service, and sex. To more accu-
rately measure exposure, we developed and used criteria for those participants
most likely to have been more highly exposed. The study design calls for tight
controls on the selection process for assignment to participant or comparison
groups, data access, and data follow-up.
The endpoints considered are mortality rates. Specific causes of death were
chosen based on understanding of disease process and a priori expectations
based on knowledge and suspicion of radiation effects.
This study will not say whether Private Rogers, Rodriguez, or Rosenthal died
of cancer because of Operation CROSSROADS. It may be able to say that the rate
of cancer among all CROSSROADS participants was, or was not, different
from the rate of cancer among comparable nonparticipants. Whether associations
are reported with relative surety or uncertainty depends on the data themselves and
on statistical techniques for sifting the wheat from the chaff. If this were easy, we
would not still be studying and arguing about radiation effects.
The Medical Follow-up Agency of the Institute of Medicine, National
Academy of Sciences, conducted the study, relying, as necessary, on records
maintained by government and private groups. MFUA is itself "disinterested" in
that it stands to neither lose nor gain from its findings in this study: it will nei-
ther receive nor be denied compensation, nor will it be held fiscally or pro-
grammatically responsible for such compensation or related care. Because this
study (not unlike many other studies of human suffering and possible blame and
responsibility) has an historical overlay of tremendous emotion and distrust, we
must be especially careful to follow generally accepted ground rules for valid
studies and to describe openly our rationale for various decisions throughout.