6
Statistical Power and Interpretation of the Study
Statistical power is discussed in section VII of the
HTDS Draft Final Report and in additional material given to the
subcommittee: appendix H of the HTDS protocol (May 1993),
section II.G of the HTDS Pilot Study Final Report (January 1995),
and a memorandum by Ken Kopecky (December 14, 1995). This
review focuses primarily on thyroid malignancies as an example,
although most of or all the points can be made about the other
disease end points of interest.
FACTORS IN STATISTICAL POWER
Because the study results were essentially negative (that
is, a finding of no increase in thyroid disease among those with
higher estimated levels of 131I exposure compared with those with
lower estimated levels), a critical issue is how to interpret the
negative findings correctly. Part of the process of interpretation is
to subject the study to a series of reality checks. Were the data of
sufficiently good quality? Do the underlying patterns of exposure
and disease agree with or counter the negative association? Are the
confidence intervals wide enough for the results to be compatible
with other studies that have found an association between 131I
exposure and disease? If, for example, this negative study did a
very good job of estimating thyroid-disease rates but found that
milk-drinkers have higher rates of disease than non-milk-drinkers
and that those who lived directly downwind of the site have higher
Review of the HTDS Draft Final Report
rates of disease than those in the more northern ("low-dose")
counties, we might conclude that the pattern of thyroid morbidity
fits the likely pattern of exposure and that the lack of a dose-
response association could well be due to poor dose estimates.
Similarly, a negative finding that is due to limitations in the
assessment of thyroid-disease rates, because of either poor data
collection or small numbers of subjects, does not support a
conclusion that downwinders' disease patterns are unrelated to
Hanford exposure patterns.
The power of a study of this type to detect a
hypothesized increase in disease prevalence per unit dose (an
absolute risk of 2.5% per Gy was used in the power calculations
for thyroid carcinoma) depends largely on
· The size of the sample.
· The background prevalence of disease in the sample
studied.
· The absence of biases that are caused by subject
selection, reporting, or inaccurate detection of disease.
· The dose distribution in the sample.
· The adequacy of the dosimetry system in characterizing
the dose of an individual.
· The independence of disease between individuals in the
study, conditional on dose (for example, if there is no
geographic clustering of disease or systematic geographic
difference in disease rates due to other unknown factors).
SAMPLE SIZE AND ASSUMED BACKGROUND PREVALENCE OF
DISEASE
The HTDS was successful enough in obtaining subjects
who were willing to participate in the study that it essentially met
its goals, so sample size is not a concern if the statistical-power
assumptions and methods were appropriate. For the second point
listed above, it was assumed in section VII of the HTDS Draft
Final Report that the cumulative background prevalence of thyroid
carcinoma either previously diagnosed or detected in screening
would be 0.7% in females and 0.4% in males; these translate into
19 expected background cases (in the absence of a Hanford dose)
in the cohort. Thus, if there is no increase in thyroid cancer due to
the Hanford exposures, the 20 observed cases closely match the
assumptions made in the statistical design. (For further discussion
of the expected number of cases, see chapter 5 of the present
report.)
EFFECT OF DOSIMETRY ERROR ON STATISTICAL POWER
Primary issues involved in determining whether the
statistical power of the study was as expected are the dose
distribution of the sample and the precision of the dosimetry
system at the individual level. Distributional assumptions must be
made in computing the power of the statistical tests or sample size
for a specified power, and one can ask how robust the results were
to these assumptions. The HEDR project acknowledged that
parameter values used in its dose-reconstruction process were
uncertain. Some of the uncertainties (those associated with release,
dispersion, deposition, uptake modeling, and so on) were common
to many individuals' dose estimates, whereas others (associated
with food consumption, lifestyle, and so on) were
individual-specific.
The HEDR project expresses parameter-value
uncertainty with subjective probability distributions, which
quantify the state of knowledge as judged by the HEDR analysts.
The propagation of uncertainty through the HEDR models results
in a subjective probability distribution for each individual's dose
estimate. The resulting distributions are very complicated, so,
rather than an analytic characterization, a random sample (the 100
alternative realizations of the HEDR dose) produced numerically
served as an approximate quantitative expression of the combined
influence of the parameter-value uncertainties on the estimation of
the individuals' doses.
The random sample was drawn so that the correlation
among individuals' dose estimates due to common sources of
uncertainty would be preserved. To summarize the dose
distribution for each person, the median of the 100 dose
realizations was calculated, and the median forms the single "dose
estimate".
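The sampling scheme described above can be sketched roughly as follows. This is only an illustration, not the HEDR code: the function names, the lognormal error factors, the error sizes (0.5 shared, 0.3 individual on the log scale), and the three base doses are all hypothetical. The point is that one shared multiplicative factor is drawn per realization and applied to everyone, so that the correlation between individuals' doses is preserved, and that each person's 100 realizations are then summarized by their median.

```python
import math
import random
import statistics


def simulate_realizations(base_doses, n_real=100, shared_sd=0.5, indiv_sd=0.3, seed=0):
    """Return realizations[r][i]: dose of person i in realization r (hypothetical model)."""
    rng = random.Random(seed)
    realizations = []
    for _ in range(n_real):
        # One shared multiplicative factor per realization (e.g. a source-term
        # uncertainty), applied to every person in that realization.
        shared = math.exp(rng.gauss(0.0, shared_sd))
        # Individual-specific factor (e.g. diet), drawn separately per person.
        row = [b * shared * math.exp(rng.gauss(0.0, indiv_sd)) for b in base_doses]
        realizations.append(row)
    return realizations


def median_dose_estimates(realizations):
    """Summarize each person's realizations by their median (the single dose estimate)."""
    n_people = len(realizations[0])
    return [statistics.median([row[i] for row in realizations]) for i in range(n_people)]


def log_correlation(realizations, i, j):
    """Sample correlation of log-dose between persons i and j across realizations."""
    xs = [math.log(row[i]) for row in realizations]
    ys = [math.log(row[j]) for row in realizations]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


base = [50.0, 150.0, 400.0]           # hypothetical central doses, mGy
reals = simulate_realizations(base)   # 100 alternative realizations
estimates = median_dose_estimates(reals)
print(estimates, log_correlation(reals, 0, 1))
```

Because the shared factor is common to all persons within a realization, the realizations of any two persons are positively correlated, which is the feature the median summary alone no longer shows.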
The effect of dosimetry errors on the expected or
achieved statistical power of the HTDS is not mentioned in the
protocol section on statistical power. Instead, in the power
calculations, the distribution of the median dose estimate for each
person is used as though it is equivalent to true dose.
It would be valid to ignore the dosimetry errors in the
calculation of the statistical power for detecting nonzero parameter
values in a linear dose-response model if both the following
criteria hold:
· The average value of true dose for all subjects with the
same estimated dose is equal to the estimated dose.
· Dose errors are independent from subject to subject, or
at least any correlation between subjects' true doses (given
estimated doses) is due to additive rather than multiplicative
components of error.
Consider the second criterion. If all the doses were off
by a constant unknown additive amount, then only the intercept
terms, not the slope coefficients in the linear models, would be
affected by the correlated dose errors. However, for most shared
sources of uncertainty, a multiplicative effect is likely. For
example, if all the errors in the doses were due to uncertainties in
the milk transfer coefficient appropriate for herds near Hanford,
this would affect all doses multiplicatively; the estimated slope
terms would retain the uncertainty, even in an infinitely large
epidemiologic study.
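The contrast between additive and multiplicative shared errors can be checked with a small noise-free calculation. The sketch below uses hypothetical doses, a hypothetical background risk of 0.5%, and illustrative error sizes (a 0.3-Gy shift and a factor of 1.8); only the 2.5%/Gy slope is taken from the text.

```python
def ols_fit(x, y):
    """Ordinary least-squares (intercept, slope) for y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


true_dose = [0.0, 0.1, 0.2, 0.5, 1.0, 2.0]       # Gy, hypothetical
risk = [0.005 + 0.025 * d for d in true_dose]    # noise-free linear risk, 2.5%/Gy

shifted = [d + 0.3 for d in true_dose]   # shared additive error
scaled = [d * 1.8 for d in true_dose]    # shared multiplicative error

a_add, b_add = ols_fit(shifted, risk)    # slope unchanged, intercept shifted
a_mul, b_mul = ols_fit(scaled, risk)     # intercept unchanged, slope divided by 1.8
print(a_add, b_add)
print(a_mul, b_mul)
```

With the additive shift only the intercept moves; with the multiplicative factor the fitted slope is the true slope divided by that factor, however many subjects are added.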
VIOLATION OF THE BERKSON ERROR ASSUMPTION
If the first criterion is met, the statistical literature
indicates, the estimation of linear dose-response models is
essentially unaffected by independent dosimetry errors. In fact, an
important measurement-error correction technique, known as
"regression calibration", consists of the calculation of the average
value of true dose, given estimated dose (see chapter 3 of Carroll
and others, 1995), and the substitution of this average in the
regression analysis. The first criterion is sometimes called the
Berkson model of measurement error.
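A small simulation can illustrate why independent Berkson errors leave a linear slope essentially unaffected while classical errors do not. Everything in this sketch is an assumption made for illustration: normally distributed doses in arbitrary units, unit error variances, a true slope of 2, and the function names.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate_slopes(n=20000, slope=2.0, seed=1):
    """Return (slope fitted under Berkson error, slope fitted under classical error)."""
    rng = random.Random(seed)
    # Berkson: the true dose scatters around the assigned (estimated) dose,
    # so the average true dose given the estimate equals the estimate.
    est_b = [rng.gauss(5.0, 1.0) for _ in range(n)]
    true_b = [e + rng.gauss(0.0, 1.0) for e in est_b]
    y_b = [slope * t + rng.gauss(0.0, 1.0) for t in true_b]
    # Classical: the estimate scatters around the true dose.
    true_c = [rng.gauss(5.0, 1.0) for _ in range(n)]
    est_c = [t + rng.gauss(0.0, 1.0) for t in true_c]
    y_c = [slope * t + rng.gauss(0.0, 1.0) for t in true_c]
    return ols_slope(est_b, y_b), ols_slope(est_c, y_c)


berkson_slope, classical_slope = simulate_slopes()
print(berkson_slope, classical_slope)
```

Under these assumptions the Berkson fit recovers a slope near 2, whereas the classical fit is attenuated toward about half of that, because the error variance equals the true-dose variance.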
Berkson errors arise when dose estimates are given as
the average value of possible doses of a category of subjects who
individually have a range of possible doses. The aim of the
designers of the dosimetry system (the HEDR project) was
evidently to provide the same average dose for all members of a
particular category defined by "input data" (such as specific
geographic location on a particular day with particular
meteorologic conditions and specific age). Because the input data
do not by themselves define the dose precisely, a Monte Carlo
procedure was used in which all possible factors affecting a given
individual's dose were considered to be random and 100 possible
doses were drawn. Use of the average of possible individual doses
as the dose estimate ideally results in Berkson error. However,
because uncertainties in multiplicative factors that affect all doses
simultaneously are admitted by the HEDR project (source terms,
milk transfer coefficients, and so on) even under ideal
circumstances, violations of the second criterion are expected.
These violations can be considerable.
As described in section VII of the Draft Final Report,
the noncentrality parameter governing the power of the study is
equal to
$$\mathrm{NCP} = \tfrac{1}{2} N B^2 \sigma^2 \left[ \frac{1}{p_m(1 - p_m)} + \frac{1}{p_f(1 - p_f)} \right] \qquad (1)$$
where
N = sample size (here 3,190),
B = assumed dose-response relationship (absolute risk, 2.5%/Gy
for thyroid malignancies), and
p_m and p_f = cumulative incidence of thyroid malignancies in males
and females, respectively, in the population as a whole (assumed to
be 0.4% and 0.7%).
The variance term $\sigma^2$ has to do with the variance of the
dose distribution. If doses were observed without error, this term
would be equal to the variance of the dose estimates. When doses
are observed with error (whether Berkson or from any other
model) but errors are independent or dependent because of purely
additive components (that is, satisfying the second criterion), $\sigma^2$ is
replaced in equation 1 with the variance of the average of true dose
given estimated dose: $\sigma^2$ = Var(Avg(True dose | estimated dose)).
(This follows as a consequence of the "regression-calibration"
approach to measurement-error correction.) By definition, if one
accepts the argument that the HEDR system produces Berkson
errors and that the correlations are small, the value of $\sigma^2$ to use in
equation 1 is just the variance of the average dose for each
individual. This is essentially what was done in the sample-size
and power calculations for the HTDS, except that the median
rather than the estimated mean dose was used. Dosimetry error
always reduces study power, because it reduces the correlation
between study outcomes and the exposure estimates, relative to the
correlation that would be seen if true dose were known. However,
in linear models of the probability of occurrence of disease and for
dosimetry errors that satisfy the two conditions above, the slope
parameter being estimated will remain statistically unbiased if the
average dose estimates are used. Thus, the formula for the
noncentrality parameter in equation 1 holds, except that the value
of $\sigma^2$ being used is now the variance of the average dose estimates,
that is, Var(Avg(True dose | estimated dose)), which is always less
than the variance of the true exposures.
DOSE ERROR DUE TO UNCERTAINTIES IN INPUT DATA
One reason for doubt about the substitution of the
variance of estimated doses for $\sigma^2$ is that the input data themselves
are subject to obvious errors. The input data consist of such factors
as location of residence (probably known fairly well) and milk-
consumption habits in early childhood (undoubtedly known much
less well; more will be stated about this below). If the fundamental
input data are known with error, then in general
Var(Avg(True dose | estimated dose)) will be overestimated by the dosimetry
system. That occurs because the averaging process required to
calculate Avg(True dose | input data) uses too few scenarios, and
too little overlap is assumed between the scenarios that correspond
to the distinct sets of reported input data.
The mean estimated dose in the HTDS Draft Final
Report is 182 mGy with a variance equal to (227 mGy)². The
sensitivity of the power of the study to the assumption that dose
errors are of the Berkson type is approached as follows.
Suppose that the distribution of true dose is lognormal
and that instead of a Berkson-error model we assume a classical-
error model on the log scale so that
$$\log(\text{estimated dose}) = \log(\text{true dose}) + \text{error}. \qquad (2)$$
In this model, it is the estimated doses that are
randomly distributed, multiplicatively, around the true doses. This
model has often been considered potentially appropriate if input
data are known with errors that are independent from subject to
subject. The relevant aspect of the model here is that the average of
true dose, Avg(True dose | estimated dose), derived from it has
smaller variability, from person to person, than does the estimated
dose itself. (The large estimated doses are reduced, and the small
estimated doses increased.) The reduction of variance (of the
average of true compared with estimated dose) is governed by the
relative sizes of the variances of the last two terms, log(true dose)
and error, in equation 2.
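The shrinkage described above can be made concrete with a closed-form sketch. It assumes, beyond what the text states, that the error in equation 2 is normal with mean zero on the log scale and independent of true dose, and that the estimated dose is exactly lognormal with the given mean and standard deviation; the function names are hypothetical. Because the report does not spell out its own parameterization, the numbers this sketch produces should not be expected to match the report's quoted variances exactly.

```python
import math


def lognormal_params(mean, sd):
    """Log-scale (mu, sigma^2) of a lognormal with the given mean and SD."""
    s2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - s2 / 2.0, s2


def calibrated_dose_sd(est_mean, est_sd, error_sd):
    """SD of Avg(true dose | estimated dose) when log(est) = log(true) + error.

    Assumes a mean-zero normal error with log-scale SD `error_sd`,
    independent of true dose (an assumption of this sketch).
    """
    mu, s2_est = lognormal_params(est_mean, est_sd)
    s2_true = s2_est - error_sd ** 2              # variance of log(true dose)
    if s2_true <= 0.0:
        raise ValueError("error variance exceeds variance of log(estimated dose)")
    lam = s2_true / s2_est                        # shrinkage factor on the log scale
    # log Avg(true | est) is normal with variance lam^2 * s2_est = lam * s2_true
    # and mean mu + (conditional variance of log true dose) / 2.
    v = lam * s2_true
    m = mu + s2_true * (1.0 - lam) / 2.0
    return math.sqrt(math.exp(2.0 * m + v) * (math.exp(v) - 1.0))


# The SD shrinks as the assumed error grows (inputs in mGy).
for err in (0.0, 0.30, 0.48):
    print(err, calibrated_dose_sd(182.0, 227.0, err))
```

With no error the function returns the SD of the estimated doses themselves; any positive error SD gives a smaller value, which is the variance reduction the text describes.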
Assume, for example, that errors in log(estimated dose)
have mean zero and standard deviation equal to 0.30 and are
independent between subjects. This roughly corresponds to
measurement error with a coefficient of variation of 30% (quite
small compared with the variation seen in the 100 HEDR dose
replications discussed in section VII, figure VIII.4). In this case, it
can be shown that if the estimated dose distribution has a mean of
182 mGy and a variance of (227 mGy)², the variance of Avg(True
dose | estimated dose) would be equal to (178 mGy)². Substituting
that for $\sigma^2$ in equation 1 reduces the power from 96% to about
85%. Assuming larger errors in equation 2 has correspondingly
larger effects on the analysis. Reduction of the power to below
60% (generally regarded as a study of low power) would occur
when the standard deviation of the error in equation 2 equaled 0.48, because this
will reduce Var(Avg(True dose | estimated dose)) to about (125
mGy)². Note that 0.48 is still quite small compared with the overall
variability seen in the 100 estimates of HEDR doses. A value of
0.48 in equation 2 gives dose variations of about a factor of 2.8
compared with the even larger value of 4 seen in figure VIII.4 in
the HTDS Draft Final Report.
To reiterate, the important thing about equation 2 is that
the variance of the average true dose is smaller than the variance of
the estimated doses. For example, if the CIDER program tended to
overestimate the average true dose for all subjects by about 80%,
the actual power of the test would again be 60% instead of the 96%
claimed, because it would also correspond to $\sigma^2$ = (126 mGy)².
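The way the quoted power figures move with $\sigma$ can be approximated with a short calculation. The sketch below assumes a one-sided test at the 5% level with a normal approximation, so that power = Φ(√NCP − z) and √NCP is proportional to the dose standard deviation; the report's exact test is not restated here, so this is an assumption, and the function name is hypothetical. The baseline power of 96% at 227 mGy and the alternative standard deviations (178, 125, and 227/1.8 ≈ 126 mGy) are taken from the text.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def rescaled_power(base_power, sd_ratio, alpha=0.05):
    """Power after the dose SD entering the NCP is multiplied by sd_ratio.

    Assumes a one-sided normal-approximation test (an assumption of this
    sketch), so sqrt(NCP) scales linearly with the dose SD.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha)
    # Back out sqrt(NCP) from the baseline power, then rescale it.
    root_ncp = _STD_NORMAL.inv_cdf(base_power) + z_alpha
    return _STD_NORMAL.cdf(sd_ratio * root_ncp - z_alpha)


baseline_sd = 227.0  # mGy, SD of estimated doses
for sd in (178.0, 125.0, baseline_sd / 1.8):
    print(round(sd, 1), round(rescaled_power(0.96, sd / baseline_sd), 3))
```

Under these assumptions the three cases come out close to the figures quoted in the text: about 85% for 178 mGy, just under 60% for 125 mGy, and roughly 60% when all doses are overestimated by 80%.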
We conclude that if a substantial, but not overwhelming,
fraction of the variability of the HEDR individual dose estimates
actually is due to non-Berkson error, as in equation 2, or if there is
a substantial additional component of error due to uncertainties in
input data, the power of the study likely was reduced to below
levels that would usually be considered acceptable.
A worst-case scenario, in which all the error seen in
figure VIII.4 of the Draft Final Report is due to independent errors
in equation 2, would produce very low power to detect a positive
dose-response relationship. For a number of reasons, however, it is
considered unlikely that such a worst case actually applies. Given
that the dosimetry system is based on extensive Monte Carlo
calculations over many scenarios for each individual's input data,
it seems reasonable to believe that some errors in the dose
estimates do correspond to Berkson error. Also, the point is made
in the later parts of the HTDS results section (section VIII) that a
primary feature of the data is that two of the geostrata with the
lowest estimated doses (Okanogan County and Ferry-Stevens
Counties) actually had the highest rates of many of the thyroid
diseases considered. Basic considerations of such factors as the
prevailing wind directions would indicate that those counties
should have had less 131I deposition than the other counties in the
study. Unless such basic assumptions in the dosimetry system are
incorrect, it is difficult to believe that this is an artifact of
dosimetry error.
EFFECT OF ERRORS IN ASSESSING CHILDHOOD MILK
CONSUMPTION
The effect of errors in assessment of childhood milk
consumption on the power of the HTDS to detect dose-response
relationships depends on the fraction of variation, M, in between-
person thyroid dose that is due to between-person variability in
milk consumption. If R is the coefficient of correlation between
reported milk consumption and true consumption, the sample size
needed to maintain the same power, relative to a study of size N
with no errors in reported consumption, can be approximated to a
first order as N/(1 - M + MR²) (see appendix D). If, for example,
half the variation in thyroid dose is due to variation in milk
consumption and the correlation between true and reported milk
consumption is 0.3, the sample size needed is 1.8N. Thus, about
80% more subjects are needed, in this example, to make up for the
poor correlation between reported and true milk consumption. The
HTDS Draft Final Report does not discuss errors in the milk-
consumption estimates relative to the calculation of the HEDR
doses, so we assume that no allowance for such errors has been
made.
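The first-order approximation N/(1 - M + MR²) is easy to evaluate for other values of M and R. The following sketch simply codes that expression; the function name and the input checks are hypothetical additions.

```python
def sample_size_inflation(m, r):
    """Factor by which N must grow to keep the same power: 1 / (1 - M + M*R^2).

    m: fraction of between-person dose variation due to milk consumption (0..1).
    r: correlation between reported and true milk consumption (nonzero, |r| <= 1).
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie between 0 and 1")
    if not 0.0 < abs(r) <= 1.0:
        raise ValueError("r must be nonzero with absolute value at most 1")
    return 1.0 / (1.0 - m + m * r ** 2)


print(sample_size_inflation(0.5, 0.3))   # the example in the text, about 1.8
print(sample_size_inflation(1.0, 0.3))   # limiting case in which M approaches 1
```

With M = 0.5 and R = 0.3 the factor is about 1.8, as in the text. If nearly all dose variation came from milk consumption (M close to 1), the same R would require a study roughly 11 times as large.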
The HTDS Draft Final Report did investigate the effect
of substituting defaults for estimated milk consumption (reference
values, rather than each subject's individual estimate) in the HEDR
model for thyroid dose estimation. The use of the reference values
did not change the overall trends in the dose-response analyses, but
more information about this analysis is needed. A comparison of
the variance of the HEDR dose estimates using individual versus
HEDR reference milk-consumption estimates would be helpful for
two reasons. It would allow calculation of the statistical power to
detect the hypothesized dose-response relationship in the analyses
that used the reference diet, an important point not explicitly
discussed. And, by allowing the estimation of M, it would partly
address the extent to which the HTDS might have been
overoptimistic about the value of the retrospective reports of diet. In
particular, if the variance of the individual diet-based HEDR
estimates is much larger than the variance of the reference diet-
based estimates (that is, if M approaches 1), the power of the
primary analysis (which used individual diet) could be very
sensitive to low values of R for the correlation between estimated
and true individual consumption of milk. (If M is close to 1 and R
is 0.3, it would take a study size perhaps 10 times as large to obtain
the power that knowing true consumption would yield.) But, if M
is relatively small, then the power of the HTDS to find dose-
response relationships is much less sensitive to assumptions about
the accuracy of the retrospective surrogate reports of diet because
other factors (such as location of residence) dominate the
calculation of the estimated doses. Both power (of the reference
diet analysis) and sensitivity (of the primary analysis) to error in
the individual diet estimates should be discussed in future revisions
of the Draft Final Report.
CORRELATED DOSE ERRORS
Correlations between individuals in dose errors also
affect the power of the study to detect a dose-response relationship.
For example, if the CIDER program tended to overestimate the
average dose for all individuals by about 80%, the actual power of
the study would be 60% instead of the 96% claimed, because it
would correspond to $\sigma^2$ = (126 mGy)². But if doses were consistently
underestimated, the study power would increase. In general, highly
correlated multiplicative errors lead to wider confidence intervals
(and hence reduced power) for estimated slope terms in the linear
model, inasmuch as the common uncertainties in dose need to be
accounted for. Because the Monte Carlo
procedures used by the HEDR project involved averaging over
possible values of a number of parameters (source term, milk
transfer coefficients, and so on) that are expected to affect doses
multiplicatively, some analysis of the correlation between doses
should have been performed as a part of the power calculations for
the HTDS.
EFFECT OF GEOGRAPHIC VARIATION IN DISEASE RATES ON
STATISTICAL POWER
A notable feature of the HTDS data is that there were
indications of heterogeneity by geostratum for many of the
thyroid-disease outcomes considered. Part of this heterogeneity
was that the two low-dose geostrata often had higher rates of
diseases than the other areas, whether or not dose was considered
as a risk factor. That sort of heterogeneity of background rates of
disease can have important effects on the power of studies of this
type. Essentially, the issue is related to the last statistical "factor"
noted earlier (see page 100): whether, conditional on dose, disease
outcome is independent from individual to individual.
It is possible that important known or unknown
covariates for thyroid-disease risk, not considered in the study,
could lead to biases or loss of power if they tend to cluster by
geostratum, distance from the Hanford facility, or otherwise in
ways that affect estimated doses. To take an unlikely example,
adult weight has been found in case-control studies (McTiernan
and others, 1987; Preston-Martin and others, 1993) to play an
important role in thyroid-cancer risk, with the heaviest subjects in
one study (Goodman and others, 1992) having up to 5 times the
risk of the lightest. If for any reason the average weight of subjects
differed substantially by geostratum or distance from Hanford, this
could make the estimation of a dose-response relationship quite
difficult. Similarly, thyroid-cancer rates have been noted to differ
by ethnicity, with an especially high rate among Filipino women in
Hawaii (Kolonel, 1985). The Hanford study population is
ethnically homogeneous, but the evidence, for many of the thyroid
diseases or abnormalities, that risk differs by geostratum raises the
question of whether clustering of important unconsidered factors
could have reduced the power of the study by violating the
independence assumption.