5 Evaluation of Statistical Data Analysis

MODE OF PRESENTATION OF RESULTS

The basic objective of the statistical analysis was to determine whether thyroid disease has increased among persons exposed to 131I released from Hanford in the period under consideration. That objective was appropriately addressed by modeling the relationship between dose and the probability of occurrence of a thyroid disease. In particular, the relationship was modeled as a linear function (that is, a regression equation) by using the median of the 100 dose estimates for each person as his or her assumed dose. The HTDS investigators used these linear models in dose to model probabilities of disease and the numeric values of blood concentrations of various biomarkers.

The use of linear models (or linear-quadratic models) for probabilities is common in radiation epidemiology and is generally based on biologic or radiation-protection considerations. Such models, however, can present difficulties in estimation, in that negative probabilities are not allowed to occur but can appear during the iterative procedure used to produce maximum likelihood estimates, especially if a generally negative dose-response relationship is evident in the data. In such cases, the models are said to have had problems in "converging".

The HTDS investigators also used another approach: logistic regression. Logistic regression is a nonlinear model in which probabilities are never allowed to reach zero, so convergence problems are less common during maximum likelihood fitting. However, the parameter estimates from logistic regression are arguably less easily interpretable in the radiation-biology or radiation-epidemiology setting because log odds ratios, not disease probabilities, are being modeled as linear in dose. The HTDS investigators used the linear models as their primary method of analysis but also gave the logistic-regression results, especially when the linear models failed to converge. In their analyses, convergence apparently was always achieved with the logistic models, but many of the linear models had convergence problems. The HTDS investigators clearly considered the linear models as the primary method of analysis; for example, power calculations were given only for linear models (section V). We focus most of our comments on their use of the linear models, but much of our critique is also applicable to logistic regression.

A zero slope in the regression equation indicates no association between dose and the probability of occurrence of a thyroid disease. Standard statistical tests were used to determine whether the slope was significantly positive. Because the investigators assumed that the association, if any, would be in a positive direction, they appropriately used a one-sided statistical test. For most of the thyroid diseases considered in the HTDS, the conclusion was that the null hypothesis of zero slope could not be rejected, that is, there was no clear evidence of a thyroid-disease effect due to exposure.

In the HTDS Draft Final Report, there was an overreliance on the maximum likelihood fitting of the linear dose-response model. For several of the important outcome variables, such as thyroid carcinoma, the model calculations failed to converge.
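
The contrast between the two model forms discussed above can be illustrated with a minimal Python sketch. It is not the HTDS fitting code; the function names, baseline risk, trial slopes, and doses are hypothetical values chosen only to show why a linear probability model can step outside the allowed range during fitting while a logistic model cannot, and how a one-sided test of a positive slope is computed from a slope estimate and its standard error under a normal approximation.

```python
import math

def linear_prob(a, b, dose):
    """Linear model: P = a + b * dose. The result is not bounded to [0, 1]."""
    return a + b * dose

def logistic_prob(alpha, beta, dose):
    """Logistic model: log odds = alpha + beta * dose. The result is always in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * dose)))

def one_sided_p_value(slope, std_error):
    """Upper-tail normal p-value for H0: slope = 0 against H1: slope > 0."""
    z = slope / std_error
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical trial values that an iterative fit might visit (not HTDS estimates).
baseline = 0.02          # assumed disease probability at zero dose
trial_slope = -0.0001    # assumed negative trial slope per mGy
doses = [0.0, 100.0, 300.0]  # assumed doses in mGy

linear = [linear_prob(baseline, trial_slope, d) for d in doses]
logistic = [logistic_prob(math.log(0.02 / 0.98), -0.005, d) for d in doses]

print("linear probabilities:  ", linear)    # the last value is negative
print("logistic probabilities:", logistic)  # all values stay inside (0, 1)
print("one-sided p at z = 1.645:", one_sided_p_value(1.645, 1.0))
```

With a negative trial slope, the linear model assigns a negative "probability" to the highest dose, which is the kind of constraint violation that interrupts maximum likelihood iteration; the logistic form avoids it at the cost of modeling log odds rather than probabilities.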
A better organization of the results would have been achieved by expanding the tables of high- versus low-dose results (section VIII, pages 107-124) into quartiles or quintiles, so that disease and abnormality rates would be given for four or five categories of dose, and incorporating them into the presentation of the earlier results. In cases where the linear model calculations as described failed to converge, it would then be possible to rerun the analysis, using the average value of dose in each category as the predictor variable. That would probably have resulted in successful convergence and would retain reasonable power to detect an effect.

Another approach would be to replace the maximum likelihood fitting with an ordinary least-squares analysis when the model failed to converge with maximum likelihood. Unless a large proportion of the estimated probabilities lie outside the unit interval (0,1), the slope test statistic is reasonable for testing for the presence of a dose-response relationship. For models in which convergence was still not obtained, it would be reasonable to report the value of the slope parameter at the point where the constraint that all outcome probabilities be positive was first violated. A confidence limit based on the profile likelihood for the slope parameter could have been calculated and would have been helpful, especially for comparison with other studies.

Rather than relying exclusively on analyses that used the putative individual doses, the investigators could have added a set of confirmatory analyses. The basic parameters that define a person's dose are geographic location, source of milk (backyard cow, commercial milk, and so on), and amount of milk consumed. Analyses of thyroid-disease rates according to those basic dose-related variables would provide assurance that doses were not seriously misestimated and could further confirm (or contradict) the principal negative results.

Some results were presented in an abstract, rather uninformative manner. For example, there was a scatter plot of individual thyroid doses on a logarithmic scale but no table of frequency distribution of doses, which would have been much more useful.
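
As an illustration of the kind of dose-group table and category-based analysis suggested above, the following Python sketch splits subjects into equal-sized dose groups, reports the number of subjects, mean dose, number of cases, and disease rate in each group, and then fits a least-squares slope to the group rates using the group mean doses as the predictor. The function names and the eight-subject data set are hypothetical and serve only to show the mechanics; they are not HTDS data.

```python
def dose_group_table(doses, cases, n_groups=4):
    """Split subjects into equal-sized dose groups (by rank) and summarize each group."""
    order = sorted(range(len(doses)), key=lambda i: doses[i])
    table = []
    for g in range(n_groups):
        start = g * len(order) // n_groups
        stop = (g + 1) * len(order) // n_groups
        members = order[start:stop]
        n = len(members)
        n_cases = sum(cases[i] for i in members)
        table.append({
            "n": n,
            "mean_dose": sum(doses[i] for i in members) / n,
            "cases": n_cases,
            "rate": n_cases / n,
        })
    return table

def weighted_slope(table):
    """Least-squares slope of group rate on group mean dose, weighted by group size."""
    total = sum(row["n"] for row in table)
    mean_x = sum(row["n"] * row["mean_dose"] for row in table) / total
    mean_y = sum(row["n"] * row["rate"] for row in table) / total
    sxy = sum(row["n"] * (row["mean_dose"] - mean_x) * (row["rate"] - mean_y)
              for row in table)
    sxx = sum(row["n"] * (row["mean_dose"] - mean_x) ** 2 for row in table)
    return sxy / sxx

# Hypothetical toy data: dose in mGy, 1 = disease present, 0 = absent.
doses = [5, 10, 20, 40, 80, 160, 320, 640]
cases = [0, 0, 0, 1, 0, 1, 1, 1]

table = dose_group_table(doses, cases)
slope = weighted_slope(table)

for row in table:
    print(row)
print("slope per mGy:", slope)
```

Because the fit uses only a handful of group means, it cannot wander into negative individual probabilities the way an iterative individual-level fit can, which is why the category approach is a plausible fallback when the primary model fails to converge.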
Similarly, one expects radiation-epidemiology reports to include tables that show observed and expected numbers of disease outcomes according to dose groups; these key tables were absent from the Draft Final Report.

Too little descriptive material supporting the results of the analysis was presented. A description of the estimated dose distribution (distribution of median doses for people in the study) and disease frequencies and prevalence according to such important categories as sex, geostratum, year of birth, and amount of milk consumption in childhood would be helpful, especially in interpreting the finding that people in the least exposed geostratum appeared to have the highest rates of many of the thyroid diseases or abnormalities.

TYPES OF ANALYSES AND RELATED DOSIMETRY-ERROR ISSUES

The model of the dose-response relationship was given in the HTDS Draft Final Report (section VII, page 10) as

$$P_j(d) = A_j + B\,d,$$

where
$j$ = sex,
$d$ = cumulative dose to thyroid,
$P_j(d)$ = probability that a person of sex $j$ and dose $d$ has the disease or condition in question,
$A_j$ = baseline risk (risk without radiation, which can depend on sex), and
$B$ = regression coefficient on dose (the slope of the dose-response regression line).

There is sizable uncertainty in the doses reconstructed for individuals based on residential and especially dietary histories, and variations related to source term, meteorologic uncertainties, pasture deposition, milk concentrations of 131I, source of milk, and iodine metabolism need to be taken into account. It seems clear that analyses need to address those uncertainties explicitly and that the confidence intervals and the strength of the conclusions have to reflect them. That implies assumptions pertaining to the distributions of two error terms, $E_1$ and $E_2$:

$$P_j(d) = A_j + B\,(d + E_1) + E_2,$$

where
$E_1$ = error in estimating doses, and
$E_2$ = error in response to given dose.

The statistical-methods section (section VII) of the Draft Final Report described a model incorporating uncertainties in dosimetry, but the analyses (section VIII) used a simpler model that did not include dose uncertainties. Furthermore, no assumptions were stated for $E_2$, and little attention was given to the uncertainty represented by it. As described in chapter 6, below, it appears that dosimetry-error issues were not fully treated in the analysis of the power of the HTDS. Ignoring dosimetry errors can lead to unrealistically narrow estimates of the confidence limits that are applied to the estimated parameter values.

The statistical-methods section does describe a method for computing likelihood-ratio (LR) statistics when (as is true for the HEDR doses) errors in the dose estimates for each individual are correlated (section VII.C.3); however, this method is not used in the results section. The suitability of the LR method that the investigators presented to account for dosimetry errors depends on the validity of the Berkson model for errors (see chapter 6) and on accepting that the correlations between dose estimates are fully specified by the HEDR simulations. Despite those cautions, results from the LR approach could be useful. Although it is very unlikely that the estimated dose-response relationships would change in an important way, confidence intervals that take dosimetry errors into account would provide more appropriate information about the uncertainty of estimates. It is recommended that confidence intervals be calculated.

ANALYSIS OF POTENTIAL CONFOUNDING OR EFFECT-MODIFYING VARIABLES

It was not clear from the Draft Final Report how confounders of dose-response relationships were treated, and results adjusted for possible confounders were rarely given. The HTDS investigators conducted analyses of the various thyroid-disease end points to evaluate a number of possible risk factors for confounding effects or effect modification (section VIII.D.20) but presented no tables to show a summary of the results of their analyses for any of the end points. Several such tables should be added to the report.

ASSUMPTION OF EQUIVALENT RADIATION EFFECT FOR MALES AND FEMALES

An important assumption in the main analyses was that rates of thyroid disease might differ between sexes in the absence of 131I but that the radiation effect (which was calculated as an excess absolute risk) would be comparable for males and females. A number of studies have found that excess absolute risks posed by external radiation are greater for females than for males with respect to thyroid cancer (Ron and others, 1995) or thyroid nodules (Nagasaki and others, 1994; Ron and others, 1989; Wong and others, 1996), so the assumption of comparability between sexes is a key assumption to be tested. The investigators stated that they tested it but presented no results for the reader to examine with respect to this assumption for any of the disease outcomes. An analysis that allows for differences between sexes in dose-response slopes should be presented.
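
One simple way to allow the slope to differ between sexes is to fit the linear dose-response line separately for each sex, so that both the baseline and the slope carry a sex index. The Python sketch below does this by ordinary least squares on grouped disease rates. It is an illustration under stated assumptions, not the HTDS method: the function names, record layout, doses, and rates are all hypothetical, and a real analysis would also need standard errors to test whether the two slopes differ.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_by_sex(records):
    """Fit a separate line (baseline and slope) for each sex found in the records."""
    fits = {}
    for sex in sorted({r["sex"] for r in records}):
        subset = [r for r in records if r["sex"] == sex]
        fits[sex] = ols_fit([r["dose"] for r in subset],
                            [r["rate"] for r in subset])
    return fits

# Hypothetical grouped data: mean dose (mGy) and disease rate per dose group.
records = [
    {"sex": "F", "dose": 0.0, "rate": 0.04},
    {"sex": "F", "dose": 100.0, "rate": 0.06},
    {"sex": "F", "dose": 200.0, "rate": 0.08},
    {"sex": "M", "dose": 0.0, "rate": 0.010},
    {"sex": "M", "dose": 100.0, "rate": 0.015},
    {"sex": "M", "dose": 200.0, "rate": 0.020},
]

fits = fit_by_sex(records)
for sex, (intercept, slope) in fits.items():
    print(sex, "baseline:", intercept, "slope per mGy:", slope)
```

In this invented example the female slope is four times the male slope; a model that forces a common slope would report a single intermediate value and hide that difference.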

ANALYSES BY AGE AT EXPOSURE

The prevalence of thyroid cancer induced by radiation depends heavily on age at exposure, so it would have been helpful to see a table showing dose-response analyses for those who were younger versus older in 1945, the time of the greatest 131I irradiation, to examine whether there were indications of a radiation effect among those exposed at the lowest ages. In particular, it is recommended that results be presented for those exposed in utero and during the first 2 years of life. Likewise, because the magnitude of thyroid doses from 131I fallout from the NTS and from global fallout was not greatly different from the Hanford doses in many study subjects, tables showing the results of analyses stratified by magnitude of NTS or global fallout are potentially important.

OUT-OF-AREA ANALYSES

The HTDS investigators took care to examine the results for the out-of-area participants, those who proved never to have been in the dosimetry area during the time of 131I exposure. They performed sensitivity analyses in which the out-of-area participants with disease were assigned either the minimal (zero) or maximal (at the dose-assessment area boundary) likely dose and those without disease were assigned the converse. The two contrasting analyses test the minimal and maximal contributions, respectively, that the out-of-area subjects could make to the dose-response analyses. Either way, the overall results were essentially unaffected, and this indicates that their deletion from the main analyses did not produce a substantial bias. The effect of these cases was small probably because only about 7% of the subjects were out-of-area and the assigned doses for these subjects in the sensitivity analyses were relatively small (in the milligray range).

However, the HTDS investigators made no attempt to model the out-of-area doses for persons who were included in the main analyses. That is, if persons were in their dose-assessment area for only part of the time when there were 131I releases, their doses were calculated only for the time when they were in the area. The investigators implicitly assumed that the dose was zero for any time when a person did not reside in the area. That assumption might or might not have been valid for some individuals, but no attempt was made to improve on the approach or to conduct a sensitivity analysis to evaluate how the assumption could have affected the results. That approach could have led to attenuated or biased results in that it estimated the total Hanford fallout doses for some people and only partial doses for others. There was not even a tabulation of the fractions of the dose-modeled persons that were partly in and partly out of the dose-assessment area during the exposure period or, what would have been better, what fractions of them were out-of-area during the period of heaviest exposures (1944-1947), out-of-area only during other exposure periods, and entirely in-area. The committee cannot evaluate the potential for attenuation or bias by this factor without at least some information on its frequency, and we recommend that the issue of partial out-of-area HTDS subjects be examined.

GEOSTRATUM VERSUS DISEASE

The HTDS investigators examined thyroid morbidity according to geographic areas, which they called "geostrata". Given that outcomes (disease or abnormalities) appeared to differ by geostratum, an alternative analysis that stratified by geostratum would be natural to consider. It would be difficult for thyroid carcinoma (owing to the few cases detected), but many of the other outcomes could be analyzed so that the dose-response relationships were estimated for the individual geostrata and then combined to yield a pooled dose-response estimate.
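
The idea of estimating the dose-response within geostrata and then pooling can be sketched in Python as follows. The pooled estimate used here is the common within-stratum least-squares slope (each stratum keeps its own baseline), which is one of several possible ways to combine strata and is an assumption of this sketch rather than a method taken from the Draft Final Report. The stratum names, doses, and rates are hypothetical and were chosen so that the low-dose stratum has the higher baseline rate.

```python
def sums_of_squares(x, y):
    """Return (Sxy, Sxx) about the means of x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy, sxx

def crude_slope(strata):
    """Slope from pooling all observations and ignoring stratum membership."""
    x = [d for doses, _ in strata.values() for d in doses]
    y = [r for _, rates in strata.values() for r in rates]
    sxy, sxx = sums_of_squares(x, y)
    return sxy / sxx

def pooled_within_stratum_slope(strata):
    """Common slope with a separate baseline per stratum: sum(Sxy_s) / sum(Sxx_s)."""
    total_sxy = 0.0
    total_sxx = 0.0
    for doses, rates in strata.values():
        sxy, sxx = sums_of_squares(doses, rates)
        total_sxy += sxy
        total_sxx += sxx
    return total_sxy / total_sxx

# Hypothetical strata: (doses in mGy, disease rates). Not HTDS data.
strata = {
    "low_fallout": ([0.0, 10.0, 20.0], [0.10, 0.11, 0.12]),
    "high_fallout": ([100.0, 110.0, 120.0], [0.02, 0.03, 0.04]),
}

crude = crude_slope(strata)
pooled = pooled_within_stratum_slope(strata)
print("crude slope:", crude)    # negative, driven by the baseline difference
print("pooled slope:", pooled)  # positive, the slope seen inside each stratum
```

In this invented example the rate rises with dose inside each stratum, yet the unstratified slope is negative because the low-fallout stratum has the higher baseline. That is the masking pattern a stratified analysis is meant to detect.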
Additional analyses are presented that are based on excluding the Okanogan and Ferry-Stevens geostrata; this could well have effects on the dose-response estimates similar to those of a stratified analysis, but one cannot be sure from the writeup. A set of analyses stratifying on geostrata seems needed because the tabulations show that the disease rates tended to be higher in areas with low fallout; this means that the geostratum differences would induce a negative association between 131I and thyroid-nodule rates. It is recognized that it can be tricky to conduct an analysis controlling for a variable that is correlated with dose, because one does not want to control (remove) a large fraction of the variability in dose; but in this case, when it appears that geostratum is a potent confounder of the dose-response association, it seems necessary. Perhaps a judicious collapsing of similar geostrata can minimize the potential for "overadjustment" (Day and others, 1980) of the exposure variable.

Faced with a similar problem in the study of Utah NTS fallout and thyroid disease, Kerber and others (1993) conducted their primary analysis with stratification on coarse geostrata (by state), examining the association of thyroid neoplasms and 131I dose within geostrata. It is recommended that the Hanford investigators perform a similar type of analysis to examine the possible association of thyroid nodules and other thyroid diseases with 131I dose. This would provide assurance that a possible confounding variable had been sufficiently evaluated, either to ensure that a positive association was not masked by the geostratum variations or to detect a masked association.

GENERAL-POPULATION COMPARISON AND SCREENING ISSUES

When one takes into account the different contributions of 131I from Hanford, NTS, and global fallout from weapons testing, everyone was exposed, so it was not possible to identify an unexposed control group. Concern has been expressed that a study in which everyone is exposed is not valid, that is, that an unexposed group is needed to assess the risk posed by Hanford 131I fallout.
However, under the weak assumption of a monotonic dose-response relationship (that is, other things being equal, the larger the dose the greater the thyroid-cancer risk), it is not necessary to have an unexposed control group to estimate the risk. The slope of the dose-response curve would provide a valid index of the risk even without an unexposed control group, provided that there is a sufficient range of doses and that the doses are estimated with reasonable accuracy. Problems in trying to define and use an unexposed control group are discussed below.

The primary analyses of the cumulative incidence of thyroid cancer or other thyroid conditions were dose-response analyses of the subjects in the study. These analyses are appropriate to address the scientific questions regarding the association between 131I and thyroid conditions, the magnitude of risk per unit dose, and the public-health question of how much risk was associated with 131I in the population of children who were downwind from Hanford.

Another potential way to address the public-health question is to compare the incidence of thyroid cancer or other thyroid conditions with the incidence in unexposed populations. However, comparisons with an external, general population are fraught with problems. Persons living in various geographic areas might vary in their baseline risk of thyroid diseases because of differences in dietary iodine intake and other unknown factors. Perhaps more important, the rates of detected disease are based on examinations and depend on the methods and criteria of the examinations; this produces screening effects that cannot be readily disentangled to make meaningful comparisons with disease rates from other geographic regions that did not have comparable screening.

The HTDS investigators attempted to compare the number of thyroid cancers that they detected with the number expected in the general population. They reported that the observed number of thyroid cancers and the number expected in the general population were almost identical.
To do that, they had to introduce a factor to account for their study group's having received a thorough thyroid screening, whereas the general population by and large has not received one. They chose a screening factor of 3, which had been reported in a 1985 monograph on radiation-induced thyroid cancer (NCRP, 1985). But that factor was based on only indirect evidence: specifically, the prevalence of nodules found by screening in two studies was multiplied by 0.1 or 0.12 at various ages because a third study found that about 10-12% of nodules were malignant, and this result was compared with the incidence reported in a national survey, which proved to be one-third as high as the prevalence found in the two screening studies. That is a weak and questionable basis for choosing a multiplier of 3: there could have been unaccounted-for differences among the studies, and the screenings involved only palpation and not ultrasonography, as in the HTDS.

More recent studies, which were available but not cited by the HTDS investigators, have produced different values for a screening factor. For example, the study of atomic-bomb survivors, which at different times involved only palpation or palpation plus ultrasonography, produced a screening factor of 2.5 (Thompson and others, 1994). A study in Chicago with a sensitive screening technique produced screening factors of about 7 for thyroid cancer and 17 for thyroid nodules (Ron and others, 1992). The discrepancies in those values indicate that there is a great deal of uncertainty in the appropriate size of the screening factor, and the different values could allow one to conclude that those residing near Hanford had anywhere from a large deficit to a small excess of thyroid cancer. Hence, there is no unambiguous answer.

The HTDS Draft Final Report does not indicate any attempt to compare the HTDS thyroid-nodule prevalence with that found in unirradiated populations. Reported prevalence rates in unirradiated groups are available from about a dozen studies in the literature, so, in principle, it is possible to do, although again there would be a question about comparability with respect to screening intensity.
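
The sensitivity of the observed-versus-expected comparison to the choice of screening factor can be shown with a few lines of arithmetic. In the Python sketch below, the screening factors 3, 2.5, and 7 are the values cited above for thyroid cancer, but the observed count and the unscreened general-population expectation are hypothetical placeholders (the report's actual counts are not reproduced here); they were picked so that a factor of 3 yields near equality, as the HTDS investigators reported.

```python
def observed_to_expected(observed, expected_unscreened, screening_factor):
    """Ratio of observed cases to the screening-adjusted expected number."""
    expected_screened = expected_unscreened * screening_factor
    return observed / expected_screened

# Hypothetical counts, chosen only so that a factor of 3 gives a ratio of 1.
observed_cases = 18
expected_without_screening = 6.0

# Screening factors for thyroid cancer cited in the text.
factors = {
    "NCRP 1985": 3.0,
    "atomic-bomb survivors": 2.5,
    "Chicago study": 7.0,
}

ratios = {
    source: observed_to_expected(observed_cases, expected_without_screening, f)
    for source, f in factors.items()
}

for source, ratio in ratios.items():
    print(f"{source}: observed/expected = {ratio:.2f}")
```

With the same observed count, the ratio moves from a modest excess (factor 2.5) through equality (factor 3) to a large apparent deficit (factor 7), which is why the comparison cannot give an unambiguous answer.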
In summary, in the subcommittee's view, conclusions drawn from comparisons with general-population prevalence would probably have more uncertainty than those drawn from dose-response comparisons in the study population, so the HTDS investigators rightly chose to emphasize the internal comparisons rather than general-population comparisons.

ANALYSES OF SOURCE OF PERSONAL-EXPOSURE INFORMATION

One major component of the determination of individual 131I exposures was the milk-drinking habits of the study subjects. An attempt was made to interview a parent of each subject or other knowledgeable surrogate to obtain recollections of the milk consumption of the subject in childhood in terms of quantity and sources of milk at various ages. However, for 38% of the subjects it was not possible to interview a parent or surrogate, in which cases default assumptions were used in calculating thyroid dose. The defaults that the CIDER model used proved to result in considerable overestimates of the average dose derived from the reported milk consumption and sources. Specifically, the doses using default values were 40% higher than the average dose of those interviewed. For the critical group who were infants in 1945-1946, the discrepancy was even greater: the doses using default values were 77% higher than the average of interview-estimated ones.

A table showing mean doses by amount of milk consumption in a given geostratum would be illuminating in indicating the degree to which dose variations were driven by milk consumption versus geographic location. That is important for understanding the degree to which the study's negative results might have occurred because of lack of reliability or validity in the reported milk-consumption rate. If a large fraction of the variation in dose is attributable to milk-consumption variation, the random-error component of the dose estimates is probably large, considering that Dwyer and others (1989) found a correlation of only 0.3 between contemporaneous reports and long-term recall of milk-drinking habits; this implies that one would not be likely to detect a dose-response relationship.
A similar table giving mean doses by source of milk information (interview versus defaults) and geostratum would also be informative.

Analyses that take into account the source of milk information are needed. The HTDS investigators did perform secondary analyses that used defaults for those without interviews on the basis of average reported quantity and sources of milk, and they indicated no association, but actual results were not presented. A useful analysis would examine associations using only those with interview information so as to yield results that minimize dose misclassification. Chapter 6 of this report describes the effect of milk-consumption measurement error on the statistical power of the study.
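
To give a sense of why random error in recalled milk consumption matters, the following Python sketch simulates a linear response to a true exposure and then refits the slope using an error-prone report of that exposure. It assumes classical additive error (reported value = true value + independent noise), which is a simplifying assumption of this sketch and not a description of the HEDR or CIDER dosimetry; all names, sample sizes, and parameter values are hypothetical. Under that assumption the fitted slope is expected to shrink by roughly the ratio of true-exposure variance to reported-exposure variance.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def simulate(n=20000, true_slope=0.5, error_sd=1.0, seed=0):
    """Return (slope using true exposure, slope using error-prone reported exposure)."""
    rng = random.Random(seed)
    # Standardized true exposure with unit variance (hypothetical scale).
    true_exposure = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Response depends on the true exposure plus a small outcome error.
    response = [true_slope * e + rng.gauss(0.0, 0.1) for e in true_exposure]
    # Classical additive error: report = truth + independent noise.
    reported = [e + rng.gauss(0.0, error_sd) for e in true_exposure]
    return ols_slope(true_exposure, response), ols_slope(reported, response)

slope_true, slope_reported = simulate()
expected_attenuation = 1.0 / (1.0 + 1.0 ** 2)  # var(true) / (var(true) + var(error))

print("slope with true exposure:    ", slope_true)
print("slope with reported exposure:", slope_reported)
print("expected attenuation factor: ", expected_attenuation)
```

When the error variance equals the true-exposure variance, as assumed here, the fitted slope is about half the true slope, and the power to detect a dose-response relationship falls accordingly.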