Methodological Observations on the Ranch Hand Study
Because the veterans of Operation Ranch Hand are among those with the highest exposure to herbicides in Vietnam, because that exposure is well documented by records and serum dioxin measurements, and because the group has been the focus of extensive epidemiological and clinical observation, the Ranch Hand study represents one of the best available opportunities for understanding the health effects of exposure to herbicides in Vietnam veterans. The committee thus reviewed all of the research reports concerning the Air Force's Ranch Hand study in great detail.
The committee is aware of the logistical difficulties involved in such a large and detailed data collection effort, and it applauds the efforts of the Air Force research team in that regard. The committee also acknowledges the efforts of the Ranch Hands themselves to participate in the follow-up study; they deserve a great deal of appreciation. In reviewing the results of the Ranch Hand study, however, the committee identified a number of concerns with the way the data have been analyzed and presented. Ranch Hand analysts seem to have made a good faith effort to be thorough in their analyses and to address a number of potential criticisms in advance, but the report is difficult to read and interpret for a number of reasons (detailed below).
In spite of these defects, the committee recognizes the possibility that extensive reanalysis of the data would not uncover any consistent positive findings, although only actual reanalysis would determine this. The following observations are offered in a constructive spirit, with the hope that they may serve as suggestions for future analyses and reports of the Ranch Hand data and those of similar groups.
PROBLEMS WITH THE REPRODUCTIVE EFFECTS STUDY
The comments that follow focus on the report dealing with reproductive outcomes (AFHS, 1992). The committee feels, however, that many of its observations have somewhat broader applicability to other aspects of the Ranch Hand study as well.
Tabular Presentation of Results
One general concern is that many of the tables and analyses are confusing and inadequately documented. The definitions of column and row labels in the tables relied on prior text so that the tables do not stand alone. There are p-values presented without adequate descriptions of the specific hypotheses being tested. Although a thorough reading of the methodology section (Chapter 1) provides the required information, interpretation of the tables would be facilitated by the inclusion of better "stand-alone" documentation in the form of footnotes and expanded row and column headings. Overall, the committee believes that some clarification, some simplification, and more emphasis on estimates of effect than on p-values would help the researchers communicate their important findings and would facilitate the appropriate peer review of this work.
Similarly, the committee feels that there is an overreliance on p-values in place of simple measures of effect such as relative risks. The executive summary, in particular, contains essentially none of the relevant quantitative data.
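As an illustration of reporting a measure of effect alongside (or instead of) a p-value, the following minimal sketch computes a relative risk with an approximate 95 percent confidence interval on the log scale. The function name and the counts are hypothetical and are not taken from the Ranch Hand report.

```python
import math

def relative_risk(a, n1, c, n0, z=1.96):
    """Relative risk of an adverse outcome in an exposed group (a of n1)
    versus a comparison group (c of n0), with a Wald-type confidence
    interval computed on the log scale."""
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts, for illustration only.
rr, lower, upper = relative_risk(30, 1000, 20, 1000)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A table entry in this form conveys both the size of the estimated effect and its precision, which a p-value alone does not.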
Exploration of an Overall Effect
A more specific concern involves the text on page i, describing overall differences between Ranch Hands and control groups. Initial findings, using unverified end points, showed that when looking at births occurring prior to the Ranch Hands' experience in Southeast Asia (pre-SEA), the comparison group had slightly elevated rates of adverse outcomes. When looking at post-SEA births, the reverse was true; that is, the rate of adverse birth outcomes was higher for Ranch Hands. The 1992 report states that subsequent analyses of verified end points confirmed these findings, but the committee could find only one table reporting the findings, and this table did not provide any information specific to particular types of birth defects or several other outcomes.
The data in Table 1-16 suggest that some aspect of Ranch Hand service may have caused an increase in rates of birth defects among children born to Ranch Hands, and this should be explored further. However, it is not clear what caused the difference between Ranch Hands and the control group. For this reason, internal comparisons based on different exposure measurements among the Ranch Hands themselves are appealing. The use of multiple definitions of exposure (extrapolated baseline dioxin, current dioxin adjusted for time between SEA experience and dioxin assay, and categorized current dioxin) represents an admirable effort to explore several possible mechanisms. Presentation of the results of these analyses, however, is confusing. Some of the data were analyzed by using continuous dioxin levels, but the presentations in the tables use trichotomized values. It is not very clear whether the results reported refer to the continuous or the trichotomized values, for either past or current exposure (models 1 or 2). One possibility for future analyses would be to report (in addition to what has been done) logistic regression coefficients and odds ratios for the continuous exposure measurements, perhaps by looking at quadratic terms to address any concerns about nonlinear associations.
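The suggestion of a logistic regression on a continuous exposure with a quadratic term can be sketched as follows. This is a minimal illustration using Newton-Raphson in plain Python; the grouped data, the variable names, and the helper functions are all hypothetical, and a real analysis would use an established statistical package and would also report standard errors.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fit_logistic(X, y, iters=25):
    """Maximum likelihood logistic regression by Newton-Raphson."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical grouped data: (log dioxin, adverse outcomes, births).
groups = [(0.0, 3, 40), (1.0, 4, 40), (2.0, 7, 40), (3.0, 12, 40)]
X, y = [], []
for x, events, n in groups:
    for i in range(n):
        X.append([1.0, x, x * x])  # intercept, linear term, quadratic term
        y.append(1 if i < events else 0)

beta = fit_logistic(X, y)

def odds_ratio(x_from, x_to):
    """Odds ratio between two exposure levels under the fitted model."""
    def eta(x):
        return beta[0] + beta[1] * x + beta[2] * x * x
    return math.exp(eta(x_to) - eta(x_from))

print("coefficients:", [round(b, 3) for b in beta])
print("odds ratio, log dioxin 3 vs 0:", round(odds_ratio(0.0, 3.0), 2))
```

Because the model includes a quadratic term, the odds ratio depends on which two exposure levels are compared, so a report would need to state the contrast explicitly.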
Exclusion and Categorization of Data Points
Most of the analyses, especially those using models 1 and 2, excluded the most relevant baseline data, namely those for Ranch Hands whose measured dioxin residues were no higher than background levels. In particular, many of the analyses were restricted to those veterans with current serum dioxin levels greater than 5 or 10 parts per trillion (ppt). This restriction essentially eliminates truly unexposed individuals from the analysis, and makes it difficult to identify positive statistical relationships. It is also arbitrary; other studies reviewed by the committee used various definitions of "background," including 3-4 ppt and 7 ppt.
In model 3, page 1-6, the authors claim that they eliminated people having between 10 and 15 ppt from the "categorized dioxin analysis in an attempt to avoid misclassification of the Unknown and Low categories." There are several potential problems with this strategy:
As with the omission of any data from an analysis, it is possible that this biased the analysis. The easiest way to judge whether this occurred would be to redo the analyses with those people included, perhaps as a separate category.
An analysis involving categories and not continuous dioxin levels could have been performed using quartiles or quintiles, fixed bands such as 0-4, 5-9, 10-14, and 15-19 ppt, or some other set of categories. No subjects should be discarded from the analyses. These categorical variables could be treated as either nominal scale or ordinal scale in logistic regression models. The specific contrasts between the highest- and the lowest-exposure categories could be accommodated in logistic regression models.
Dioxin levels were unavailable for a large proportion of the cohort. Specifically, 2,278 of 6,792, or 33.5 percent, of the births had no dioxin levels associated with them. The committee could not find any analysis comparing outcomes or other characteristics in those who did versus those who did not have dioxin levels available. The potential for bias clearly exists. At the very least, there is a severe reduction of statistical power.
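The point that every subject can be assigned to an exposure category, with the 10 to 15 ppt band kept as its own group rather than dropped, can be sketched as follows. The cut points, labels, and serum values are hypothetical choices made for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut points in ppt; each band is closed on the left.
CUTS = [5, 10, 15, 20]
LABELS = ["<5", "5-<10", "10-<15", "15-<20", ">=20"]

def category(ppt):
    """Assign a serum dioxin level to an exposure band; no value is excluded."""
    return LABELS[bisect_right(CUTS, ppt)]

# Hypothetical serum dioxin levels (ppt).
levels = [2.1, 4.9, 5.0, 7.3, 10.0, 12.5, 14.9, 15.0, 18.2, 33.0, 120.4]
counts = Counter(category(v) for v in levels)

for label in LABELS:
    print(label, counts[label])
```

The resulting labels could then enter a logistic regression either as indicator (nominal) variables or as an ordinal score, and the contrast between the highest and lowest bands can be read directly from the fitted coefficients.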
Correlation of Results Within Subject
The analyses did not seem to account for the reported pair matching of comparisons to Ranch Hands. Some aspects of the study design, including the matching, were not described in enough detail in the executive summary. In addition, the analyses did not consider defects other than the "most serious" one when a child had multiple defects.
The analyses treat data on outcomes for all pregnancies fathered by the same subject as independent binary observations. Especially when restricted to "full sibling" subsets, however, there are good a priori reasons to expect that the outcomes within subject are correlated. A large body of statistical methodology has been developed to accommodate such correlated binary data, stimulated by similar "intralitter" correlations in toxicology. For rare outcomes, it will probably matter very little, but it could be important for more common ones. Tests for such "extrabinomial" variation are available. Because of the high a priori likelihood that extrabinomial variation exists, however, methods that accommodate such variation should be used whenever they yield results different from those obtained by standard methods, which assume independence, even when the tests of that variation do not provide statistically significant results. The generalized estimating equation methods of Zeger and colleagues (Zeger and Liang, 1986; Zeger et al., 1988) are appropriate in such circumstances.
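The effect of within-father correlation on precision can be illustrated with a deliberately simplified sketch. It is not a full generalized estimating equation fit; it only compares the usual binomial standard error of an overall proportion with a cluster-robust ("sandwich") standard error that treats each father as the unit of independence. The function name and the data are hypothetical.

```python
import math

def proportion_with_cluster_se(clusters):
    """clusters: list of (adverse outcomes, pregnancies) per father.
    Returns the pooled proportion, the naive binomial standard error,
    and a cluster-robust standard error."""
    total = sum(n for _, n in clusters)
    p = sum(y for y, _ in clusters) / total
    naive_se = math.sqrt(p * (1 - p) / total)
    robust_se = math.sqrt(sum((y - n * p) ** 2 for y, n in clusters)) / total
    return p, naive_se, robust_se

# Hypothetical data: outcomes tend to repeat within the same father.
fathers = [(3, 3), (0, 4), (0, 2), (2, 2), (0, 3), (0, 4), (1, 2), (0, 4)]
p, naive_se, robust_se = proportion_with_cluster_se(fathers)
design_effect = (robust_se / naive_se) ** 2

print(f"p = {p:.3f}, naive SE = {naive_se:.4f}, robust SE = {robust_se:.4f}")
print(f"variance inflation = {design_effect:.2f}")
```

When the variance inflation is well above 1, as in this constructed example, standard errors and p-values that assume independent pregnancies are too optimistic.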
The analysis makes extensive use of statistical interaction terms, which may have rendered the models difficult or impossible to interpret. Statistical interactions are subtle, but they may be obscuring some of the findings. In general, the concern is that a model containing a main effect for a covariate (such as age of the father), a main effect for dioxin level, and an interaction term, could easily be misinterpreted. If the interaction term is statistically nonsignificant, and if it is highly correlated with dioxin level, which is a very reasonable scenario, the statistical significance for the main effect term for dioxin is uninterpretable. In particular, it is possible that a model with both main effects present, but no interaction term, could reveal a statistically significant effect of dioxin level, but this result could spuriously seem to disappear in the model with the interaction term. This problem is of particular concern for model 2, which includes a main effect for the time between SEA experience and dioxin level, a main effect for dioxin level, and the interaction. Such a model could obscure any effect of dioxin level.
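The scenario in which the interaction term is highly correlated with dioxin level can be checked directly before a model is fitted. The sketch below uses hypothetical values for dioxin and for time since SEA service, and it also shows mean-centering, a common general remedy that the committee does not itself propose, to illustrate how the correlation between the main effect and the product term can change.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

# Hypothetical values: log dioxin and years between SEA service and assay.
dioxin = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
years = [20.0, 22.0, 21.0, 23.0, 22.0, 24.0]

# Raw product term, as it would enter a model with an interaction.
raw_product = [d * t for d, t in zip(dioxin, years)]

# Product of mean-centered variables.
md, mt = sum(dioxin) / len(dioxin), sum(years) / len(years)
centered_product = [(d - md) * (t - mt) for d, t in zip(dioxin, years)]

r_raw = pearson(dioxin, raw_product)
r_centered = pearson(dioxin, centered_product)
print(f"corr(dioxin, raw product)      = {r_raw:.3f}")
print(f"corr(dioxin, centered product) = {r_centered:.3f}")
```

In this constructed example the raw product is almost collinear with dioxin, so the main effect and the interaction cannot be separated cleanly; the centered product is not. Real data will rarely be this tidy.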
Analyses of Conception Rates
One intriguing finding in the Ranch Hand report is the increase in the average number of post-SEA conceptions with increasing dioxin exposure. To be able to interpret this result, one needs much more basic descriptive information on how these three exposure groups differ on relevant characteristics, particularly data on year of entry into SEA service. If, as seems possible, those who had the highest dioxin residues tended to enter SEA service earlier so that they had the chance for longer or repeat tours, the higher number of conceptions could be explained by the longer period since their return during which they were "at risk" for conception. It is standard practice in demography to take account of period at risk when analyzing fertility rates in different populations, and the same methods would be appropriate here.
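The role of period at risk can be shown with a small worked example. The groups, counts, and years below are hypothetical; the sketch only demonstrates that a higher average number of conceptions can coincide with an identical conception rate per unit of time at risk.

```python
def mean_conceptions(conceptions):
    """Average number of post-SEA conceptions per veteran."""
    return sum(conceptions) / len(conceptions)

def rate_per_100_person_years(conceptions, years_at_risk):
    """Conceptions per 100 person-years at risk since return from SEA."""
    return 100 * sum(conceptions) / sum(years_at_risk)

# Hypothetical veterans: the high-dioxin group returned earlier and so
# has a longer follow-up period in which conceptions could occur.
low = {"conceptions": [1, 2, 1, 0], "years": [10, 12, 9, 9]}
high = {"conceptions": [2, 3, 2, 1], "years": [20, 22, 18, 20]}

for name, g in (("low", low), ("high", high)):
    print(
        name,
        "mean conceptions:", mean_conceptions(g["conceptions"]),
        "rate per 100 person-years:",
        rate_per_100_person_years(g["conceptions"], g["years"]),
    )
```

Here the high-exposure group averages twice as many conceptions per veteran, yet both groups have the same rate once time at risk is taken into account.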
Inclusion of Pre-SEA Births
In one of the models, the analysis included all births whether they occurred prior to or following the subject's SEA experience. There was a main effect term in each model for "pre/post SEA," a main effect for dioxin, and the interaction term between these. The p-values reported in these pre/post models apparently represented tests of the interaction terms, which the committee views as of only secondary importance. The hypothesis being tested is whether the effect of dioxin (if any) is the same for pre-SEA births as for post-SEA births. Presumably, if an effect of subsequent dioxin exposure on pre-SEA births were detected, this would be a result of bias in the study design, confounding, or both. Although this hypothesis is of some interest, the report did not appear to include an estimate and test of the effect of dioxin in the post-SEA births for these models. The presentation of the p-values conveys the impression that the dioxin effect is nonsignificant. Although this may be true as well, that is not what the committee understood those p-values to be testing, and it was not completely clear from the tables exactly what the p-values do represent. It would be preferable to analyze only post-SEA births, create a history variable (history of adverse reproductive outcomes prior to SEA experience, yes or no), and include that variable in the logistic regression model.
Lack of Documentation of Algorithms for Adjustment for Covariates
On page 1-7 the authors say, "When appropriate, analyses were adjusted for as many as 8 covariates," but they do not state the criteria for deciding when it was appropriate. In order to interpret the results, it is necessary to know how the decision was made as to which covariates were or were not included in any particular analysis.
Table 3-21 (page 3-22) provides an example: The unadjusted model shows no significant association between (log) initial dioxin and miscarriages. In the adjusted analyses, a significant (or nearly significant) effect of service occupation is found. Since service occupation was highly correlated with dioxin levels, it is unclear how to interpret the finding of no effect of dioxin in the adjusted models. Adjustment of an exposure variable for a highly correlated variable that may, in fact, represent a surrogate measure of exposure is not generally recommended and may have obscured a positive relationship.
SUMMARY OF STUDY
In summary, several issues need to be addressed in future analyses of the Ranch Hand reproductive effects data. The technical issues raised above, particularly the restriction to those with current dioxin levels above background and the potential misapplication and/or misinterpretation of interaction terms, are of concern because of the conclusions drawn from them. The problems with the analyses could lead to apparently inconsistent findings with regard to the estimated effect of dioxin on outcomes. These apparent inconsistencies might or might not reflect the actual situation. Thus, the conclusion drawn in the report that the inconsistent findings do not support an effect of dioxin may not be valid, since the inconsistencies may simply reflect problems with the analyses. On the other hand, it is very possible that reanalysis of these data, according to the recommendations listed above, would not lead to any substantively different conclusions.
Further exploration of the overall differences between Ranch Hands and the comparisons should also be pursued. Some aspect of the Ranch Hand experience seems to have increased the risk of fathering children with birth defects, but the implications of this finding are unclear. If reanalyses of the dioxin data still show that dioxin is not associated with increased risk of birth defects, a great deal of careful exploratory analysis will be required to sort out exactly what the cause of the increased risk might be.
Air Force Health Study. 1992. An Epidemiologic Investigation of Health Effects in Air Force Personnel Following Exposure to Herbicides. Reproductive Outcomes. Brooks AFB, TX: Armstrong Laboratory. AL-TR-1992-0090. 602 pp.
Zeger SL, Liang KY. 1986. Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42:121-130.
Zeger SL, Liang KY, Albert PS. 1988. Models for longitudinal data: a generalized estimating equation approach. Biometrics 44:1049-1060 (published erratum appears in Biometrics 1989 45:347).