Long-Term Health Effects of Participation in Project SHAD (Shipboard Hazard and Defense)

11
Discussion

THE STUDY’S STRENGTHS AND WEAKNESSES

Strengths

The quality of a study of this magnitude and complexity is not easily characterized. The strengths of this study include a relatively large initial cohort of participants and an equally large cohort of nonparticipant controls. After much effort, most of the members of these two cohorts were identified well enough to permit a relatively complete follow-up. With less than 6 percent of subjects lacking Social Security numbers (SSN), except for group B, we can be fairly confident that the combination of Department of Veterans Affairs (VA) and National Death Index (NDI) mortality follow-up is quite complete (Sohn et al., 2006).

Eventually, we were also able to obtain addresses and telephone numbers on a large majority of potential health survey respondents. We had a reasonably broad health survey instrument that included the SF-36 assessment of general health, allowing comparisons to national, normed data. The entire mail questionnaire, accompanying material (such as the veteran service organizations’ [VSO] endorsement letter), and telephone interview script were reviewed and approved by the National Academies’ Committee to Review Studies of Human Subjects.

Shortcomings: Response Rates

On the other side of the ledger are the study’s shortcomings. Primary among these is the low response rate to the health survey, only about 53 percent. Of additional concern is the fact that the response rate of participants (61 percent) was higher than that of controls (47 percent). Part of the reason for these low response rates was our inability to contact potential respondents. This is not a problem for our study alone. A very large survey of recently separated military veterans estimated that roughly 15 percent were not contactable (Ryan et al., In press), and we were further handicapped because we were trying to locate and trace a cohort of veterans who had been out of service for decades. On the other hand, our pilot study using FedEx delivery seemed to indicate that lack of response was probably not due to a bad address.

Although we saw few differences between survey respondents and nonrespondents when we examined available demographic data, the lack of evidence of differences is not very strong evidence of a lack of differences. With an overall response rate of 53 percent, we cannot be confident that we have a complete picture of the health
of the Project SHAD (Shipboard Hazard and Defense) participants or their controls. Yet the link between nonresponse rates and nonresponse bias is far from simple. A recent study of this link looked at 30 articles that reported 235 separate estimates of nonresponse rates (Groves, 2006). The mean nonresponse rate was 35 percent, fairly close to the rate among the Project SHAD participants in our study. Further analyses showed that a survey’s nonresponse rate was not in itself a good predictor of nonresponse bias, but that “nonresponse biases should be expected to vary across estimates within the same survey. The biases are heavily influenced by the covariance between response propensities and particular survey variables” (Groves, 2006). Thus, we cannot conclude very much about nonresponse bias based on our survey nonresponse rate of 39 percent for Project SHAD participants and 53 percent for controls.

One further complication that might have affected the nonresponse rate was the use of three different contact letters, depending on potential Project SHAD exposure, a requirement of the National Academies’ institutional review board (IRB). Although there is a clear argument for the use of three different contact letters, so that survey subjects who were previously unaware of their potential Project SHAD exposures were made aware of them, we cannot be sure that the use of three different letters did not unwittingly contribute to differential responses. Certainly, the response rate for controls was lower than for participants, and this might have been due partly to the contact letter.

Shortcomings: Survey Content

We were further handicapped in our health survey by a lack of well-defined end points for study.
Although we commissioned a series of literature reports on the potential health effects of the various agents, we did not identify a clear, unambiguous list of potential health end points whose presence might be attributable to earlier exposure to some Project SHAD agent, either active agent or simulant. We also made a survey of the classified material on Project SHAD, convincing ourselves that nothing essential had been overlooked. In the end, we mounted a fairly comprehensive survey of general health and of a wide variety of medical diagnoses and symptoms.

A downside to the large number of questionnaire items is that there are a large number of outcomes to examine statistically, creating a problem with multiple comparisons. That is, the more statistical tests one performs, the greater the chance of observing so-called statistically significant differences that are actually due to chance. We dealt with this problem in part by using a number of summary measures, in effect reducing the number of statistical comparisons. Nonetheless, the large number of statistical comparisons increases the odds of chance findings, and following what we believe is current good epidemiologic practice, we did not adjust for multiple comparisons.

We had the additional complication of a multimode health survey, consisting of a mail questionnaire and a telephone interview. We also saw some substantial differences between mail questionnaire and telephone interview respondents, including a statistically significant difference in SF-36 mental component summary (MCS) score. In a national survey of health status, investigators compared mail and telephone survey respondents, finding that self-reported health measures, including SF-36 scales, were worse for mail than for telephone respondents (McHorney et al., 1994), a finding that mirrors our own.

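
The multiple-comparisons concern raised in this section can be made concrete with a short calculation. The sketch below is illustrative only: it assumes that every test is independent and run at a significance level of 0.05, and the test counts are hypothetical, since the number of comparisons actually performed is not stated here. The function names are ours, not the study's.

```python
"""Illustrative sketch: how chance findings accumulate across many tests.

Assumptions (not from the study): tests are independent, each uses
alpha = 0.05, and the test counts below are hypothetical.
"""


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among independent tests
    when every null hypothesis is true."""
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_tests


def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected number of 'significant' results arising purely by chance."""
    return num_tests * alpha


if __name__ == "__main__":
    for m in (1, 10, 50, 100):  # hypothetical numbers of comparisons
        print(
            f"{m:>3} tests: P(at least one chance finding) = "
            f"{familywise_error_rate(m):.3f}, expected chance findings = "
            f"{expected_false_positives(m):.1f}"
        )
```

Under these assumptions a single test carries a 5 percent chance of a spurious finding, whereas 50 tests raise the chance of at least one above 90 percent, which is why summary measures that reduce the number of comparisons help. Correlated outcomes, as in a health survey, would change the exact figures.
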
Many investigators have studied the shortcomings of self-reported health data, typically by comparing self-reported data to similar data from another source, such as medical records. A recent study of a rural Canadian population found that health survey information agreed well with medical chart information for diabetes, heart problems, hypertension, and breathing problems (Voaklander et al., 2006). Poor agreement was observed for diagnoses of depression, back problems, eye problems, stroke, walking problems, and bone and joint problems. These findings were similar to those in other studies. A Mayo Clinic study found good agreement between health questionnaire responses and medical records for diabetes, hypertension, myocardial infarction, and stroke, but not for heart failure (Okura et al., 2004). In this study, factors associated with higher agreement included age under 65 years and education greater than 12 years.

We should note that because we used self-reported data from both Project SHAD participants and controls, we expect that the shortcomings of self-reported data would apply to both cohorts and would thus not materially affect our comparisons.

Finally, because there was concern about an overreporting of symptoms, we included an item on earlobe pain, a symptom not thought to have a physiologic basis. We found rates ranging from 3 to 6 percent, with higher (but not
statistically significant) rates among participants. A study of Gulf War veterans (Knoke et al., 2000) found a rate of self-reported earlobe pain of 1.2 percent among deployed Gulf War veterans and a rate of 0.2 percent among nondeployed Gulf War–era veterans, both lower than our reported rates.

Summary and Interpretation of Results

Mortality

We found no statistically significant difference in all-cause mortality between participants and controls in any of the four analysis groups, nor for the total comparison. Indeed, hazard ratios for all-cause mortality were less than 1.0 in group C and very close to 1.0 in groups A and D. However, heart disease mortality was significantly elevated overall and in groups A and B. The lack of a biological basis for this finding, together with the lack of data on cardiovascular risk factors, makes this finding difficult to interpret. There was a significant elevation of cancer mortality among group B participants as well, with the same difficulties in interpretation. Generally, hazard ratios associated with Project SHAD participation were not as large as those for other significant factors, such as pay grade, but Marines in group B had significantly higher mortality than did Navy subjects.

All-cause standardized mortality ratios (SMRs) were significantly greater than 100 for all participants and all controls combined, but were close to 100 for all participant and control analysis groups save for group B participants, indicating that mortality was close to that expected in the U.S. general population. Cancer SMRs were statistically significantly higher for group A controls and all controls, with most of this excess due to lung cancer; SMRs for non-cancer respiratory disease were not significantly different from 100 in these two groups. SMRs for injuries and external causes of death were also significantly low among all participants.
We must note that causes of death were not available for deaths prior to 1979, and so our cause-specific mortality analyses are incomplete.

Morbidity: SF-36

In general, although many differences in SF-36 summary scores between participants and controls were statistically significant, most were small, around 1 to 2 points. Interestingly, the smallest differences were seen in group C, the only group with potential exposure to active agents. SF-36 summary scores in our study were lower than age- and sex-specific national norms, indicating that our subjects reported themselves to be less well than did comparable U.S. males. In contrast, veterans aged 50–64 in the Veterans Health Study, who were receiving VA outpatient care, had an average physical component summary (PCS) score of 37.2 and an average MCS score of 47.0 (Payne et al., 2005), both of which are substantially lower than the participant or control scores in our study.

We made two attempts to look at level of exposure, one in group A and one in group B. Group A participants made up the largest of the groups and contained only men with potential exposure to either Bacillus globigii (BG) simulant agent or methylacetoacetate (MAA). The conduct of the tests made it possible to estimate independently the health effects associated with BG and with MAA. Once again, we found small but statistically significant differences, but when we attempted to analyze the effect of the number of tests as a proxy for exposure, there was no clear gradient. We further looked at the individual numbers of tests at which a participant might have been exposed to either BG or MAA and again found no clear exposure gradient. However, we did find statistically significant coefficients for linear trend for both BG and MAA for both PCS and MCS scores, evidence that PCS and MCS scores were statistically significantly lower with each additional test in which there was potential exposure to either BG or MAA.
On the other hand, when estimating the effects of BG and MAA exposure controlling for the total number of Project SHAD tests, the statistically significant effects of BG and MAA all disappeared, whereas the differences in SF-36 summary scores by total number of tests were statistically significant. It appears that for group A participants, the number of Project SHAD tests is a more important factor than putative exposure to either agent.

Only for a subsample of group B participants did we have individual exposure data, in the form of recorded (ordinal) levels of contamination by trioctyl phosphate (TOF), a simulant with low toxic potential. We were unable to
obtain precise, numeric exposure estimates and so analyzed these data by arbitrarily assigning numeric doses to the ordinal levels measured (e.g., trace = 0.5, very light = 1.0, and so on). We found no evidence that our ordinal exposure levels were associated with either SF-36 summary health measure.

Morbidity: Other Outcomes

Project SHAD participants across all groups had higher somatization scores than did controls, based on a total of 12 items, with adjusted participant scores ranging from 2.2 to 3.8 and control scores ranging from 1.7 to 3.0. In a study of military volunteers at Edgewood Arsenal (many from the Vietnam era) exposed to anticholinesterase agents (Page, 2003), the average somatization score for a 20-item scale was 5.15, with military volunteers exposed to other agents having a score of 5.00, and volunteers unexposed to chemical agents having an average score of 5.33. If we prorate the Edgewood results to estimate a 12-item score, their prorated scores would have averaged around 3.0. Thus, the somatization scores we observed among Project SHAD participants were close to those in the earlier study.

We also saw statistically significantly higher adjusted memory and attention problem scores among participants in all but group C, with adjusted participant scores ranging from 8.0 to 11.6 and control scores ranging from 4.5 to 7.2. In the same study of military volunteers at Edgewood Arsenal exposed to anticholinesterase agents (Page, 2003), the average memory and attention scores were 7.2 and 7.7, respectively. The average scores for military volunteers exposed to other agents were 7.5 and 8.3, respectively, while volunteers unexposed to chemical agents had average scores of 7.2 and 7.7.
Compared to these Edgewood results, the scores for Project SHAD participants were slightly higher, while those for controls were roughly the same, with the exception of the group B participants. In the Edgewood study, differences attributable to experimental exposure between the anticholinesterase and control subjects ranged from –0.60 to +0.31, while differences attributable to nonexperimental exposure were substantially larger, 0.92 and 1.12. The differences we observed between Project SHAD participants and controls are more in line with the nonexperimental differences seen in the Edgewood study.

Project SHAD participants reported higher prevalence rates of medical conditions than did controls, although not all these differences were statistically significant. Respiratory conditions were significantly higher in all groups but D, and psychological conditions in all groups but C. All participant groups reported higher rates of neurodegenerative disease, with some moderately high adjusted odds ratios, but most of these conditions were unspecified, making interpretation difficult. Project SHAD participants similarly reported higher prevalence levels for symptoms of many kinds. This includes higher rates of earlobe pain, an item without a clear medical basis.

There were no statistically significant differences in self-reported hospitalization rates between participants and controls, and rates of self-reported birth defects were similar for participants and controls except for group D, with a 2.4 odds ratio. We note, however, that the self-reported rate of birth defects among group D participants was similar to the rate in other groups of participants, and thus the higher odds ratio is attributable to a markedly lower self-reported rate among group D controls. We did not have sufficient data to do an agent-specific analysis in group D.

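
The proration of the Edgewood somatization scores mentioned earlier in this subsection can be reproduced with simple arithmetic. The sketch below assumes a straight linear proration by item count (score times 12/20), which is our reading of "prorate"; the exact method is not spelled out in the text, and the function and variable names are illustrative.

```python
"""Illustrative sketch: prorating a 20-item scale score to a 12-item scale.

Assumption: simple linear proration by number of items. The three input
scores are the Edgewood averages quoted in the text (Page, 2003).
"""


def prorate_score(score: float, from_items: int, to_items: int) -> float:
    """Rescale a summed scale score in proportion to the item count."""
    if from_items <= 0 or to_items <= 0:
        raise ValueError("item counts must be positive")
    return score * to_items / from_items


EDGEWOOD_20_ITEM_SCORES = {
    "anticholinesterase agents": 5.15,
    "other agents": 5.00,
    "unexposed to chemical agents": 5.33,
}

if __name__ == "__main__":
    for group, score in EDGEWOOD_20_ITEM_SCORES.items():
        prorated = prorate_score(score, from_items=20, to_items=12)
        print(f"{group}: {score:.2f} on 20 items -> {prorated:.2f} on 12 items")
```

Under this assumption the three prorated scores come to roughly 3.1, 3.0, and 3.2, consistent with the figure of around 3.0 given in the text.
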
Conclusions

In conclusion, we saw no difference in all-cause mortality between Project SHAD participants and nonparticipant controls, and although participants had a statistically significantly higher risk of death due to heart disease, the lack of cardiovascular risk factor data and of biological plausibility makes this latter difference difficult to interpret. We found overall death rates that were higher in both all participants and all controls than in the U.S. population, as well as a higher cancer death rate among all controls, mostly attributable to lung cancer.

We also found overall worse reported health in participants, but no consistent, specific, clinically significant patterns of ill health. Both PCS and MCS scores of the SF-36 were lower among participants than controls, but these differences were small in magnitude. Group C, the only group with potential exposure to active chemical or biological agents, reported the smallest differences. We also saw small but statistically significant increases in self-reported memory and attention problems as well as somatization scores. Project SHAD participants reported higher levels
of neurodegenerative medical conditions, but most of these were of an unspecified nature, and participants also reported nearly uniformly higher rates of symptoms, including a symptom without an apparent medical basis, thus raising the question of reporting bias. There were no significant differences in self-reported hospitalization, and in one group (group D), participants reported a higher rate of birth defects than controls; however, this significant difference can be attributed to an unusually low control rate rather than a high rate among participants.

While we have found no clear evidence of specific health effects that are associated with Project SHAD participation, we must remark that this does not constitute clear evidence of a lack of health effects. Although the sample seems large, some of the exposure groups are indeed rather moderate in size, and the lack of specific a priori hypotheses of health effects becomes a real limitation. If there were, for example, very specific, targeted effects on a particular organ system, but with a relatively low prevalence, our relatively coarse grouping of health outcomes might well have missed finding such a specific effect.

Were future research to be conducted, several items could be of potential interest. First, some way to reduce nonresponse bias should be considered. The collection of clinical, rather than self-report data, might also be contemplated. Included in this might be a records-based study of birth defects in these subjects; because many Project SHAD ships operated out of Pearl Harbor, data from the Hawaii Birth Defects Program might prove useful. Also of potential interest would be the collection and analysis of cause of death data for early (pre-1979) deaths.
Other, similar possibilities would include linkages with population-based cancer registries, the VA’s inpatient database (PTF), and with the Medicare database for subjects 65 years of age or older. These same data sources would provide information to validate self-reported health outcomes. Another way to deal with nonresponse bias would be to mount a separate survey of nonrespondents. A better method of dealing with exposure data is always welcome in this kind of study, but the lack of exposure-related differences in our group A and group B analyses shows that this may not yield important results.

Finally, further analyses of already collected data could be undertaken, especially if some ancillary risk factor data were added, such as service in Vietnam and combat service in Vietnam. These kinds of analyses might also be focused on the group B Marines in this study, who had significantly higher mortality than Navy personnel, adjusting for age, participation status, race, and pay grade. Marines in group B also had significantly lower PCS and MCS scores, with a large (more than 9-point) difference in MCS scores. Although these latter findings are not related to the original charge of the study, to examine the effects of Project SHAD participation per se, they may warrant some further investigation.

REFERENCES

Groves, R. M. 2006. Nonresponse rates and nonresponse bias in household surveys. Public Opinion Quarterly 70:646-675.

Knoke, J. D., T. C. Smith, G. C. Gray, K. S. Kaiser, and A. W. Hawksworth. 2000. Factor analysis of self-reported symptoms: Does it identify a Gulf War syndrome? American Journal of Epidemiology 152:379-388.

McHorney, C. A., M. Kosinski, and J. E. Ware, Jr. 1994. Comparisons of the cost and quality of norms for the SF-36 health survey collected by mail versus telephone interview: Results from a national survey. Medical Care 32:551-567.

Okura, Y., L. H. Urban, D. W. Mahoney, S. J. Jacobsen, and R. J. Rodeheffer. 2004. Agreement between self-report questionnaires and medical record data was substantial for diabetes, hypertension, myocardial infarction and stroke but not for heart failure. Journal of Clinical Epidemiology 57:1096-1103.

Page, W. F. 2003. Long-term health effects of exposure to sarin and other anticholinesterase chemical warfare agents. Military Medicine 168:239-245.

Payne, S. M., A. Lee, J. A. Clark, W. H. Rogers, D. R. Miller, K. M. Skinner, X. S. Ren, and L. E. Kazis. 2005. Utilization of medical services by Veterans Health Study (VHS) respondents. Journal of Ambulatory Care Management 28:125-140.

Ryan, M. A. K., T. C. Smith, B. Smith, P. D. Amoroso, E. J. Boyko, G. C. Gray, G. D. Gackstetter, J. R. Riddle, T. S. Wells, G. R. Gumbs, T. E. Corbeil, and T. I. Hooper. (In press). Enrollment in the Millennium Cohort begins a 21-year contribution to understanding the impact of military service. Journal of Clinical Epidemiology.

Sohn, M., N. Arnold, C. Maynard, and D. M. Hynes. 2006. Population Health Metrics 4:2.

Voaklander, D., H. Thommasen, and A. Michalos. 2006. The relationship between health survey and medical chart review results in a rural population. Social Indicators Research 77:287-305.