10
Characterization of the Cohorts and Analysis Plan
The overall goal of the analysis is to compare mortality among CROSSROADS participants with that among controls. In this chapter, we first describe the nature of the data on which that analysis rests and then describe the multivariate analysis plan itself. We present data from the study in this chapter because some of the analytic strategies are influenced by our knowledge of the data's quality and idiosyncrasies. Data-based findings relating to the multivariate exposure-outcome relationships are presented in the following chapter (Chapter 11).
We have described in earlier chapters the detailed data collection plans and the practical adjustments that were necessary during their implementation. Refer to Chapter 5 for information on data sources. Chapters 6 and 7 describe who is in the analysis, while Chapters 8 and 9 discuss what data items are used and in what manner.
Data Decisions Taken Before Analysis
Restricting Data Elements or Sample Definition
In the overall attempt to balance the validity, precision, understandability, and usefulness of the analyses in this report, we made the following decisions, based mostly on issues of data availability:
- Use the 1994 participant database provided by the Defense Nuclear Agency (DNA). (See Appendix E for a detailed description of procedures undertaken to validate the completeness of the participant roster.)
- Restrict analyses to Navy participants and controls (see Chapter 6).
- Do not include in the participant cohort those individuals who came on duty in the CROSSROADS area of operations after the officially designated period of the operation (see Chapter 6).
- Exclude from the control cohort those also in the participant cohort (see Chapter 6).
- Include participants and controls who have other-than-CROSSROADS nuclear test participation (see Chapter 6).
- When making comparisons to standard populations (e.g., the U.S. population for specific years), use male mortality rates, since the control cohort is almost totally male (gender generally was not recorded on the research files for participants). However, do not exclude female military personnel from the participant or control study cohorts.
- Do not use dosimetry; use exposure surrogate variables (see Chapter 8).
Data Cleaning and Variable Development
- Code vital status outcome as a dichotomy: "Known Dead" and "Not Known to be Dead." The latter includes participants and controls known to be alive, those whose date of death falls after the prearranged study cutoff (31 December 1992), and those for whom no death confirmation was obtained through the Department of Veterans Affairs (VA) and who were therefore presumed to be alive (see Chapter 9).
- For VA claims folders that did not contain a date of death from VA records or an acquired death certificate and that had been transferred from VA to a Federal Archives Record Center (FARC), use date of folder transfer to calculate an estimated date of death (see Chapter 9).
- Because military records used only two digits to designate year of birth, assign a century-of-birth prefix of "19" to years of birth 00 to 30 and the prefix "18" to years 31 to 99.
- Create a "boarder" variable to include participants assigned in the appropriate time period to one or more units known to be a target ship, a radiation safety unit, or a boarding team (see Chapter 8).
- Consolidate occupational specialty information into a two-level analysis variable (Engineering & Hull enlisted, other enlisted) to capture hypothesized exposure differences (see Chapter 8).
- Because of small numbers (paygrade and rank) or unavailable information (occupation), consider all officers as one category.
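Two of the coding rules above, the century-of-birth prefix and the vital status dichotomy, are mechanical enough to state as code. The sketch below is illustrative only: the function names, the boolean `death_confirmed` flag, and the treatment of confirmed deaths with no recorded date (counted as Known Dead, consistent with the "Cause only" and "Neither Cause nor Date" rows of Table 10-7) are assumptions, not the study's actual programs. The century rule works because anyone born in 1931 or later would have been 15 or younger in 1946.

```python
from datetime import date
from typing import Optional

# Prearranged end of follow-up for vital status (31 December 1992).
STUDY_CUTOFF = date(1992, 12, 31)


def full_birth_year(two_digit_year: int) -> int:
    """Expand a two-digit year of birth: 00-30 -> 1900-1930, 31-99 -> 1831-1899."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError(f"expected a two-digit year, got {two_digit_year}")
    century = 1900 if two_digit_year <= 30 else 1800
    return century + two_digit_year


def vital_status(death_confirmed: bool, date_of_death: Optional[date]) -> int:
    """Return 1 for "Known Dead", 0 for "Not Known to be Dead"."""
    if not death_confirmed:
        return 0  # no VA death confirmation: presumed alive
    if date_of_death is not None and date_of_death > STUDY_CUTOFF:
        return 0  # died after the study cutoff
    # Assumption: a confirmed death with no recorded date still counts as Known Dead.
    return 1
```

For example, `full_birth_year(25)` gives 1925 and `full_birth_year(98)` gives 1898.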
Univariate Descriptions of Study Population
As we describe later in this chapter, we base our inferential comparisons on data adjusted for confounding influences on exposure-mortality relationships. Here, we present univariate (unadjusted) descriptive statistics on the variables used in later models. This information supports our belief that the Navy participant and control cohorts are similar in characteristics we can measure. Tables 10-1 through 10-3 show the numerical balance between participant and control cohorts for age, rank (or rating), and occupational specialty, respectively, for Navy personnel.
TABLE 10-1. Age-at-Shot Distribution of Navy Participants and Controls

| Ageᵃ | Years in Interval | Participant Cohort, No. | Participant Cohort, % | Control Cohort, No. | Control Cohort, % |
|---|---|---|---|---|---|
| ≥ 16 and < 21 | 5 | 23,081 | 59.7 | 19,511 | 55.7 |
| ≥ 21 and < 26 | 5 | 9,504 | 24.6 | 9,365 | 26.7 |
| ≥ 26 and < 36 | 10 | 4,730 | 12.2 | 5,053 | 14.4 |
| ≥ 36 and < 46 | 10 | 1,134 | 2.9 | 962 | 2.7 |
| ≥ 46 and < 56 | 10 | 179 | 0.5 | 134 | 0.4 |
| ≥ 56 and < 66 | 10 | 11 | 0.0 | 2 | 0.0 |
| ≥ 66 | 2 | 2 | 0.0 | 1 | 0.0 |
| Missing | | 27 | 0.1 | 8 | 0.0 |
| Totalᵇ | | 38,668 | 100 | 35,036 | 100 |

ᵃ See Chapter 9 for discussion of the age variable.
ᵇ Mean age-at-shot for Navy participants is 22.06 years; for controls, 22.50.
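The age-at-shot intervals in Table 10-1 can be reproduced with a small binning function. This is a minimal sketch: the function names are hypothetical, age is computed as elapsed days divided by 365.25 (an assumed convention), and the shot date is left as a parameter because the reference date used in the study is discussed in Chapter 9 rather than here.

```python
from datetime import date
from typing import Optional

# Lower bounds (years) of the Table 10-1 intervals; each interval is [lower, next lower).
AGE_BOUNDS = [16, 21, 26, 36, 46, 56, 66]


def age_at_shot(birth: date, shot: date) -> float:
    """Age in years at the shot date, using 365.25-day years (assumed convention)."""
    return (shot - birth).days / 365.25


def age_interval(age: Optional[float]) -> str:
    """Map an age at shot to its Table 10-1 row label."""
    if age is None:
        return "Missing"
    if age < AGE_BOUNDS[0]:
        return "< 16"  # no such row in Table 10-1; returned so every input gets a label
    for lower, upper in zip(AGE_BOUNDS, AGE_BOUNDS[1:]):
        if lower <= age < upper:
            return f"≥ {lower} and < {upper}"
    return f"≥ {AGE_BOUNDS[-1]}"
```

For example, `age_interval(age_at_shot(date(1926, 7, 1), date(1946, 7, 1)))` returns `"≥ 16 and < 21"`, since the computed age is exactly 20.0.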
TABLE 10-2. Distribution of Ranks and Ratings Among Navy Participants and Controls

| Paygrade* | Participant Cohort, No. | Participant Cohort, % | Control Cohort, No. | Control Cohort, % |
|---|---|---|---|---|
| E1 Junior Enlisted | 12 | 0.0 | 10 | 0.0 |
| E2 Junior Enlisted | 10,624 | 27.5 | 9,773 | 27.9 |
| E3 Junior Enlisted | 7,321 | 18.9 | 6,616 | 18.9 |
| E4 Midlevel Enlisted | 5,377 | 13.9 | 4,881 | 13.9 |
| E5 Midlevel Enlisted | 4,917 | 12.7 | 4,225 | 12.1 |
| E6 Senior Enlisted | 4,316 | 11.2 | 3,752 | 10.7 |
| E7 Senior Enlisted | 2,718 | 7.0 | 2,330 | 6.7 |
| W1 (Warrant) Officer | 70 | 0.2 | 0 | — |
| W2 (Warrant) Officer | 425 | 1.1 | 477 | 1.4 |
| W3 (Warrant) Officer | 1 | 0.0 | 0 | — |
| W4 (Warrant) Officer | 1 | 0.0 | 0 | — |
| O1 (Commissioned) Officer | 934 | 2.4 | 1,186 | 3.4 |
| O2 (Commissioned) Officer | 719 | 1.9 | 786 | 2.2 |
| O3 (Commissioned) Officer | 526 | 1.4 | 491 | 1.4 |
| O4 (Commissioned) Officer | 338 | 0.9 | 259 | 0.7 |
| O5 (Commissioned) Officer | 194 | 0.5 | 167 | 0.5 |
| O6 (Commissioned) Officer | 157 | 0.4 | 74 | 0.2 |
| O7 (Commissioned) Officer | 0 | — | 0 | — |
| O8 (Commissioned) Officer | 10 | 0.0 | 1 | 0.0 |
| O9 (Commissioned) Officer | 4 | 0.0 | 0 | — |
| O10 (Commissioned) Officer | 2 | 0.0 | 0 | — |
| Missing | 2 | 0.0 | 8 | 0.0 |
| Total | 38,668 | 100 | 35,036 | 100 |

* See Chapter 8 for description of paygrade, rank, and rate.
TABLE 10-3. Distribution of Occupational Specialties Among Navy Participants and Controls

| Occupation* | Participant Cohort, No. | Participant Cohort, % | Control Cohort, No. | Control Cohort, % |
|---|---|---|---|---|
| Administrative and clerical | 5,248 | 13.6 | 4,531 | 12.9 |
| Aviation | 920 | 2.4 | 519 | 1.5 |
| Construction | 322 | 0.8 | 285 | 0.8 |
| Deck | 2,444 | 6.3 | 2,204 | 6.3 |
| Dental | 1 | 0.0 | 0 | — |
| Electronics | 415 | 1.1 | 382 | 1.1 |
| Engineering & Hull | 9,399 | 24.3 | 8,756 | 25.0 |
| Medical | 639 | 1.7 | 582 | 1.7 |
| Miscellaneous | 459 | 1.2 | 401 | 1.1 |
| Ordnance | 1,455 | 3.8 | 1,343 | 3.8 |
| Precision equipment | 4 | 0.0 | 3 | 0.0 |
| Seaman | 13,776 | 35.6 | 12,481 | 35.6 |
| Steward | 213 | 0.6 | 101 | 0.3 |
| Unknown | 3,371 | 8.7 | 3,428 | 9.8 |
| Missing | 2 | 0.0 | 20 | 0.1 |
| Total | 38,668 | 100 | 35,036 | 100 |

* See Chapter 8 for description of Navy occupational specialties.
Participants are classified as boarding or nonboarding; their numbers in each cohort are shown in Table 10-4.
TABLE 10-4. Distribution of Boarders in the Study Cohort

| Boarder* | Participant Cohort, No. | Participant Cohort, % | Control Cohort, No. | Control Cohort, % |
|---|---|---|---|---|
| Yes | 8,996 | 23.3 | 0 | 0 |
| No | 29,672 | 76.7 | 35,036 | 100 |
| Total (participants only) | 38,668 | 100 | 35,036 | 100 |

* See Chapter 8 for discussion of boarder variable.
Missing Data
Imputation of Fact and Date of Death
Because we classified approximately 500 individuals as "Dead" solely because their VA claims folder had been transferred to a Federal Archives Record Center (FARC), we devised a test to determine whether such imputation of fact and date of death was justifiable. Looking at records with both a recorded date of death and a FARC transfer date, we found a definite relationship between year of death and the date a claims folder is transferred to a FARC. We used the lag times observed in those known pairs, by transfer year, to impute a date of death for records that had only a FARC transfer date. For records transferred between 1956 and 1962, we adjusted the date to represent a death six years earlier. For 1963 to 1971 transfers, we used a three-year adjustment; for 1972 to 1985, two years; and for 1986 to 1995, one year.
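The transfer-year adjustments just described amount to a lookup table. A minimal sketch, assuming the day and month of the transfer are carried over unchanged and that transfers outside 1956 to 1995 are left unimputed (the text does not say how such cases were handled); the names are hypothetical.

```python
from datetime import date
from typing import Optional

# (first transfer year, last transfer year, years to subtract), as stated in the text.
FARC_LAG_RULES = [
    (1956, 1962, 6),
    (1963, 1971, 3),
    (1972, 1985, 2),
    (1986, 1995, 1),
]


def impute_death_date(transfer: date) -> Optional[date]:
    """Estimate a date of death from a FARC transfer date, or None if no rule applies."""
    for first, last, lag in FARC_LAG_RULES:
        if first <= transfer.year <= last:
            try:
                return transfer.replace(year=transfer.year - lag)
            except ValueError:
                # 29 February shifted into a non-leap year: use 28 February instead.
                return transfer.replace(year=transfer.year - lag, day=28)
    return None  # transfer year outside 1956-1995: left unimputed in this sketch
```

For example, a folder transferred on 10 May 1960 yields an estimated date of death of 10 May 1954.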
Imputation of Date of Birth
For 1,448 otherwise complete Navy records, date of birth was missing. Since the proportional hazards and standardized mortality analyses we used require age information, we devised a date-of-birth imputation procedure. For individuals with known dates of birth in the participant and control cohorts, date of birth was associated with paygrade and military rating. We therefore used a hot-deck imputation technique (Naus 1975), assigning to each record lacking a date of birth the date of birth of a randomly selected member of the cohort. Records were first matched on exact rating (e.g., Seaman) and paygrade (e.g., E2); records with no matching rated individual were matched on paygrade alone.
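The two-stage hot-deck match can be sketched as follows. Assumptions, all hypothetical: records are simple objects with `rating`, `paygrade`, and `birth` fields; donors are drawn from whatever pool of records is passed in (for example, one cohort at a time); only records with an observed date of birth serve as donors, so imputed values are never reused; and a fixed random seed makes the draw reproducible.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Record:
    rating: Optional[str]    # e.g., "Seaman"; None when no rating is recorded
    paygrade: Optional[str]  # e.g., "E2"
    birth: Optional[date]    # None when date of birth is missing


def hot_deck_birth_dates(records: List[Record], seed: int = 0) -> int:
    """Fill missing birth dates in place; return how many were imputed."""
    rng = random.Random(seed)
    by_rating_grade = defaultdict(list)
    by_grade = defaultdict(list)
    # Build donor pools from observed values only, before any imputation happens.
    for r in records:
        if r.birth is not None:
            by_rating_grade[(r.rating, r.paygrade)].append(r.birth)
            by_grade[r.paygrade].append(r.birth)
    filled = 0
    for r in records:
        if r.birth is not None:
            continue
        # Stage 1: exact rating and paygrade; stage 2: paygrade alone.
        donors = by_rating_grade.get((r.rating, r.paygrade)) or by_grade.get(r.paygrade)
        if donors:
            r.birth = rng.choice(donors)
            filled += 1
    return filled
```

A record with no donor at either stage keeps a missing date of birth, which corresponds to the small residual counts in the "with imputations" rows of Table 10-5.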
Summary
Table 10-5 describes the extent and distribution of missing data items by the important analysis categories of exposure and outcome.
TABLE 10-5. Number and Percent of Records With Missing Data Items Needed for Analysis

| Characteristic | Denominator | Participant Cohort, No. | Participant Cohort, % | Control Cohort, No. | Control Cohort, % |
|---|---|---|---|---|---|
| Date of birth without imputation | All | 873 | 2.25 | 610 | 1.74 |
| Date of birth with imputations | All | 27 | 0.07 | 8 | 0.02 |
| Occupation | Enlisted | 2 | 0.01 | 8 | 0.03 |
| Paygrade | All | 2 | 0.01 | 8 | 0.02 |
| Date of death without imputation | Dead | 401 | 3.32 | 166 | 1.54 |
| Date of death with imputations | Dead | 17 | 0.14 | 20 | 0.19 |
| Cause of death | Dead | 1,650 | 13.64 | 1,146 | 10.61 |
Completeness of Vital Status Ascertainment
Because recorded vital status is the main outcome in this study, differences between cohorts in how successfully it was ascertained could distort the association we observe between exposure and that outcome. We discuss this in detail in the preceding chapter (Chapter 9). In Tables 10-6 and 10-7 we present data on the follow-up of mortality status of participants and controls. Subjects are divided into those (1) known to have died and with a coded cause of death, (2) known to have died but with no cause of death available, (3) presumed alive (i.e., found on the Beneficiary Identification and Records Locator Subsystem [BIRLS] without a date of death or a FARC location), and (4) not found on BIRLS, whom we consider lost to follow-up.
TABLE 10-6. Vital Status on Follow-Up

| Vital Status on Follow-Up | Participants, No. | Participants, % | Controls, No. | Controls, % |
|---|---|---|---|---|
| Dead | 12,093 | 31.3 | 10,806 | 30.8 |
| Presumed alive | 21,770 | 56.3 | 20,319 | 58.0 |
| Lost to follow-up | 4,805 | 12.4 | 3,911 | 11.2 |
| Total | 38,668 | 100 | 35,036 | 100 |
TABLE 10-7. Information Available on Deaths

| Data Available on Deaths | Participants, No. | Participants, % | Controls, No. | Controls, % |
|---|---|---|---|---|
| Date and Cause | 10,436 | 86.3 | 9,649 | 89.3 |
| Date only | 1,639 | 13.6 | 1,135 | 10.5 |
| Cause only | 7 | 0.1 | 10 | 0.1 |
| Neither Cause nor Date | 10 | 0.1 | 10 | 0.1 |
| Blank | 1 | 0.0 | 2 | 0.0 |
| Total Dead | 12,093 | 100 | 10,806 | 100 |
Mortality Comparisons
The overall goal of the analysis is to compare mortality among CROSSROADS participants with that among controls. Under the null hypothesis, which is usually defined as the absence of an association, there would be no differences in mortality rates between the participants and the controls. In particular, if participation at CROSSROADS had no effect, we would find no significant difference in overall mortality.
A secondary hypothesis arises from concerns that radiation exposure at CROSSROADS could be the cause of any effect that may be seen among participants relative to nonparticipating controls. Under this null hypothesis there would be no significant trend in all-malignancy or leukemia mortality across boarding-party participants (more exposure surrogate), non-boarding-party participants (less exposure surrogate), and nonparticipant controls (no exposure surrogate). Similarly, mortality experience in the Engineering & Hull exposure group would be no different from that in the other enlisted group.
Multivariate Analyses
Cox Proportional Hazards Model
Using survival time since Operation CROSSROADS as the dependent variable, we use the proportional hazards model to estimate the risks associated with possible explanatory factors (e.g., participant status, boarder status, occupational specialty), including exposure, while mathematically adjusting for potential confounders (e.g., age, rank, rate, paygrade). This model, first formulated by Cox (1972), can take into account varying lengths of follow-up and other time-dependent effects. We implemented the Cox analysis using the PHREG procedure in SAS (SAS Institute 1992). It is a semiparametric model that "measures the relative risk of death or disease in (infinitesimally) small time intervals under the assumption that the relative risk is constant over the follow-up period" (Ingram and Makuc 1994).
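To make the partial-likelihood idea concrete, here is a one-covariate Cox fit by Newton-Raphson in plain Python. It is a teaching sketch, not a substitute for PHREG: it handles a single covariate (for example, participant status coded 1/0), uses the Breslow approximation for tied death times (an assumption; the text does not state which tie-handling method was used), and scans the full risk set at every death time, which would be far too slow for cohorts of this size. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def cox_fit_single(times: List[float], events: List[int], x: List[float],
                   tol: float = 1e-9, max_iter: int = 50) -> Tuple[float, float]:
    """Fit a one-covariate Cox model by Newton-Raphson on the partial likelihood.

    times  -- follow-up time for each subject (e.g., years since the shot)
    events -- 1 if the subject died at times[i], 0 if censored there
    x      -- covariate value (e.g., 1 = participant, 0 = control)
    Returns (beta, standard error of beta); ties use the Breslow approximation.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t in event_times:
            # Weighted sums over the risk set: everyone still under observation at t.
            s0 = s1 = s2 = 0.0
            for tj, xj in zip(times, x):
                if tj >= t:
                    w = math.exp(beta * xj)
                    s0 += w
                    s1 += w * xj
                    s2 += w * xj * xj
            # Covariate values of the subjects who died at exactly time t.
            died = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei == 1]
            d = len(died)
            mean = s1 / s0
            score += sum(died) - d * mean        # derivative of the log partial likelihood
            info += d * (s2 / s0 - mean * mean)  # minus its second derivative
        if info <= 0.0:
            raise ValueError("covariate does not vary within the risk sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # Standard error from the observed information at the final iterate.
    return beta, 1.0 / math.sqrt(info)
```

When both groups die at identical times, the score is zero at the start, so the fitted coefficient is 0 and the hazard ratio exp(beta) is 1.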
We used the Cox model with survival time as the response variable; vital status as the censoring variable;²² and age, participant status, paygrade, Engineering & Hull status, and boarder status as explanatory variables. These covariate content areas were chosen before data collection; decisions regarding category divisions were informed by data availability and distributions. Variable definitions are found in Table 10-8.
Although the distributions of characteristics such as age and paygrade are similar for the participant and control cohorts, they are not identical, and we therefore adjusted for them in the analyses. This model estimates the relative risk for one characteristic after removing the variation due to the distribution of the other variables in the model. We present the output as rate ratios with 95 percent confidence intervals. All statistical tests are two-sided.
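Given a coefficient and its standard error from a proportional hazards fit, the rate ratio, its 95 percent confidence interval, and a two-sided p-value follow directly. A minimal sketch: the function names are hypothetical, and the Wald test is assumed here as the basis for the two-sided tests, which the text does not specify.

```python
import math
from typing import Tuple

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def rate_ratio_ci(beta: float, se: float) -> Tuple[float, float, float]:
    """Rate ratio exp(beta) with a 95 percent Wald confidence interval."""
    return math.exp(beta), math.exp(beta - Z_975 * se), math.exp(beta + Z_975 * se)


def wald_p_two_sided(beta: float, se: float) -> float:
    """Two-sided p-value for H0: beta = 0, i.e. P(|Z| >= |beta / se|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))
```

A coefficient of 0 gives a rate ratio of exactly 1 and a p-value of 1; because the interval is built on the log scale, it is symmetric around the rate ratio only after taking logarithms.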
We examined the data for all-cause mortality, all-cancer mortality, leukemia mortality, and mortality from specific causes preselected because of concern or knowledge about radiogenicity. We tested a range of possible time-related interactions with exposure. To provide perspective, we also selected several broad categories of cause. The cause-of-death analysis categories are listed in Table 10-9 in decreasing order of aggregation.
TABLE 10-8. Definitions of Analysis Variables

| Variable Name | Definition* |
|---|---|
| Vital status | 1 = Dead; 0 = Not Known Dead |
| Age at shot | Continuous variable calculated by date of shot minus date of birth |
| Survival time | Continuous variable calculated by date of death minus date of shot |
| Participant status | 1 = Participant; 0 = Control |
| Boarder status | 3-level set of indicator variables representing boarding participants, nonboarding participants, and nonparticipant controls |
| Paygrade | Paygrades summarized in four levels: junior enlisted (E1–E3); mid-level enlisted (E4–E5); senior enlisted (E6–E7); and officers, commissioned and warrant (O1–O10 and W1–W4) |
| Occupation 1 | 3-level set of indicator variables representing Engineering & Hull, all other enlisted occupational specialties, and all officers |
| Occupation 2 | 7-level set of indicator variables combining information from the 3-level Occupation 1 with the 4-level paygrade categories |

*See Chapter 8 for a fuller description.
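The derived variables in Table 10-8 can be sketched as simple recoding functions. The labels, function names, and paygrade string format ("E2", "W1", "O3") are hypothetical, and one assumption should be stated loudly: enlisted records whose specialty is unknown or missing are placed in "other enlisted," which the table does not specify. Indicator variables for the model would then be generated from these labels, with one level held out as the reference.

```python
from typing import Optional


def paygrade_level(paygrade: str) -> str:
    """Collapse a paygrade code such as 'E2', 'W1', or 'O3' into the four Table 10-8 levels."""
    kind, number = paygrade[0], int(paygrade[1:])
    if kind == "E" and 1 <= number <= 3:
        return "junior enlisted"
    if kind == "E" and 4 <= number <= 5:
        return "mid-level enlisted"
    if kind == "E" and 6 <= number <= 7:
        return "senior enlisted"
    if kind in ("O", "W"):
        return "officer"
    raise ValueError(f"unrecognized paygrade: {paygrade!r}")


def occupation1(paygrade: str, specialty: Optional[str]) -> str:
    """Three levels: Engineering & Hull, other enlisted, officer."""
    if paygrade_level(paygrade) == "officer":
        return "officer"
    # Assumption: enlisted records with unknown or missing specialty count as "other enlisted".
    return "Engineering & Hull" if specialty == "Engineering & Hull" else "other enlisted"


def occupation2(paygrade: str, specialty: Optional[str]) -> str:
    """Seven levels: 3 enlisted paygrade levels x 2 enlisted occupations, plus all officers."""
    occ = occupation1(paygrade, specialty)
    if occ == "officer":
        return "officer"
    return f"{paygrade_level(paygrade)}, {occ}"


def boarder_status(is_participant: bool, is_boarder: bool) -> str:
    """Three levels: boarding participants, nonboarding participants, nonparticipant controls."""
    if not is_participant:
        return "nonparticipant control"
    return "boarding participant" if is_boarder else "nonboarding participant"
```

Crossing the three enlisted paygrade levels with the two enlisted occupation groups gives six cells; adding all officers as one category gives the seven levels of Occupation 2.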
TABLE 10-9. ICD9 Mortality Codes Used as Case Definitions for Analyses

| Case definitionᵃ | ICD9 mortality codes |
|---|---|
| All causes | 0010–9999 |
| All malignancies | 1400–2021, 2024, 2027–2089, 2384, 2386, 2898 |
| Buccal cancer | 1400–1499 |
| Digestive cancer | 1500–1590, 1592–1599 |
| Esophageal cancer | 1500–1509 |
| Stomach cancer | 1510–1519 |
| Large intestine cancer | 1530–1539, 1590 |
| Rectal cancer | 1540–1542, 1544–1549 |
| Liver cancer | 1550–1551, 1553–1569 |
| Pancreatic cancer | 1570–1579 |
| Respiratory cancer | 1600–1639, 1642, 1643, 1648, 1649, 1650–1659 |
| Lung cancer | 1620–1629 |
| Bone cancer | 1700–1709 |
| Skin cancer | 1720–1739 |
| Prostate cancer | 1850–1859 |
| Testicular cancer | 1860–1876, 1878–1879 |
| Bladder cancer | 1880–1886, 1888–1889 |
| Kidney cancer | 1890–1899, 1887 |
| Eye cancer | 1900–1909 |
| Brain and other CNS cancer | 1910–1929 |
| Thyroid cancer | 1930–1939 |
| All lymphopoietic cancer | 2000–2021, 2024, 2027–2089, 2384, 2386, 2898 |
| Lymphosarcoma and reticulosarcoma | 2000–2009 |
| Hodgkin's disease | 2010–2019 |
| Leukemiaᵇ and aleukemia | 2040–2089, 2024, 2031 |
| Other lymphatic tissue cancer | 2020–2021, 2027–2030, 2032–2039, 2384, 2386, 2071, 2053, 1591 |
| Multiple myeloma | 2030, 2386 |
| Benign neoplasms | 2100–2376, 2378–2383, 2388–2399 |
| Circulatory system disease | 3900–4599 |
| Respiratory disease | 4600–5199 |
| Digestive system disease | 5200–5799 |
| All external causes of death | 8000–9989 |
| All accidents | 8000–9499 |
| Motor vehicle accidents | 8100–8299 |
| Suicide | 9500–9599 |
| Infectious and parasitic diseases | 001–139 |
| Endocrine, nutritional, and metabolic diseases and immunity disorders | 240–279 |
| Diseases of the blood and blood-forming organs | 280–289 |
| Mental disorders | 290–319 |
| Diseases of the nervous system and sense organs | 320–389 |
| Diseases of the genitourinary system | 580–629 |
| Diseases of the skin and subcutaneous tissue | 680–709 |
| Diseases of the musculoskeletal system and connective tissue | 710–739 |
| Congenital anomalies | 740–759 |
| Symptoms, signs, and ill-defined conditions | 780–799 |

ᵃ Case definitions chosen mostly from NCI updated mortality rates (NCI 1995); additional broad categories use ICD9 chapter headings as organizers (WHO 1995).
ᵇ For the proportional hazards analysis of leukemia, we excluded chronic lymphoid leukemia (CLL) because it has not been identified as radiogenic. The software package used for SMR calculations, however, includes CLL (Preston et al. 1993).
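Applying the case definitions in Table 10-9 means testing whether a four-digit ICD9 code falls in any listed range. A minimal sketch, assuming codes are stored as four-digit integers (e.g., 204.1 as 2041) and range strings are written as in the table; it does not handle the three-digit chapter ranges at the end of the table. Following footnote b, chronic lymphoid leukemia (ICD9 204.1) is excluded from the proportional hazards leukemia definition by default. All names here are hypothetical.

```python
from typing import List, Tuple


def parse_ranges(spec: str) -> List[Tuple[int, int]]:
    """Parse a Table 10-9 style spec, e.g. '2040–2089, 2024', into inclusive (low, high) pairs."""
    ranges = []
    for part in spec.split(","):
        part = part.strip().replace("\u2013", "-")  # accept an en dash or a hyphen
        low, _, high = part.partition("-")
        ranges.append((int(low), int(high or low)))
    return ranges


def code_in_spec(code: int, spec: str) -> bool:
    """True if a four-digit ICD9 code lies in any range of the spec."""
    return any(low <= code <= high for low, high in parse_ranges(spec))


LEUKEMIA_SPEC = "2040–2089, 2024, 2031"  # "Leukemia and aleukemia" row of Table 10-9
CLL_CODE = 2041                          # ICD9 204.1, chronic lymphoid leukemia


def is_leukemia_case(code: int, exclude_cll: bool = True) -> bool:
    """Leukemia case definition; CLL is dropped by default, as for the Cox analysis."""
    if exclude_cll and code == CLL_CODE:
        return False
    return code_in_spec(code, LEUKEMIA_SPEC)
```

Passing `exclude_cll=False` reproduces the broader definition that, per footnote b, the SMR software uses.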
Standardized Mortality Ratios
For comparison with other atomic veteran studies (Darby 1988, 1993; NRC 1985) we calculated standardized mortality ratios (SMRs) for all-cause mortality, all malignancies, and leukemias for the Navy, Marine, and Army cohorts. To control for age and social factors in all-cause and all-malignancy categories, we calculated separate SMRs by the seven-level "Occupation 2" variable described in Table 8-2, Chapter 8. For leukemias, where there were few cases, we collapsed the seven levels over rank and rating. We expected that both participants and controls would exhibit a "healthy soldier effect." The details of this secondary analysis are presented in Appendix C.
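An SMR is the observed number of deaths divided by the number expected if the cohort had experienced the reference population's stratum-specific rates. The sketch below is illustrative and is not the software package the study cites (Preston et al. 1993): strata labels, person-years, and rates are hypothetical inputs (in practice, strata would cross age, calendar period, and the Occupation 2 levels), and the 95 percent limits use Byar's approximation to exact Poisson limits, which is an assumed choice.

```python
import math
from typing import Dict, Tuple


def smr_with_ci(observed: int,
                person_years: Dict[str, float],
                reference_rates: Dict[str, float],
                z: float = 1.959964) -> Tuple[float, float, float, float]:
    """Return (SMR, lower limit, upper limit, expected deaths)."""
    # Expected deaths: cohort person-years in each stratum times the reference rate there.
    expected = sum(py * reference_rates[stratum] for stratum, py in person_years.items())
    if expected <= 0:
        raise ValueError("expected count must be positive")
    smr = observed / expected
    # Byar's approximation to the exact Poisson limits for the observed count.
    if observed == 0:
        lower = 0.0
    else:
        o = observed
        lower = o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3 / expected
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3 / expected
    return smr, lower, upper, expected
```

For example, 20 observed deaths against 20 expected give an SMR of 1.0 with limits of roughly 0.61 to 1.54. An SMR below 1 in both cohorts would be consistent with the healthy soldier effect the text anticipates.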
An oft-raised and truly considerable drawback to SMR use in studies of occupational-type exposures is the healthy worker (or healthy soldier or sailor) effect described in Chapter 3. In fact, an earlier National Research Council study of mortality among participants in atmospheric nuclear tests (Robinette et al. 1985) was criticized for using SMRs as its sole risk comparison. The study we report here was designed to include a military reference cohort to provide a finer comparison. The SMR comparison to the U.S. white male population²³ of the period under study adds, as mentioned above, a perspective that is useful as long as one keeps its limitations in mind.
Analysis of Army (Including Army Air Corps) and Marine Data
The Navy constituted 91 percent of the CROSSROADS cohort, has occupational specialty information available for its enlisted component, has the most complete identification data (date of birth available for 97.8 percent of participants and 98.3 percent of controls), and has the most complete cause-of-death information (86.3 percent of participants, 89.3 percent of controls). For that reason we chose to do our primary analysis on the Navy data.
For the Army, which was 7.8 percent of the CROSSROADS cohort, fully 20.4 percent of the dates of birth in the participants were missing and had to be imputed (as compared to 1.8 percent for controls). In addition, the availability of causes of death was lower for Army than for Navy personnel (87.9 percent for participants, 85.9 percent for controls). The quality for the Marines was comparable to that of the Navy, but the Marines constitute a comparatively small number of individuals (557 participants), making detailed analysis of the group impossible from a statistical point of view. Because the Marines do not have any specialty information available for their enlisted ranks, we were reluctant to mix them in with the Navy data.
As a result of these factors, we chose to:
- analyze the Army and the Marine data for differences in all-cause, all-cancer, and leukemia mortality using the proportional hazards model developed for the Navy without the occupational specialty variables, and
- compute SMRs on the Army and Marines only for all-cause, all-cancer, and leukemia mortality.
We do these analyses with some hesitation, given the limitations in the Army data and the small number of Marines, and we present the results solely for completeness. The conclusions of the study are based entirely upon our findings among the Navy personnel.
Not the Subject of Analysis in This Report
As we have stated before, this study was designed and funded subject to several unavoidable constraints, among which are: dosimetry is incomplete; military records do not keep the type of data often required for epidemiologic investigation, and those data items that are kept are not always complete; the U.S. does not have a centralized national vital statistics database for individuals that spans the period from 1946 to the present; cause-of-death data have known limitations; Operation CROSSROADS was only one event in a lifetime of physical and psychological events for the participants; and few women were assigned to units included in participant and control cohorts.
For these reasons, this report neither explores nor addresses all the interesting facets of possible exposure-outcome associations. It can, therefore, neither reassure nor vindicate those who feel strongly about the nature of many of those associations. Areas of inquiry into which we have not delved in this study but for which we could imagine a study design include:
- exposure-outcome analyses based on exact dosimetry estimates calculated for a subset of the overall study population;
- fuller examination of cause of death, looking beyond the underlying cause to all associated or contributory causes listed on the death certificate; and
- detailed analysis of the participants who served in the Marines and Army (including the Army Air Corps) in CROSSROADS and their controls.
Unfortunately, the following group of topics may never be well studied in this observational cohort for reasons including very small numbers; the nonexistence of necessary exposure information; and the infeasibility, if not impossibility, of tracking health outcomes other than death:
- unique aspects, if any, of the exposure-outcome relationship in women;
- possible effects of participation or other measures of exposure on outcomes other than mortality, looking at morbidity rates for the diseases considered in the mortality study (e.g., skin cancer) and for other diseases and conditions believed to be radiogenic (e.g., cataracts);
- adverse reproductive outcomes;²⁴
- more finely defined categories of military occupation, for officers and non-Navy enlisted personnel for whom no occupation data is available; and
- the interrelationships of other, non-CROSSROADS, risk factors accruing before, during, and after the Operation CROSSROADS activities, including an overlapping array of exposures that could be chemical and physical (occupational, environmental, behavioral); socioeconomic (education, income, occupation); geographic; and medical (comorbidities). Not the least of these is the possibility that many of the participants, as a result of their special radiological training for Operation CROSSROADS, may have gone on to careers associated with radiation.