10
Characterization of the Cohorts and Analysis Plan

The overall goal of the analysis is to compare mortality among CROSSROADS participants with that among controls. In this chapter, we first describe the nature of the data on which that analysis rests and then describe the multivariate analysis plan itself. We present descriptive data from the study here, ahead of the analysis plan, because some of the analytic strategies are influenced by our knowledge of the data's quality and idiosyncrasies. Data-based findings relating to the multivariate exposure-outcome relationships are presented in the following chapter (Chapter 11).

We have described in earlier chapters the detailed data collection plans and the practical adjustments that were necessary during their implementation. Refer to Chapter 5 for information on data sources. Chapters 6 and 7 describe who is in the analysis, while Chapters 8 and 9 discuss what data items are used and in what manner.

Data Decisions Taken Before Analysis

Restricting Data Elements or Sample Definition

In the overall attempt to balance the validity, precision, understandability, and usefulness of the analyses in this report, we made the following decisions, based mostly on issues of data availability:



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

- Use the 1994 participant database provided by the Defense Nuclear Agency (DNA). (See Appendix E for a detailed description of the procedures used to validate the completeness of the participant roster.)
- Restrict analyses to Navy participants and controls (see Chapter 6).
- Do not include in the participant cohort those individuals who came on duty in the CROSSROADS area of operations after the officially designated period of the operation (see Chapter 6).
- Exclude from the control cohort anyone who is also in the participant cohort (see Chapter 6).
- Include participants and controls who took part in nuclear tests other than CROSSROADS (see Chapter 6).
- When making comparisons to standard populations (e.g., the U.S. population for specific years), use male mortality rates, since the control cohort is almost entirely male (gender generally was not recorded on the research files for participants). However, do not exclude female military personnel from the participant or control study cohorts.
- Do not use dosimetry; use exposure surrogate variables instead (see Chapter 8).

Data Cleaning and Variable Development

- Code the vital status outcome as a dichotomy: "Known Dead" and "Not Known to be Dead." The latter includes participants and controls known to be alive, those with a date of death after the prearranged study cut-off (31 December 1992), and others for whom no death confirmation was obtained through the Department of Veterans Affairs (VA) and who were therefore presumed to be alive (see Chapter 9).
- For VA claims folders that contained neither a date of death from VA records nor an acquired death certificate, and that had been transferred from VA to a Federal Archives Record Center (FARC), use the date of folder transfer to calculate an estimated date of death (see Chapter 9).
- Because military records used only two digits to designate year of birth, assign the century prefix "19" to years of birth 00 through 30 and the prefix "18" to years 31 through 99.
- Create a "boarder" variable identifying participants assigned, during the appropriate time period, to one or more units known to be a target ship, a radiation safety unit, or a boarding team (see Chapter 8).
- Consolidate occupational specialty information into a two-level analysis variable (Engineering & Hull enlisted, other enlisted) to capture hypothesized exposure differences (see Chapter 8).
- Because of small numbers (paygrade and rank) or unavailable information (occupation), treat all officers as one category.
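The century rule and vital status coding above can be sketched in Python together with the two imputation procedures described under Missing Data later in this chapter: backdating FARC transfer dates by a year-specific lag, and hot-deck assignment of missing dates of birth. This is an illustrative sketch, not the study's code. The function names, the record layout (dictionaries with hypothetical keys `rating`, `paygrade`, and `dob`), the handling of 29 February, and the treatment of transfer years outside 1956 to 1995 are all our assumptions. We also assume that donors are drawn only from the records passed in (for example, one cohort at a time).

```python
import random
from collections import defaultdict
from datetime import date
from typing import Optional

STUDY_END = date(1992, 12, 31)  # prearranged study cut-off


def full_birth_year(yy: int) -> int:
    """Expand a two-digit year of birth: 00-30 -> 19yy, 31-99 -> 18yy."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year (0-99)")
    return (1900 if yy <= 30 else 1800) + yy


def vital_status(confirmed_dead: bool, death_date: Optional[date]) -> int:
    """1 = 'Known Dead'; 0 = 'Not Known to be Dead'."""
    if not confirmed_dead:
        return 0  # known alive, or no VA death confirmation (presumed alive)
    if death_date is not None and death_date > STUDY_END:
        return 0  # died after the study cut-off
    return 1


# (first transfer year, last transfer year, years to subtract), per Missing Data
FARC_LAGS = [(1956, 1962, 6), (1963, 1971, 3), (1972, 1985, 2), (1986, 1995, 1)]


def estimated_death_date(transfer: date) -> Optional[date]:
    """Shift a FARC transfer date back by the year-specific lag."""
    for first, last, lag in FARC_LAGS:
        if first <= transfer.year <= last:
            try:
                return transfer.replace(year=transfer.year - lag)
            except ValueError:  # 29 February landing in a non-leap year
                return transfer.replace(year=transfer.year - lag, day=28)
    return None  # assumption: no lag is given for this transfer year


def hot_deck_birth_dates(records: list, rng: random.Random) -> list:
    """Fill missing 'dob' values from randomly chosen donors with a known dob.

    Donors are matched on (rating, paygrade) when possible, else on paygrade
    alone. Pools are built before any imputation, so imputed values never
    serve as donors. Records with no possible donor stay missing.
    """
    by_cell, by_grade = defaultdict(list), defaultdict(list)
    for r in records:
        if r["dob"] is not None:
            by_cell[(r["rating"], r["paygrade"])].append(r["dob"])
            by_grade[r["paygrade"]].append(r["dob"])
    for r in records:
        if r["dob"] is None:
            pool = by_cell.get((r["rating"], r["paygrade"])) or by_grade.get(r["paygrade"])
            if pool:
                r["dob"] = rng.choice(pool)
                r["dob_imputed"] = True
    return records
```

For example, `full_birth_year(24)` returns 1924 and `full_birth_year(99)` returns 1899. The donor pools are built before any gaps are filled, which keeps imputed dates from being reused as donors.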

Univariate Descriptions of Study Population

As we describe later in this chapter, we base our inferential comparisons on data adjusted for confounding influences on exposure-mortality relationships. Here, we present univariate (unadjusted) descriptive statistics on the variables used in later models. This information supports our belief that the Navy participant and control cohorts are similar in characteristics we can measure. Tables 10-1 through 10-3 show the numerical balance between participant and control cohorts for age, rank (or rating), and occupational specialty, respectively, for Navy personnel.

TABLE 10-1. Age-at-Shot Distribution of Navy Participants and Controls

Age (a) | Years in Interval | Participant No. | Participant % | Control No. | Control %
≥ 16 and < 21 | 5 | 23,081 | 59.7 | 19,511 | 55.7
≥ 21 and < 26 | 5 | 9,504 | 24.6 | 9,365 | 26.7
≥ 26 and < 36 | 10 | 4,730 | 12.2 | 5,053 | 14.4
≥ 36 and < 46 | 10 | 1,134 | 2.9 | 962 | 2.7
≥ 46 and < 56 | 10 | 179 | 0.5 | 134 | 0.4
≥ 56 and < 66 | 10 | 11 | 0.0 | 2 | 0.0
≥ 66 | 2 | 2 | 0.0 | 1 | 0.0
Missing | | 27 | 0.1 | 8 | 0.0
Total (b) | | 38,668 | 100 | 35,036 | 100

(a) See Chapter 9 for discussion of the age variable.
(b) Mean age-at-shot for Navy participants is 22.06 years; for controls, 22.50 years.

TABLE 10-2. Distribution of Ranks and Ratings Among Navy Participants and Controls

Paygrade* | Participant No. | Participant % | Control No. | Control %
E1 Junior Enlisted | 12 | 0.0 | 10 | 0.0
E2 Junior Enlisted | 10,624 | 27.5 | 9,773 | 27.9
E3 Junior Enlisted | 7,321 | 18.9 | 6,616 | 18.9
E4 Midlevel Enlisted | 5,377 | 13.9 | 4,881 | 13.9
E5 Midlevel Enlisted | 4,917 | 12.7 | 4,225 | 12.1
E6 Senior Enlisted | 4,316 | 11.2 | 3,752 | 10.7
E7 Senior Enlisted | 2,718 | 7.0 | 2,330 | 6.7
W1 (Warrant) Officer | 70 | 0.2 | 0 | —
W2 (Warrant) Officer | 425 | 1.1 | 477 | 1.4
W3 (Warrant) Officer | 1 | 0.0 | 0 | —
W4 (Warrant) Officer | 1 | 0.0 | 0 | —
O1 (Commissioned) Officer | 934 | 2.4 | 1,186 | 3.4
O2 (Commissioned) Officer | 719 | 1.9 | 786 | 2.2
O3 (Commissioned) Officer | 526 | 1.4 | 491 | 1.4
O4 (Commissioned) Officer | 338 | 0.9 | 259 | 0.7
O5 (Commissioned) Officer | 194 | 0.5 | 167 | 0.5
O6 (Commissioned) Officer | 157 | 0.4 | 74 | 0.2
O7 (Commissioned) Officer | 0 | — | 0 | —
O8 (Commissioned) Officer | 10 | 0.0 | 1 | 0.0
O9 (Commissioned) Officer | 4 | 0.0 | 0 | —
O10 (Commissioned) Officer | 2 | 0.0 | 0 | —
Missing | 2 | 0.0 | 8 | 0.0
Total | 38,668 | 100 | 35,036 | 100

* See Chapter 8 for description of paygrade, rank, and rate.

TABLE 10-3. Distribution of Occupational Specialties Among Navy Participants and Controls

Occupation* | Participant No. | Participant % | Control No. | Control %
Administrative and clerical | 5,248 | 13.6 | 4,531 | 12.9
Aviation | 920 | 2.4 | 519 | 1.5
Construction | 322 | 0.8 | 285 | 0.8
Deck | 2,444 | 6.3 | 2,204 | 6.3
Dental | 1 | 0.0 | 0 | —
Electronics | 415 | 1.1 | 382 | 1.1
Engineering & Hull | 9,399 | 24.3 | 8,756 | 25.0
Medical | 639 | 1.7 | 582 | 1.7
Miscellaneous | 459 | 1.2 | 401 | 1.1
Ordnance | 1,455 | 3.8 | 1,343 | 3.8
Precision equipment | 4 | 0.0 | 3 | 0.0
Seaman | 13,776 | 35.6 | 12,481 | 35.6
Steward | 213 | 0.6 | 101 | 0.3
Unknown | 3,371 | 8.7 | 3,428 | 9.8
Missing | 2 | 0.0 | 20 | 0.1
Total | 38,668 | 100 | 35,036 | 100

* See Chapter 8 for description of Navy occupational specialties.

Participants are labeled as Nonboarding or Boarding. Their numbers in each cohort are shown in Table 10-4.

TABLE 10-4. Distribution of Boarders in the Study Cohort

Boarder* | Participant No. | Participant % | Control No. | Control %
Yes | 8,996 | 23.3 | 0 | 0
No | 29,672 | 76.7 | 35,036 | 100
Total (participants only) | 38,668 | 100 | 35,036 | 100

* See Chapter 8 for discussion of the boarder variable.

Missing Data

Imputation of Fact and Date of Death

We classified approximately 500 individuals as "Dead" solely because their VA claims folder had been transferred to a Federal Archive Records Center (FARC). We therefore devised a test to determine whether such imputation of fact and date of death was justifiable. Looking at records with both a recorded date of death and a FARC transfer date, we determined that there is a
definite relationship between year of death and the date a claims folder is transferred to a FARC. We used the year-specific lag observed in those known pairs to impute a date of death for records with only a FARC transfer date. For records transferred between 1956 and 1962, we adjusted the date to represent a death six years earlier. For 1963 to 1971 transfers, we used a three-year adjustment; for 1972 to 1985, two years; and for 1986 to 1995, one year.

Imputation of Date of Birth

Date of birth was missing for 1,448 otherwise complete Navy records. Because the proportional hazards and standardized mortality analyses we used require age information, we devised a date-of-birth imputation procedure. For individuals with known dates of birth in the participant and control cohorts, date of birth was associated with paygrade and military rating. We therefore used a missing data imputation technique (the hot deck technique; Naus 1975) to assign each such record the date of birth of a randomly selected member of the cohort. Records were first matched on exact rating (e.g., Seaman) and paygrade (e.g., E2). Records with no matching rated individual were matched on paygrade alone.

Summary

Table 10-5 describes the extent and distribution of missing data items by the important analysis categories of exposure and outcome.

TABLE 10-5. Number and Percent of Records With Missing Needed Data Items

Characteristic | Denominator | Participant No. | Participant % | Control No. | Control %
Date of birth without imputation | All | 873 | 2.25 | 610 | 1.74
Date of birth with imputations | All | 27 | 0.07 | 8 | 0.02
Occupation | Enlisted | 2 | 0.01 | 8 | 0.03
Paygrade | All | 2 | 0.01 | 8 | 0.02
Date of death without imputation | Dead | 401 | 3.32 | 166 | 1.54
Date of death with imputations | Dead | 17 | 0.14 | 20 | 0.19
Cause of death | Dead | 1,650 | 13.64 | 1,146 | 10.61

Completeness of Vital Status Ascertainment

Because recorded vital status is the main outcome in this study, differences in success in ascertaining it could distort the association we observe between exposure and that outcome. We discuss this in detail in the preceding
chapter (Chapter 9).

In Tables 10-6 and 10-7 we present data on the follow-up of mortality status of participants and controls. Subjects are divided into four groups: (1) those known to have died who have a coded cause of death; (2) those known to have died but with no cause of death available; (3) those presumed alive, that is, found on the Beneficiary Identification and Records Locator Subsystem (BIRLS) without a date of death or a FARC location; and (4) those not found on BIRLS, whom we consider lost to follow-up.

TABLE 10-6. Vital Status on Follow-Up

Vital Status on Follow-Up | Participants No. | Participants % | Controls No. | Controls %
Dead | 12,093 | 31.3 | 10,806 | 30.8
Presumed alive | 21,770 | 56.3 | 20,319 | 58.0
Lost to follow-up | 4,805 | 12.4 | 3,911 | 11.2
Total | 38,668 | 100 | 35,036 | 100

TABLE 10-7. Information Available on Deaths

Data Available on Deaths | Participants No. | Participants % | Controls No. | Controls %
Date and cause | 10,436 | 86.3 | 9,649 | 89.3
Date only | 1,639 | 13.6 | 1,135 | 10.5
Cause only | 7 | 0.1 | 10 | 0.1
Neither cause nor date | 10 | 0.1 | 10 | 0.1
Blank | 1 | 0.0 | 2 | 0.0
Total dead | 12,093 | 100 | 10,806 | 100

Mortality Comparisons

The overall goal of the analysis is to compare mortality among CROSSROADS participants with that among controls. Under the null hypothesis, usually defined as the absence of an association, there would be no difference in mortality rates between participants and controls. In particular, if participation at CROSSROADS had no effect, we would find no significant difference in overall mortality. A secondary hypothesis arises from concern that radiation exposure at CROSSROADS could cause any effect seen among participants relative to nonparticipating controls. Under this null hypothesis, there would be no significant trend in all-malignancy or leukemia mortality across boarding-party participants (higher surrogate exposure), non-boarding-party participants (lower surrogate exposure), and nonparticipant controls (no surrogate exposure). Similarly, mortality experience in the
Engineering & Hull exposure group would be no different from that in the other enlisted group.

Multivariate Analyses

Cox Proportional Hazards Model

We use survival time since Operation CROSSROADS as the dependent variable in the proportional hazards model. This model estimates the risks associated with possible explanatory factors, including exposure (e.g., participant status, boarder status, occupational specialty), while mathematically adjusting for potential confounders (e.g., age, rank, rate, paygrade). The model, first formulated by Cox (1972), can take into account varying lengths of follow-up and other time-dependent effects. We implemented the Cox analysis using the PHREG procedure in SAS (SAS Institute 1992). It is a semiparametric model that "measures the relative risk of death or disease in (infinitesimally) small time intervals under the assumption that the relative risk is constant over the follow-up period" (Ingram and Makuc 1994).

We used the Cox model with survival time as the response variable; vital status as the censoring variable;²² and age, participant status, paygrade, Engineering & Hull status, and boarder status as explanatory variables. These covariate content areas were chosen before data collection. Decisions regarding category divisions were informed by data availability and distributions. Variable definitions are found in Table 10-8. The distributions of characteristics such as age and paygrade are similar for the participant and control cohorts, but they are not identical, so we have adjusted for them in the analyses. The model estimates the relative risk for one characteristic after removing the variation due to the distribution of the other variables in the model. We present the output as rate ratios with 95 percent confidence intervals. All statistical tests are two-sided.
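The following short sketch shows how a fitted coefficient becomes the reported rate ratio and 95 percent confidence interval. Under the proportional hazards model, the hazard for a subject with covariates x1, ..., xp is h(t) = h0(t) exp(β1x1 + ... + βpxp). Consequently exp(βj) is the rate ratio for a one-unit change in xj with the other covariates held fixed. The study's estimates came from SAS PROC PHREG. This stand-alone Python snippet only reproduces the arithmetic of a Wald-type interval, exp(β ± 1.96 SE), and a two-sided Wald test. The function names are ours, and the coefficient and standard error in the usage lines are hypothetical values, not study results.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def rate_ratio_ci(beta: float, se: float, z: float = Z_975) -> tuple:
    """Rate ratio exp(beta) with a Wald interval exp(beta -/+ z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def wald_p_two_sided(beta: float, se: float) -> float:
    """Two-sided p-value for H0: beta = 0, i.e. 2 * (1 - Phi(|beta / se|))."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


# Hypothetical coefficient for participant status (not a study result).
rr, lower, upper = rate_ratio_ci(beta=0.05, se=0.02)
p = wald_p_two_sided(0.05, 0.02)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f}), p = {p:.3f}")
```

Because the interval is symmetric on the log scale, it is asymmetric around the rate ratio itself. An interval that excludes 1.0 corresponds to a two-sided p-value below 0.05.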

We examined the data for all-cause mortality, all-cancer mortality, leukemia mortality, and mortality from specific causes preselected because of concern or knowledge about radiogenicity. We tested a range of possible time-related interactions with exposure. To provide perspective, we also selected several broad categories of cause. The cause-of-death analysis categories are listed in Table 10-9 in decreasing order of aggregation.

²² For all-cause mortality, survival time is measured from 1 July 1946 to date of death; survivors are right-censored at the end of the study (31 December 1992). For cause-specific mortality, survival time is measured from 1 July 1946 to date of death due to the specific cause; other deaths are right-censored at the time of death, and survivors are right-censored at the end of the study.
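Footnote 22's definitions of survival time and censoring can be illustrated with a short Python sketch. It assumes dates are available as `datetime.date` values and that underlying causes are four-digit ICD9 integers in the style of Table 10-9. The function and constant names are hypothetical, and only two case definitions from Table 10-9 are copied in. Treating a death with no coded cause as censored in cause-specific analyses is our assumption, not a rule stated in the text.

```python
from datetime import date
from typing import Optional, Sequence, Tuple

FOLLOW_UP_START = date(1946, 7, 1)
STUDY_END = date(1992, 12, 31)

# Two case definitions copied from Table 10-9 (inclusive four-digit ICD9 ranges).
LUNG_CANCER = [(1620, 1629)]
PROSTATE_CANCER = [(1850, 1859)]

Ranges = Sequence[Tuple[int, int]]


def matches(icd9: Optional[int], ranges: Ranges) -> bool:
    """True if an underlying-cause code falls in any of the case-definition ranges."""
    return icd9 is not None and any(lo <= icd9 <= hi for lo, hi in ranges)


def survival_record(death_date: Optional[date], icd9: Optional[int],
                    case_ranges: Optional[Ranges] = None) -> Tuple[int, int]:
    """Return (survival time in days from 1 July 1946, event indicator).

    case_ranges=None gives the all-cause definition. Otherwise a death is an
    event only if its cause falls in case_ranges, and any other death is
    right-censored at the date of death. death_date=None means 'not known dead'.
    """
    if death_date is not None and death_date <= STUDY_END:
        days = (death_date - FOLLOW_UP_START).days
        event = 1 if case_ranges is None else int(matches(icd9, case_ranges))
        return days, event
    # Survivors (including deaths after the cut-off) are censored at study end.
    return (STUDY_END - FOLLOW_UP_START).days, 0
```

Survival time is kept in days here. Expressing it in years instead would not change the proportional hazards estimates, because the model depends only on the ordering of event times.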

TABLE 10-8. Definitions of Analysis Variables

Variable Name | Definition*
Vital status | 1 = Dead; 0 = Not Known Dead
Age at shot | Continuous variable calculated as date of shot minus date of birth
Survival time | Continuous variable calculated as date of death minus date of shot
Participant status | 1 = Participant; 0 = Control
Boarder status | 3-level set of indicator variables representing boarding participants, nonboarding participants, and nonparticipant controls
Paygrade | Paygrades summarized in four levels: junior enlisted (E1–E3); mid-level enlisted (E4–E5); senior enlisted (E6–E7); and officers, commissioned and warrant (O1–O10 and W1–W4)
Occupation 1 | 3-level set of indicator variables representing Engineering & Hull, all other enlisted occupational specialties, and all officers
Occupation 2 | 7-level set of indicator variables combining information from the 3-level Occupation 1 with the 4-level paygrade categories

* See Chapter 8 for a fuller description.
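The groupings in Table 10-8 amount to a few small recoding functions, sketched here in Python. The sketch is illustrative only. The string labels, the function names, and the choice of nonparticipant controls as the reference level for the boarder indicators are assumptions. So is the treatment of a missing enlisted specialty as missing; this is not the study's actual coding. Occupation 2 has seven levels because officers form one cell, while each of the two enlisted occupation groups is split across three enlisted paygrade levels.

```python
from typing import Dict, List, Optional


def paygrade_level(paygrade: Optional[str]) -> Optional[str]:
    """Collapse a paygrade such as 'E2', 'W2', or 'O3' into the four Table 10-8 levels."""
    if not paygrade or not paygrade[1:].isdigit():
        return None
    kind, n = paygrade[0].upper(), int(paygrade[1:])
    if kind == "E" and 1 <= n <= 3:
        return "junior enlisted"
    if kind == "E" and 4 <= n <= 5:
        return "mid-level enlisted"
    if kind == "E" and 6 <= n <= 7:
        return "senior enlisted"
    if (kind == "O" and 1 <= n <= 10) or (kind == "W" and 1 <= n <= 4):
        return "officer"
    return None  # missing or out-of-range paygrade


def occupation1(level: Optional[str], specialty: Optional[str]) -> Optional[str]:
    """Occupation 1: Engineering & Hull, other enlisted, or officer."""
    if level == "officer":
        return "officer"  # no occupation data for officers: one category
    if level is None or specialty is None:
        return None
    return "engineering & hull" if specialty == "Engineering & Hull" else "other enlisted"


def occupation2(level: Optional[str], occ1: Optional[str]) -> Optional[str]:
    """Occupation 2: Occupation 1 crossed with the enlisted paygrade levels (7 levels)."""
    if occ1 is None:
        return None
    return "officer" if occ1 == "officer" else f"{occ1}, {level}"


BOARDER_LEVELS = ["boarder", "nonboarder", "control"]


def boarder_status(participant: bool, boarder: bool) -> str:
    if not participant:
        return "control"
    return "boarder" if boarder else "nonboarder"


def indicators(value: str, levels: List[str], reference: str) -> Dict[str, int]:
    """0/1 indicator variables for every level except the reference level."""
    return {lvl: int(value == lvl) for lvl in levels if lvl != reference}
```

For instance, `indicators("nonboarder", BOARDER_LEVELS, "control")` yields `{"boarder": 0, "nonboarder": 1}`, so a control is coded as all zeros.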

TABLE 10-9. ICD9 Mortality Codes Used as Case Definitions for Analyses

Case definition (a) | ICD9 mortality codes
All causes | 0010–9999
All malignancies | 1400–2021, 2024, 2027–2089, 2384, 2386, 2898
Buccal cancer | 1400–1499
Digestive cancer | 1500–1590, 1592–1599
   Esophageal cancer | 1500–1509
   Stomach cancer | 1510–1519
   Large intestine cancer | 1530–1539, 1590
   Rectal cancer | 1540–1542, 1544–1549
   Liver cancer | 1550–1551, 1553–1569
   Pancreatic cancer | 1570–1579
Respiratory cancer | 1600–1639, 1642, 1643, 1648, 1649, 1650–1659
   Lung cancer | 1620–1629
Bone cancer | 1700–1709
Skin cancer | 1720–1739
Prostate cancer | 1850–1859
Testicular cancer | 1860–1876, 1878–1879
Bladder cancer | 1880–1886, 1888–1889
Kidney cancer | 1890–1899, 1887
Eye cancer | 1900–1909
Brain and other CNS cancer | 1910–1929
Thyroid cancer | 1930–1939
All lymphopoietic cancer | 2000–2021, 2024, 2027–2089, 2384, 2386, 2898
   Lymphosarcoma and reticulosarcoma | 2000–2009
   Hodgkin's disease | 2010–2019
   Leukemia (b) and aleukemia | 2040–2089, 2024, 2031
   Other lymphatic tissue cancer | 2020–2021, 2027–2030, 2032–2039, 2384, 2386, 2071, 2053, 1591
   Multiple myeloma | 2030, 2386
Benign neoplasms | 2100–2376, 2378–2383, 2388–2399
Circulatory system disease | 3900–4599
Respiratory disease | 4600–5199
Digestive system disease | 5200–5799
All external causes of death | 8000–9989
   All accidents | 8000–9499
      Motor vehicle accidents | 8100–8299
   Suicide | 9500–9599
Infectious and parasitic diseases | 001–139
Endocrine, nutritional, and metabolic diseases and immunity disorders | 240–279
Diseases of the blood and blood-forming organs | 280–289
Mental disorders | 290–319
Diseases of the nervous system and sense organs | 320–389
Diseases of the genitourinary system | 580–629
Diseases of the skin and subcutaneous tissue | 680–709
Diseases of the musculoskeletal system and connective tissue | 710–739
Congenital anomalies | 740–759
Symptoms, signs, and ill-defined conditions | 780–799

(a) Case definitions chosen mostly from NCI updated mortality rates (NCI 1995); additional broad categories use ICD9 chapter headings as organizers (WHO 1995).
(b) For the proportional hazards analysis of leukemia, we excluded chronic lymphoid leukemia (CLL) because it has not been identified as radiogenic. The software package used for SMR calculations, however, includes CLL (Preston et al. 1993).

Standardized Mortality Ratios

For comparison with other atomic veteran studies (Darby 1988, 1993; NRC 1985), we calculated standardized mortality ratios (SMRs) for all-cause mortality, all malignancies, and leukemias for the Navy, Marine, and Army cohorts. To control for age and social factors in the all-cause and all-malignancy categories, we calculated separate SMRs by the seven-level "Occupation 2" variable described in Table 8-2, Chapter 8. For leukemias, where there were few cases, we collapsed the seven levels over rank and rating. We expected that both participants and controls would exhibit a "healthy soldier effect." The details of this secondary analysis are presented in Appendix C.

An oft-raised and truly considerable drawback to SMR use in studies of occupational-type exposures is the healthy worker (or healthy soldier or sailor) effect described in Chapter 3. In fact, an earlier National Research Council study of mortality among participants in atmospheric nuclear tests (Robinette et al. 1985) was criticized for using SMRs as its sole risk comparison. The study we report here was designed to include a military reference cohort to provide a finer comparison.

The SMR comparison with the U.S. white male population of the period under study²³ adds, as mentioned above, a perspective that is useful as long as one keeps its limitations in mind.

²³ Navy personnel in 1946 were predominantly white; we have no individual data on race.
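An SMR is the ratio of observed deaths to the number expected if the cohort had experienced the reference population's stratum-specific death rates. The following Python sketch is minimal and rests on two assumptions. First, person-years have already been tabulated by stratum (for example, age group by calendar period). Second, reference rates, here standing in for U.S. white male rates, are expressed per person-year. The stratum labels and numbers are hypothetical, and this is not the software package (Preston et al. 1993) the study used. Separate SMRs by Occupation 2 level would apply the same function to each level's deaths and person-years.

```python
from typing import Dict, Hashable


def expected_deaths(person_years: Dict[Hashable, float],
                    reference_rates: Dict[Hashable, float]) -> float:
    """Expected deaths: sum over strata of person-years times the reference death rate."""
    return sum(py * reference_rates[stratum] for stratum, py in person_years.items())


def smr(observed: int, person_years: Dict[Hashable, float],
        reference_rates: Dict[Hashable, float]) -> float:
    """Standardized mortality ratio: observed deaths divided by expected deaths."""
    return observed / expected_deaths(person_years, reference_rates)


# Hypothetical strata (age group, calendar period) and rates per person-year.
py = {("45-49", "1970-74"): 12000.0, ("50-54", "1975-79"): 9500.0}
rates = {("45-49", "1970-74"): 0.0065, ("50-54", "1975-79"): 0.0102}
print(round(smr(180, py, rates), 2))
```

An SMR below 1.0 in both participants and controls would be consistent with the healthy soldier effect discussed above. This is why the internal participant-control comparison is the study's primary analysis.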

Analysis of Army (Including Army Air Corps) and Marine Data

The Navy constituted 91 percent of the CROSSROADS cohort and has occupational specialty information available for its enlisted component. It also has the most complete identification data (date of birth available for 97.8 percent of participants and 98.3 percent of controls) and the most complete cause-of-death information (86.3 percent of participants, 89.3 percent of controls). For that reason, we chose to do our primary analysis on the Navy data. For the Army, which was 7.8 percent of the CROSSROADS cohort, fully 20.4 percent of participants' dates of birth were missing and had to be imputed (compared with 1.8 percent for controls). In addition, the availability of causes of death was lower for Army than for Navy personnel (87.9 percent for participants, 85.9 percent for controls). The quality of the Marine data was comparable to that of the Navy, but the Marines constitute a comparatively small number of individuals (557 participants), too few to support detailed statistical analysis of the group. Because the Marines do not have any specialty information available for their enlisted ranks, we were reluctant to combine them with the Navy data. As a result of these factors, we chose to:

- analyze the Army and Marine data for differences in all-cause, all-cancer, and leukemia mortality, using the proportional hazards model developed for the Navy without the occupational specialty variables; and
- compute SMRs for the Army and Marines only for all-cause, all-cancer, and leukemia mortality.

We do these analyses with some hesitation, given the limitations in the Army data and the small number of Marines, and we present the results solely for completeness. The conclusions of the study are based entirely on our findings among Navy personnel.

Not the Subject of Analysis in This Report

As we have stated before, this study was designed and funded subject to several unavoidable constraints, among which are the following:

- dosimetry is incomplete;
- military records do not keep the type of data often required for epidemiologic investigation, and the data items that are kept are not always complete;
- the U.S. does not have a centralized national vital statistics database for individuals that spans the period from 1946 to the present;
- cause-of-death data have known limitations;
- Operation CROSSROADS was only one event in a lifetime of physical and psychological events for the participants; and
- few women were assigned to units included in the participant and control cohorts.

For these reasons, this report neither explores nor addresses all the interesting facets of possible exposure-outcome associations. It can, therefore, neither reassure nor vindicate those who feel strongly about the nature of many of those associations. Areas of inquiry into which we have not delved in this study, but for which we could imagine a study design, include:

- exposure-outcome analyses based on exact dosimetry estimates calculated for a subset of the overall study population;
- fuller examination of cause of death, looking beyond the underlying cause to all associated or contributory causes listed on the death certificate; and
- detailed analysis of the participants who served in the Marines and Army (including the Army Air Corps) in CROSSROADS and their controls.

Unfortunately, the following topics may never be well studied in this observational cohort. The reasons include very small numbers, the nonexistence of necessary exposure information, and the infeasibility, if not impossibility, of tracking health outcomes other than death:

- unique aspects, if any, of the exposure-outcome relationship in women;
- possible effects of participation or other measures of exposure on outcomes other than mortality, looking at morbidity rates for the diseases considered in the mortality study (e.g., skin cancer) and for other diseases and conditions believed to be radiogenic (e.g., cataracts);
- adverse reproductive outcomes;²⁴
- more finely defined categories of military occupation, for officers and non-Navy enlisted personnel for whom no occupation data are available; and
- the interrelationships of other, non-CROSSROADS risk factors accruing before, during, and after the Operation CROSSROADS activities, including an overlapping array of exposures that could be chemical and physical (occupational, environmental, behavioral); socioeconomic (education, income, occupation); geographic; and medical (comorbidities).
Not the least of these is the possibility that many of the participants, as a result of their special radiological training for Operation CROSSROADS, may have gone on to careers associated with radiation.

²⁴ Feasibility is discussed in Institute of Medicine, Medical Follow-up Agency. Adverse Reproductive Outcomes in Families of Atomic Veterans: The Feasibility of Epidemiologic Studies. Washington, D.C.: National Academy Press, 1995.