2

Clinical Trial

INTRODUCTION

This chapter focuses on the Clinical Trial (CT), which is the costliest, most complicated, and most controversial component of the Women's Health Initiative (WHI). The CT is designed to test the benefits and risks of dietary modification (DM), hormone replacement therapy (HRT), and calcium and vitamin D supplements (CaD) on the health of postmenopausal women. The primary hypotheses of these three branches of the CT are that: (1) a low-fat dietary pattern reduces the risks of breast and colorectal cancers; (2) hormone replacement therapy reduces the risk of coronary heart disease (CHD); and (3) combined calcium and vitamin D supplementation reduces the risk of hip fractures.

NIH has structured the CT as a 3 × 2 × 2 partial factorial design involving 63,000 women between the ages of 50 and 79; in addition, 100,000 women will be enrolled in the Observational Study. NIH has funded a Clinical Coordinating Center and the first 16 of 45 expected Clinical Centers. These 16 centers, called Vanguard Clinical Centers, began a three-year recruitment period on September 1, 1993. The remaining clinical centers, to be named in September 1994, are to begin recruitment in January 1995. Clinic closeout is scheduled to begin in September 2004, followed by two years of data analysis by the Clinical Coordinating Center.

The CT is the most thoroughly designed component of the WHI thus far. The committee's assessment of it was therefore based on more information than was available for the Observational Study (Chapter 3) or the Community Prevention Study (Chapter 4), which is reflected in the size and scope of this chapter. The chapter begins with trial-wide issues of rationale and study design, followed by more detailed information on each branch and cost details for the entire CT. It concludes with the committee's findings, suggestions, and major recommendations regarding the CT component of the WHI.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




An Assessment of the NIH Women's Health Initiative

RATIONALE

General Issues

Cardiovascular disease, breast cancer, and osteoporotic fractures are among the leading causes of morbidity and mortality in postmenopausal women. As such, they are reasonable and defensible targets for a large prevention study. Coronary heart disease is the leading cause of death in U.S. women. The mortality and incidence rates of breast cancer are high: over an average 85-year lifespan, one in nine women develops breast cancer and approximately one in thirty dies of it. Osteoporotic fractures, which are associated with aging, affect many more women than men; their complications are life-threatening and reduce both longevity and quality of life.

These diseases are not the only severe sources of disability in women, however. The CT does not directly address arthritis, dysmobility, poverty and isolation, depression, dementia, hearing, vision, and dental losses, or institutionalization. Nor does it address other compelling outcomes, such as dysfunction or pain, that are not linked to a single etiology. This does not imply that these issues are not troubling sources of morbidity, or that they would be inappropriate targets for future prevention and treatment research. Similarly, the CT's focus on postmenopausal women should not be mistaken for a disregard of the many unanswered questions about younger women, or about the effects of behavior and disease in earlier stages of life on morbidity and mortality in later stages. One study cannot answer all questions.

The primary hypotheses of the CT are as follows:

A low-fat dietary pattern reduces the risk of breast cancer.
A low-fat dietary pattern reduces the risk of colorectal cancer.
Hormone replacement therapy reduces the risk of coronary heart disease.
Calcium and vitamin D (combined) supplementation reduces the risk of hip fracture.
The numerous secondary hypotheses include: DM reduces the risk of CHD; HRT increases the risks of breast and endometrial cancers; HRT reduces the risk of fractures; and CaD reduces the risk of colorectal cancer. The CT outcomes are presented in Figure 2-1; the CT hypotheses are listed in Appendix F.

There are reasonably good rationales for some aspects of each of the three branches of the CT, although the evidence for the central hypothesis of the DM branch (that a change to a low-fat dietary pattern by women over the age of 50 will reduce the incidence of breast cancer over the following nine years) is the weakest and least consistent of the three. There are stronger rationales for expecting effects of DM on colorectal cancer and various cardiovascular disease endpoints. Similarly, there is a strong rationale for the HRT branch, which will not only test the relationship between HRT and coronary heart disease but also quantify secondary adverse and beneficial outcomes such as cancers of the breast and endometrium, fractures (especially hip fractures), quality of life, and total mortality. It is defensible to test the effect of CaD on the risks of hip fracture and colorectal cancer within a study that has been mounted for other purposes; these hypotheses would not by themselves justify an expensive trial.

FIGURE 2-1 Outcomes for the WHI Clinical Trial.

The DM branch drives the size of the study, the DM-breast cancer hypothesis drives its length, and the DM and HRT branches generate most of its complexity. Each outcome to be measured is hypothesized to be affected by more than one of the CT intervention branches. For example, DM and HRT may affect coronary heart disease; DM and HRT may affect breast cancer; and HRT and CaD may affect fractures.

Integration of the CT with Other Components of the WHI

The goal of the WHI CT is to test whether the interventions being used will reduce the morbidity and mortality associated with breast cancer, cardiovascular disease, and osteoporotic fractures. The WHI Observational Study (OS) is designed to follow women for an average of nine years. The goals of the OS are to: (1) improve risk prediction of coronary heart disease, breast cancer, colorectal cancer, fractures, and total mortality in postmenopausal women; (2) create a resource of data and biologic samples that can be used to identify new risk factors and/or biomarkers for disease; and (3) examine the impact of changes in individual characteristics on disease and total mortality. The OS can provide quantitative assessments of risk factor associations with major chronic diseases in women, and it will enable the calculation of improved risk estimates for cardiovascular disease, cancers, bone fracture, and other disease endpoints in older women.
Such information is expected to improve the quality of life of postmenopausal women by facilitating the identification and preventive treatment of high-risk women. Many women must be screened to determine their eligibility for the CT, and this is costly. The marginal cost of following these women in the OS is small relative to the expense of mounting an independent OS. Thus, it is appropriate to conduct the OS in tandem with the CT.

The details of the Community Prevention Study (CPS) are unknown, so the committee cannot judge whether that component of the WHI will draw on the experience or results of the CT or OS. The CPS would fit well into the overall vision of improved women's health if its goals were to develop lifestyle change strategies in diet, exercise, smoking, and early disease detection that are accepted as national goals and for which major gaps in development exist, especially as they pertain to women of low socioeconomic status (SES) and minority women. The CPS could also furnish an infrastructure of trained personnel to aid in carrying out interventions and policies that might flow from the CT and OS.

DESIGN AND METHODS

The committee's examination of the CT design concentrated on two fundamental questions: Can the study design, if no operational difficulties occur, answer the questions it addresses? If the study design is appropriate, what other threats are there to the successful completion of the study? The committee focused on twelve issues central to these questions, which are discussed below.

Seven of these issues involve conceptual problems that are built into the design of the CT. Even if all study operations were to proceed without incident, these conceptual issues threaten the validity of the findings:

factorial design
sample characteristics
proposed analytic techniques
ethics: consent and stopping rules
minority analysis plan
specificity of intervention and effect
outcome definition and measurement

In addition to these conceptual problems, any study, no matter how well designed, is subject to setbacks from operational problems. The CT is particularly vulnerable to such problems because of its size, complexity, and duration. The committee has identified five operational issues that could jeopardize the study's success:

recruitment and retention
adherence
secular trends
provision of health care services to participants
study management

In the ensuing discussion of these twelve issues, certain specific suggestions are made. More global recommendations are discussed at the end of this chapter.

Factorial Design

NIH argued that a partial factorial design would reduce the required number of women and attendant costs and would allow assessment of interactions among intervention branches. The partial factorial design is presented in Figure 2-2, which has been reprinted with NIH permission from the June 28, 1993 WHI Protocol, page 18. The committee believes that the factorial design has serious weaknesses. First, the difficulty of maintaining adherence to one intervention, such as DM, is magnified greatly in a design that requires adherence to two interventions, DM and HRT. Second, the 15.9 percent overlap between the DM and HRT interventions is insufficient to provide adequate statistical power to assess interactions between the interventions. The added complexity of the design is therefore not compensated by an increase in statistical power.

NIH also argued that it will be more economical for the clinical centers to screen simultaneously for the two branches, DM and HRT, rather than to mount each one separately. As now planned, there are effectively two separate studies, HRT and DM, conducted within the same administrative structure. It is mostly the efficiency of shared administration that makes this plan more economical. In essence, the integrated design has become primarily a matter of efficiency; it is not essential to hypothesis testing.

FIGURE 2-2 Women's Health Initiative Clinical Trial partial factorial design.

Sample Characteristics

Size

The CT is one of NIH's largest clinical trials: 63,000 women are expected to enroll. The large sample size is one of the primary reasons that the CT is expensive: NIH expects the CT and OS to cost approximately $586 million.

The sample size is driven by the choice of endpoints. The primary endpoints of the CT are incidence of breast cancer, colorectal cancer, CHD, and hip fractures. Because the incidence rate of each outcome differs, and because the study interventions have different hypothesized effects, setting sample size requirements for the overall CT is a complicated task. The cost of the trial, strongly linked as it is to sample size, would vary with the assumptions made. In reviewing the WHI Protocol, the committee was concerned about a number of these assumptions. For example, a continuing linear decline in CHD mortality is assumed; this should be examined in more detail, especially by age group.

For each of the main hypotheses in the CT, the sample size is also determined by the need to achieve a specified power to test the effect of the intervention at a given significance level. Take, for example, the DM intervention. To test whether the difference in breast cancer event rates between the intervention and control groups is an effect of DM, the WHI Protocol uses a one-sided test at significance level α = 0.025. The power to test for effects, and hence the required sample size, depends on assumptions made by NIH about the following factors:

Age distribution. Women aged 50-54, 55-59, 60-69, and 70-79 are to be enrolled in the ratio 2:4:9:5, by design.

Loss to follow-up. For the breast cancer endpoint, loss to follow-up is assumed to be 3.0 percent per year due to deaths from other causes or disappearance.

Adherence. Based on the Women's Health Trial Vanguard Study (Henderson et al., 1990), it is assumed that the average percentage of calories from fat in the intervention group will drop from 39 percent at baseline to 20.9 percent at six months, then rise to 21.6 percent at one year and to 22.6 percent at two years. It is then assumed to increase linearly to 26 percent at 10 years (June 28, 1993 WHI Protocol). For the control group, the average percentage of calories from fat is assumed to decrease linearly from 38 percent at baseline to 34 percent at 10 years.

Magnitude and Lag of Dietary Effects. Based on international correlations between dietary fat disappearance data (rate of use or wastage in the population) and breast cancer incidence rates (Prentice et al., 1988), the WHI Protocol assumes that the risk ratio decreases linearly from RR = 1.0 at baseline to RR = 0.5 at 10 years for fully adherent women. When this effect is averaged over nine years and nonadherence is taken into account, the projected DM effect is a 14 percent reduction in breast cancer incidence.

Incidence Rates. The protocol uses published age-specific incidence data from the SEER program for the years 1985-1989. Assuming a 14 percent DM effect, the resulting cumulative incidences after nine years are 2.92 percent and 2.52 percent for the control and intervention groups, respectively.
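The protocol assumptions listed above can be made concrete with a short calculation. The Python sketch below is illustrative and is not the protocol's method, which used a weighted logrank power program adapted from Lakatos (1988). It assumes straight-line interpolation between the stated fat-intake points, an unweighted time average of the full-adherence risk ratio (with the halving time left as a parameter so that other lag times can be tried), and a crude fixed-sample comparison of two nine-year proportions with 24,000 women per arm, an allocation the text does not state. Because it ignores loss to follow-up, nonadherence, and the time course of the effect, it will not reproduce the protocol's power figures; it indicates only the direction and rough size of the sensitivities.

```python
from statistics import NormalDist

# Mean percent of calories from fat by year since randomization, from the
# protocol assumptions. Values between the stated points are filled in by
# straight-line interpolation, which is an assumption of this sketch.
INTERVENTION_FAT = [(0.0, 39.0), (0.5, 20.9), (1.0, 21.6), (2.0, 22.6), (10.0, 26.0)]
CONTROL_FAT = [(0.0, 38.0), (10.0, 34.0)]


def interpolate(points, t):
    """Piecewise-linear value at time t through (time, value) points sorted by time."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the tabulated range")


def fat_gap(t):
    """Control minus intervention percent calories from fat at year t."""
    return interpolate(CONTROL_FAT, t) - interpolate(INTERVENTION_FAT, t)


def mean_rr(lag_years, follow_up=9.0, floor=0.5):
    """Unweighted average over follow-up of a full-adherence risk ratio that
    falls linearly from 1.0 to `floor` over `lag_years`, then stays at `floor`."""
    drop = 1.0 - floor
    if follow_up <= lag_years:
        return 1.0 - drop * follow_up / (2.0 * lag_years)
    area = lag_years * (1.0 - drop / 2.0) + (follow_up - lag_years) * floor
    return area / follow_up


def approx_power(p_control, reduction, n_per_arm, alpha_one_sided=0.025):
    """Normal-approximation power of a one-sided comparison of two proportions."""
    z = NormalDist()
    p_int = p_control * (1.0 - reduction)
    se = ((p_control * (1 - p_control) + p_int * (1 - p_int)) / n_per_arm) ** 0.5
    return z.cdf((p_control - p_int) / se - z.inv_cdf(1.0 - alpha_one_sided))


if __name__ == "__main__":
    for year in (1, 2, 5, 9):
        print(f"year {year}: fat gap {fat_gap(year):.1f} percentage points")
    for lag in (5, 10, 20):
        print(f"halving over {lag} years: mean full-adherence RR {mean_rr(lag):.3f}")
    # Relative reduction implied by the stated nine-year cumulative incidences.
    print(f"implied reduction: {1 - 2.52 / 2.92:.3f}")
    for effect in (0.14, 0.12, 0.11):
        power = approx_power(0.0292, effect, 24_000)
        print(f"effect {effect:.0%}: approximate power {power:.2f}")
```

In this crude approximation, power falls steeply as the assumed effect shrinks from 14 to 11 percent, and lengthening the halving time from 10 to 20 years moves the average full-adherence risk ratio much closer to 1.0, in the same direction as the protocol's own figures.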

With the above assumptions, the WHI Protocol states that the power of the test is 86 percent based on a sample of 48,000 women.* It should be noted that the power of the test, and hence the required sample size, can vary drastically with changes in the above assumptions. For example, if the intervention effect is only 11 or 12 percent, rather than the expected 14 percent, the power of the test would drop to 63 or 75 percent, respectively, for a sample of 48,000 women. In fact, the protocol shows that reasonable changes in just three of the underlying assumptions (follow-up, effect size, and number of enrollees) produce enormous variation in the power, which could be as low as 25 percent (six years of follow-up, 11 percent intervention effect, and 42,000 participants) or as high as 89 percent (nine years of follow-up, 14 percent intervention effect, and 54,000 participants). All of these values lie within reasonable ranges of assumptions.

To illustrate the effect of additional assumptions on sample size, the committee considered an example given by Lakatos (1988). In this example, when the lag time (the interval needed to achieve the full intervention effect) increases from instantaneous (zero years) to one full year, the sample size needed to achieve 90 percent power at α = 0.05 (two-sided) can increase more than fourfold. Thus, the necessary sample size is very sensitive to the assumed timing of the effect on the relative risk. The proposed protocol assumes a linear halving of the risk over five years. The existing data neither support nor contradict this claim, but the biology of breast cancer would seem to make this an optimistic projection. If diet does have an effect on breast cancer, but the lag time for halving the risk is, for example, 20 years, then the currently proposed project has very little chance to detect an effect. The uncertainty of the lag effect is crucial to the reliability of the sample size estimates. Short lag times would enable results to be acquired more quickly, whereas longer lag times would likely preclude a result in the trial as planned. Information gained in the first five years of the WHI may be critical in setting bounds on these estimates.

Recruitment

Because the sample size determines what recruitment efforts are required, it is necessary to assess recruitment assumptions. The June 28, 1993 WHI Protocol estimates that 33 percent of the 189,000 women who are expected to attend the first screening visit will enter the CT. The CT will randomize 25,000 women to the HRT component (40 percent of whom are expected to agree to be in the DM component as well); 48,000 women to the DM component (21 percent of whom are expected to agree to be in the HRT component as well); and 45,000 women (71 percent of the total) to the CaD component, all of whom will be participating in at least one of the other components.

* This is based on a modified version of a program designed by Lakatos (1988).
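The recruitment figures quoted above are mutually consistent, which a few lines of arithmetic can confirm. This sketch uses only numbers from the text; its one interpretive step is treating the HRT/DM overlap as 40 percent of the 25,000 HRT women. Under that reading, the overlap is about 15.9 percent of all CT participants, which appears to be the basis of the 15.9 percent figure cited under Factorial Design.

```python
# Figures quoted in the Recruitment paragraph (June 28, 1993 WHI Protocol).
SCREENED = 189_000   # women expected at the first screening visit
CT_YIELD = 0.33      # share of screened women expected to enter the CT
HRT, DM, CAD = 25_000, 48_000, 45_000
HRT_ALSO_DM = 0.40   # share of HRT women expected to join DM as well
DM_ALSO_HRT = 0.21   # share of DM women expected to join HRT as well

expected_ct = SCREENED * CT_YIELD
overlap_via_hrt = HRT * HRT_ALSO_DM
overlap_via_dm = DM * DM_ALSO_HRT

# Inclusion-exclusion: every CaD participant is also in HRT or DM, so the
# women in HRT, DM, or both make up the whole CT.
ct_total = HRT + DM - round(overlap_via_hrt)

print(f"expected CT entrants from screening: {expected_ct:,.0f}")
print(f"HRT/DM overlap: {overlap_via_hrt:,.0f} (via HRT) vs {overlap_via_dm:,.0f} (via DM)")
print(f"CT total: {ct_total:,}")
print(f"overlap share of CT: {overlap_via_hrt / ct_total:.1%}")
print(f"CaD share of CT: {CAD / ct_total:.1%}")
```

The two routes to the overlap (40 percent of HRT women and 21 percent of DM women) differ by fewer than 100 women, so the protocol's percentages describe the same group of roughly 10,000 women.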

Each Vanguard Clinical Center expects to enroll 336 women in the HRT arm, 846 in the DM arm, 224 in both the HRT and DM arms, and 2,220 women in the OS. As currently planned, each Clinical Center is expected to enroll 39 women per month. Therefore, for each month a clinic gets ahead of schedule, the study gains power from an additional three person-years (39 person-months) of follow-up. Similarly, for each month a clinic falls behind in recruitment, three person-years of follow-up are lost. Because of the recent delay in bringing on the 29 additional clinics, several months of follow-up are already lost to the study. This delay threatens the power and sample size computations, adding to the level of uncertainty.

Participant Characteristics

Postmenopausal women between the ages of 50 and 79 will be invited to join the CT. The WHI aims for the study sample to have the following initial age distribution:

50-54 years old: 10 percent
55-59 years old: 20 percent
60-69 years old: 45 percent
70-79 years old: 25 percent

The WHI is also striving for, but not requiring, a “representative” accrual of participants with regard to race/ethnicity and SES. This will further complicate recruitment, although it will strengthen the generalizability of the results. It is not clear to the committee how this goal would be enforced.

General inclusion criteria for the CT, according to the June 28, 1993 WHI Protocol, are postmenopausal status, with or without a uterus or ovaries; age 50-79 years, inclusive, at first screening contact; likelihood of residing in the study area for at least three years after randomization; and provision of written informed consent. Exclusion criteria include competing risks, such as a medical condition associated with a survival rate of less than five years, invasive cancer of any type in the past ten years, or breast cancer (in situ or invasive) at any time; characteristics that could affect adherence or retention, such as alcohol or drug dependency, mental illness, dementia, or current active participation in another intervention trial; and unwillingness to give up current HRT or calcium supplementation. See Appendix A for a more detailed description of the exclusion criteria.

Participants will not be categorized by risk for breast cancer, colorectal cancer, or coronary heart disease. This allows a more generalizable study, but the lack of risk restrictions requires a much larger sample size. The factorial design does not allow specific branches to focus on the most efficient samples, such as women at high risk of CHD for an HRT trial or women at high risk of breast cancer for a DM trial.
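The per-center targets and the age allocation can likewise be checked against the trial-wide figures. The sketch assumes that the 336 and 846 figures count women in only one arm and that all 45 centers share the Vanguard targets; neither assumption is stated in the text, but together they reproduce the trial-wide totals to within about 1 percent and the 39 women per month over a 36-month recruitment period.

```python
# Per-center enrollment expected at each Vanguard Clinical Center (from the text).
# Reading 336 and 846 as women in only one arm is an interpretation of this sketch.
HRT_ONLY, DM_ONLY, BOTH, OS = 336, 846, 224, 2_220
CENTERS = 45               # planned number of Clinical Centers
RECRUITMENT_MONTHS = 36    # three-year recruitment period
AGE_RATIO = {"50-54": 2, "55-59": 4, "60-69": 9, "70-79": 5}

ct_per_center = HRT_ONLY + DM_ONLY + BOTH
per_month = ct_per_center / RECRUITMENT_MONTHS
# One month's 39 enrollees give 39 person-months; the text rounds this to
# three person-years.
person_years_per_month = 39 / 12

print(f"CT women per center: {ct_per_center:,} (about {per_month:.0f} per month)")
print(f"one month of recruitment is about {person_years_per_month:.2f} person-years")
print(f"across {CENTERS} centers: CT {CENTERS * ct_per_center:,}, "
      f"HRT {CENTERS * (HRT_ONLY + BOTH):,}, DM {CENTERS * (DM_ONLY + BOTH):,}, "
      f"OS {CENTERS * OS:,}")

# The 2:4:9:5 enrollment ratio expressed as percentages of the sample.
total = sum(AGE_RATIO.values())
age_shares = {band: 100 * r / total for band, r in AGE_RATIO.items()}
print(age_shares)
```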

Proposed Analytic Techniques

NIH has proposed carefully designed and deliberated analytic techniques. A weighted logrank test will be used to test for the hypothesized effects in the CT (Lakatos, 1988). The logrank test is based on the time until the event occurs. If the event does not occur within the observation period, the case is treated as a censored observation. The null hypothesis of the logrank test (i.e., that the intervention made no difference) is that the distribution of time-to-event is the same in the intervention and control groups. Although a one-sided test at α = 0.025 is mathematically equivalent to a two-sided test at α = 0.05 (yielding equivalent sample size estimates), the difference has implications for conceptualizing and monitoring the results. The committee, as well as a number of the investigators, believes that a two-sided test should be used.

Statistical adjustments using relative risk regression methods will be used to consider the effects of including other covariates, the ability of intermediate variables to explain an intervention effect, the estimation of the full-adherence relative risk as a function of time since randomization, and a reliability substudy. No multiple comparison adjustments are planned for the primary endpoint analysis. Subsidiary outcome analyses will rely on multivariate response analyses when appropriate. While it is legitimate to forgo formal multiple comparison adjustment, as long as that is clearly stated in the protocol, the practice stands in stark contrast to the proposed use of the Bonferroni adjustment, one of the most conservative adjustments, in the analyses to be presented to the Data and Safety Monitoring Board (DSMB). The Bonferroni adjustment consists of dividing the significance level α by the number of tests performed simultaneously and using the result as the significance level for each test.
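The logrank test and the Bonferroni adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, not the WHI analysis code: the `weight` argument is a hypothetical hook standing in for the protocol's weighting scheme (Lakatos, 1988), whose actual weights are not given here, and the default weight of 1.0 gives the ordinary logrank test. The example also confirms that a one-sided test at α = 0.025 and a two-sided test at α = 0.05 use the same critical value.

```python
import math
from statistics import NormalDist


def logrank_z(times, events, groups, weight=lambda t: 1.0):
    """Weighted logrank z statistic for group 1 (intervention) versus group 0.

    times  -- follow-up time for each participant
    events -- 1 if the event was observed at that time, 0 if censored
    groups -- 1 for intervention, 0 for control
    weight -- weight applied at each distinct event time (hypothetical hook);
              the default of 1.0 gives the ordinary logrank test
    A negative z means fewer events than expected in the intervention group.
    """
    records = sorted(zip(times, events, groups))
    at_risk = [list(groups).count(0), list(groups).count(1)]
    numerator = variance = 0.0
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths, leaving = [0, 0], [0, 0]
        # Gather everyone whose follow-up ends at time t.
        while i < len(records) and records[i][0] == t:
            _, event, group = records[i]
            deaths[group] += event
            leaving[group] += 1
            i += 1
        n = at_risk[0] + at_risk[1]
        d = deaths[0] + deaths[1]
        if d > 0:
            w = weight(t)
            # Observed minus expected events in group 1 at this time.
            numerator += w * (deaths[1] - d * at_risk[1] / n)
            if n > 1:
                # Hypergeometric variance of the group-1 event count.
                variance += (w * w * d * at_risk[0] * at_risk[1] * (n - d)
                             / (n * n * (n - 1)))
        # Censored participants count as at risk at t and leave afterwards.
        at_risk[0] -= leaving[0]
        at_risk[1] -= leaving[1]
    if variance == 0:
        raise ValueError("no informative event times")
    return numerator / math.sqrt(variance)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under the Bonferroni adjustment."""
    return alpha / n_tests


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5, 6] + [4, 5, 6, 7, 8, 9]   # toy data: control, then intervention
    events = [1] * 12
    groups = [0] * 6 + [1] * 6
    print(f"logrank z = {logrank_z(times, events, groups):.3f}")
    z = NormalDist()
    # One-sided alpha = 0.025 and two-sided alpha = 0.05 share a critical value.
    print(z.inv_cdf(1 - 0.025), z.inv_cdf(1 - 0.05 / 2))
    print(bonferroni_alpha(0.05, 10))
```

With ten simultaneous tests, for example, the Bonferroni rule requires p < 0.005 for each test to hold the overall level at 0.05, which shows how quickly such corrections raise the bar for declaring an effect during interim monitoring.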
It seems likely, and preferable, for the DSMB to receive uncorrected data.

Data and Safety Monitoring Board

As in many blinded NIH studies involving human participants, there will be a Data and Safety Monitoring Board (DSMB) with oversight responsibilities. To address the tasks of the DSMB, plans for interim analysis have been drafted. The committee was told that the Clinical Coordinating Center will present data on primary, subsidiary, and intermediate outcomes to the DSMB after Bonferroni adjustments for multiple comparisons are made. Each CT branch will be monitored for early stoppage based on summary measures of benefits and risks. Because the DSMB will have the responsibility of stopping a CT branch if adverse effects pose a risk to participants, these interim plans are not yet formulated well enough to be adequate. These plans are extremely complicated and are slated to be addressed by the DSMB. This onerous task has major implications. If certain monitoring plans are adopted, it might be decided to provide the participants with some study results. Alternatively, if severe corrections are applied for all the multiple comparisons, key results may be obscured, delaying the release of important public health results. The DSMB will no doubt address these issues, but the lack of information on how these decisions will be made over the duration of the trial increases the uncertainty about the ability of the CT to achieve specific goals. The committee suggests that the DSMB prespecify a number of outcomes and situations to monitor concerning stopping the trial.

Ethics: Consent and Stopping Rules

Any clinical trial must incorporate adequate protection for the well-being and self-determination of human participants. This study has such a broad population base and such high visibility that its procedures in this regard are likely to come under special scrutiny. A randomized study is ethically justifiable only when competent professionals cannot discern a reason why one arm of the study is clearly better or worse than others for the potential participants. Allowing a participant to join a randomized trial is ethically defensible only when the participant has enough information to evaluate whether all arms are reasonably equal in her own view or, alternatively, whether the differences between arms are of a magnitude and seriousness that she is willing to accept in order to contribute to the common good. After much discussion, the committee decided that it is currently defensible to offer the randomizations to each of the CT branches.

Ensuring that each participant can knowingly accept randomization requires that she know the key information about the risks, benefits, and uncertainties involved, as individualized to her situation. Conventionally, this means that a certain minimum of information is given to the potential participant, who is then encouraged to ask any additional questions that may be of special relevance or interest. Obviously, the person answering these inquiries must be knowledgeable in the subject area.
This information and consent requirement can pose challenges to achieving a study's implementation goals. If the WHI CT proceeds as currently designed, substantial resources will be required to meet the obligation to inform actual and potential participants adequately. This obligation will require much more information about the interventions at the outset, as discussed below, as well as a commitment to provide evolving scientific information over the course of the project.

Informed Consent

The committee found the proposed informed consent measures to be inadequate. The committee was provided with Appendix IV, “Informed Consent Guidelines,” of the WHI Protocol, approved by the DSMB on June 16, 1993, and believes that the consent forms give no understanding of the likelihood or magnitude of major risks and benefits. Certain women at substantial risk of particular problems would not necessarily learn of the currently known

investigators as to what the contract actually requires financially in terms of anticipated or unanticipated changes in the sample size, scope of work, etc. This must be clarified. For example, if several years into this project a PI faces greater expenses than were budgeted, how will this be handled? Will NIH provide additional support, or will the institution in which the clinical center is based provide the necessary funding, either at the expense of the PI's other projects or from general funds? Or will the quality of work on the WHI tasks suffer? Some PIs believe that by signing the contract, the institution agreed to pick up any necessary additional costs. Others see a more standard contract, in which NIH would allocate more money if the scope of work were to change. What if, though, the scope were to remain the same but the costs increase? Still other investigators assume that NIH would not allow its investment to founder and would provide additional funding if necessary.

Ancillary Studies

The committee recognizes that Vanguard Clinical Center investigators and others plan to request funds from NIH and other public and private sources to carry out studies ancillary to the WHI. Although the funding for those studies would increase the total cost associated with the WHI, the committee believes that ancillary studies represent anticipated and desired side benefits of such a large trial. NIH has already proposed a mechanism for review of ancillary studies.

Potential Causes of Budget Shortfalls

Despite elegant planning and budgeting, there are predictable threats to maintaining a study of this scope within its budget. These include difficulty in recruiting participants, unanticipated staff turnover, inadequate adherence to the protocol, and larger than estimated cross-overs. As any one of these occurs, the budget will be affected; for each additional problem encountered, the budget will be further challenged.
Participant Recruitment

If participant recruitment were to lag, there would be no money available for increased (and usually more costly) staff time, clinic hours, or promotional materials. In response to questions by the committee regarding the ability to recruit minorities and older women and to monitor various demographic characteristics, such as SES and age level, during recruitment and enrollment, NIH pointed out repeatedly that the Vanguard Clinical Center investigators have signed contracts to produce specific recruitment results for the monies allocated. If a clinical center has insufficient funds in reserve to accomplish this, however, the validity of the science will be threatened in a number of ways, most prominently by diminishing the power of the CT through a reduction in person-years of follow-up. If NIH plans to drop centers experiencing recruitment delays (a possibility for which it has adequately planned by randomizing within centers), the person-years attributable to a clinical center will be lost, thus weakening the study. These person-years of follow-up can be regained, but only at additional cost, by increasing enrollment at other centers and/or extending the study. Thus, if the funding is not adequate to recruit and to implement the interventions with appropriate intensity, the tests of the hypotheses will suffer.

If the informed consent interview (discussed above) were to include, as recommended by the committee, a fuller description of possible risks and benefits, along with estimated probabilities of their occurring for a given participant, fewer women might consent to be randomized. This could slow the recruitment rate or increase the effort needed to compensate for the higher refusal rate. In either case, costs would increase.

Staffing

If there is unanticipated staff turnover, the study will incur additional costs. Especially in a 12-year study involving 45 centers, new staff must be recruited and trained, which will raise related costs. Staff turnover also has the potential to delay recruitment and threaten adherence to the protocols, thus delaying the study, with the accompanying financial and validity costs.

Adherence

Similar threats to the budget lie in attempts to achieve adequate adherence to the intervention regimens. If adherence to the DM or the medication schedule is weak, a clinical center could direct increased effort at education and incentives to increase adherence. Increased effort would translate into more staff time or more highly skilled staff, both at higher cost. If such additional resources were not available, any potential difference in outcomes between the intervention and control groups would be attenuated because of poor adherence to the intervention regimen and, therefore, the diminished difference in group exposures.
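A widely used approximation from clinical trial design texts (not taken from the WHI Protocol) shows why weak adherence and cross-over are costly. If a fraction of the intervention group stops adhering (drop-out) and a fraction of the control group adopts the intervention (drop-in), the expected difference between groups shrinks by roughly the factor (1 minus drop-out minus drop-in), so the sample size needed for the same power grows by about the inverse square of that factor. The rates in the example are hypothetical.

```python
def diluted_effect(full_effect, drop_out, drop_in):
    """Approximate observed between-group difference when a fraction `drop_out`
    of the intervention group behaves like controls and a fraction `drop_in`
    of the control group behaves like the intervention group."""
    if drop_out + drop_in >= 1:
        raise ValueError("combined non-adherence must be below 100 percent")
    return full_effect * (1.0 - drop_out - drop_in)


def sample_size_inflation(drop_out, drop_in):
    """Approximate factor by which the sample size must grow to keep the same
    power, since required size scales with the inverse square of the effect."""
    if drop_out + drop_in >= 1:
        raise ValueError("combined non-adherence must be below 100 percent")
    return 1.0 / (1.0 - drop_out - drop_in) ** 2


if __name__ == "__main__":
    # Hypothetical non-adherence rates, chosen only for illustration.
    for drop_out, drop_in in [(0.0, 0.0), (0.10, 0.05), (0.20, 0.10)]:
        print(f"drop-out {drop_out:.0%}, drop-in {drop_in:.0%}: "
              f"effect x{diluted_effect(1.0, drop_out, drop_in):.2f}, "
              f"sample size x{sample_size_inflation(drop_out, drop_in):.2f}")
```

Because the penalty is quadratic, even moderate combined non-adherence of 30 percent roughly doubles the required sample size under this approximation.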
Cross-Over
Investigators anticipate some cross-over of study participants from intervention regimens to control and vice versa. The extent of cross-over is difficult to estimate, especially in a clinical trial involving more than one intervention with potentially problematic side effects. Expected cross-over in the DM branch is exceptionally uncertain, because few studies have attempted dietary change over a 12-year period. If more participants change their dietary or medication patterns than the small percentages planned, the sample size needed to test the various hypotheses could increase. Investigators would then have to enlarge the sample at corresponding expense. The committee recognizes that NIH has considered many of these threats, as well as others. Memoranda from NIH statisticians, for example, note that these difficulties will be monitored and that sample size and power calculations will be adjusted as necessary. The committee remains concerned, however, that although such adjustments would be required, they could not by themselves offset the effects on the study. Additional sources of funding would be needed to maintain sufficient power for meaningful statistical comparisons.
Summary
The committee feels that the three WHI components cannot all be carried out for the announced cost of $625 million. Consider the total cost of the WHI with the Clinical Coordinating Center at $142 million, the Vanguard Clinical Centers funded at $10.4 million each, and additional centers funded at $8 to $9 million each. Under those figures, approximately $570 to $580 million of the $625 million would be committed before the Community Prevention Study (CPS) is funded. If, however, additional centers are funded at $10 million each, the total committed funding would be $600 million, leaving only $25 million for the CPS.
Are the proposed Vanguard Clinical Center budgets adequate? Vanguard Clinical Center budgets and expected institutional commitments apparently vary a good deal. Many Vanguard Clinical Center representatives stated that they were comfortable with their ability to meet the requirements of the study within the budget, relying especially on institutional contributions; some were less certain. The integrity (and cost estimates) of the WHI depend on the collective whole, not just on the centers that are confident. The committee concluded that the planned expenses are not excessive in relation to the research tasks they are to cover. NIH's publicizing of the WHI as one mega-study may help recruitment and public health promotion. However, this characterization masks the fact that the WHI is several studies of lesser cost combined in a single package.
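The committee's arithmetic for the $10 million case can be reproduced with a simple tally. This sketch assumes 45 clinical centers in total (16 Vanguard plus 29 additional). It counts only the Clinical Coordinating Center and the clinical center awards, and does not model any other line items NIH may carry. The constant and function names are illustrative.

```python
# Dollar figures from the text, in millions of US dollars.
TOTAL_COMMITTED = 625.0
COORDINATING_CENTER = 142.0
VANGUARD_CENTERS = 16
VANGUARD_COST_EACH = 10.4
TOTAL_CENTERS = 45  # 16 Vanguard + 29 additional clinical centers
ADDITIONAL_CENTERS = TOTAL_CENTERS - VANGUARD_CENTERS


def committed_funding(additional_cost_each):
    """Funding committed to the Clinical Coordinating Center and all clinical
    centers, assuming no other line items (illustrative helper)."""
    return (COORDINATING_CENTER
            + VANGUARD_CENTERS * VANGUARD_COST_EACH
            + ADDITIONAL_CENTERS * additional_cost_each)


committed = committed_funding(additional_cost_each=10.0)
left_for_cps = TOTAL_COMMITTED - committed
print(f"Committed: ${committed:.1f}M; remaining for the CPS: ${left_for_cps:.1f}M")
```

The tally gives about $598 million committed and roughly $27 million remaining. Both are in line with the approximate $600 million and $25 million figures above.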
After extensive formal and informal conversations with NIH, PIs, and other Vanguard Clinical Center representatives, the committee sees a very tightly budgeted trial that works only if nothing goes wrong. Should things not go according to plan and estimate, there is little room for correction. The committee feels that most of the Vanguard Clinical Centers could probably function with the formal and informal arrangements in place. This is probably not true for all of the additional clinical centers. The committee therefore cannot state with confidence that the funding of the WHI, as now designed, is adequate. Because of the many uncertainties, the committee is uneasy about whether the budgeted funds will be adequate to carry out the WHI even if nothing unanticipated goes awry. The committee believes that the WHI CT will face enormous difficulties along the lines discussed above. In addition, it is impossible to assess the firmness of the nebulous soft costs that many institutions have committed to over the 14 years. That period will probably span different institutional administrations. In sum, the committee believes that the project cannot be fully completed as planned within the current budget.
FINDINGS AND SUGGESTIONS
The committee feels that the Women's Health Initiative (WHI) had inadequate peer review, both from within NIH and from outside scientists. Various elements of the WHI were reviewed at one time or another. For example, the dietary modification trial was reviewed many times in earlier proposals, none of which were allowed to proceed. Even so, the committee's impression is that the complicated, interlocking combination of the clinical trial and the observational study at the inter-Institute level was not reviewed as rigorously as the usual Institute-initiated project. This inter-Institute study appears to have fallen outside the established review process. The committee suggests that NIH reexamine and strengthen the mechanism through which it reviews future proposed inter-Institute projects.
The committee concentrated on two fundamental questions. Can the design answer the questions it addresses if no operational difficulties occur? If the study design is appropriate, what threatens successful completion of the study? The committee identified seven issues involving conceptual problems built into the design. Even if all study operations proceed without incident, these design issues threaten the validity of the findings. Where appropriate, the committee has also suggested strategies to overcome the difficulties.
Factorial Design
NIH argued that a partial factorial design would reduce the required number of women and attendant costs, and would allow assessment of interactions among intervention branches. The committee feels that the factorial design has major drawbacks.
The overlap of 15.9 percent between the DM and HRT interventions is insufficient to provide adequate statistical power to assess interactions. In addition, the difficulties of maintaining adherence to two or three interventions detract from the attractiveness of a factorial design. In essence, the integrated design has become primarily a matter of economic efficiency; it is not essential to hypothesis testing.
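A standard textbook result for a balanced 2 × 2 factorial shows why a modest overlap gives little power for interactions. It is offered here as an illustration, not as the WHI's own power calculation. Each cell mean has variance σ²/n. A main effect compares two halves of the sample, while an interaction is a difference of differences across all four cells. Its estimate therefore has four times the variance of the main-effect estimate. Detecting an interaction as large as a main effect thus needs roughly four times as many participants, even when every woman is in both interventions; with only partial overlap the situation is worse. The function name and cell size below are hypothetical.

```python
def contrast_variances(cell_n, sigma2=1.0):
    """Variances of a main-effect contrast and the interaction contrast in a
    balanced 2 x 2 factorial with cell_n participants per cell.

    sigma2 is the outcome variance for a single participant, assumed equal
    in every cell (hypothetical helper).
    """
    var_cell_mean = sigma2 / cell_n
    # Main effect of factor A: mean of the A=1 half minus mean of the A=0 half,
    # each half pooling two cells (2 * cell_n participants).
    var_main = 2 * (sigma2 / (2 * cell_n))
    # Interaction: (y11 - y10) - (y01 - y00), a +/-1 contrast of four
    # independent cell means.
    var_interaction = 4 * var_cell_mean
    return var_main, var_interaction


# Hypothetical cell size; the ratio does not depend on it.
var_main, var_interaction = contrast_variances(cell_n=1000)
ratio = var_interaction / var_main
print(f"interaction variance / main-effect variance = {ratio:.1f}")
```

Because sample size scales with variance for a fixed effect size and power, the ratio of 4 translates directly into a fourfold sample requirement.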
Sample Characteristics
In determining sample size, the study design relies heavily on extremely uncertain assumptions regarding the magnitude of effects and lag times. This concern is a factor in the recommendation described below regarding study duration. Participants will not be categorized by risk for breast cancer, colorectal cancer, or coronary heart disease. This makes the study more generalizable, but the lack of risk restrictions requires a much larger sample size. The factorial design does not allow specific branches to focus on the most efficient samples, such as women at high risk of CHD for an HRT trial or women at high risk of breast cancer for a DM trial, according to NIH assumptions.
Proposed Analytic Techniques
The committee's concerns center on the choice of endpoints for trial closeout. They also center on the planned use of methods to adjust for multiple comparisons when the Data and Safety Monitoring Board (DSMB) considers interim decisions. The committee believes that studywide material must inform potential participants of risks as well as benefits. The committee suggested that unadjusted data be made available to the DSMB. The committee felt that the Bonferroni adjustment called for in the current analysis plans might be too conservative. It might therefore deprive many participants of an appropriately timed conclusion to the study. The committee also suggested the use of two-sided tests of significance. These maintain a scientifically justified neutral stance on whether the interventions might yield beneficial or adverse effects.
Ethics
The informed consent measures do not provide an adequate understanding of the likelihood or magnitude of major risks and benefits. The obligation to inform potential and current research participants would require much more information at the outset, as well as a commitment to provide evolving information over the course of the project.
The committee suggested that the counselors at the clinical centers be knowledgeable and have access to algorithms, guidelines, and printed material about known risks and benefits. These counselors would need supervision, training, and monitoring. In addition, new information from this as well as other pertinent trials (as judged by the WHI coordinators and the DSMB) must be shared with the participants to allow them to make their own decisions about ongoing risks and benefits of the interventions. The inclusion of several interventions with several endpoints in a single trial makes the stopping rules difficult to formulate.
Therefore, the committee suggested that the DSMB should:
(a) use preexisting or external information to establish a prior probability that internal data could confirm (this might mean accepting an earlier “stopping” conclusion than would be justified by data arising solely from the CT);
(b) perform pre-specified subset analyses on participant groups that are especially likely to show harm or benefit;
(c) ask to examine uncorrected estimates of effect and do any analyses it feels are warranted;
(d) review the monitoring of the consent process; and
(e) evaluate pre-specified event rates for potential morbidity and mortality outcomes.
Minority Analysis Plan
As currently designed, the study will have insufficient power to compare individual minority groups with the majority population. The study will be able to observe differences, if they exist, but will not be able to test them with adequate power. The committee encourages NIH to make these limitations known to those who may be expecting definitive comparative findings among minority and majority groups.
Specificity of Intervention and Effect
The CT design does not distinguish which element of the low fat dietary pattern may be responsible for any observed outcome. Similarly, the design will not allow analyses to distinguish whether calcium or calcium plus vitamin D is responsible for any observed outcome. Some endpoints can be affected by more than one of the study interventions, and participant decisions modify the factorial design. For both reasons, the overlap and interactions will be difficult to analyze.
Outcome Definition and Measurement
Several threats affect accurate and unbiased endpoint detection. These include the uncertain significance of many tiny malignancies detected by mammography, the unstandardized method of detecting colorectal cancer, and the inadequate development of behavioral, psychological, and quality of life measures for use in the study.
The committee encourages NIH to include measures of constructs such as pain, mobility, and psychological status. In addition to the conceptual problems described above, any study, no matter how well designed, is subject to setbacks from operational problems. The WHI CT is particularly vulnerable to such problems because of its size, complexity, and duration. The committee has identified five operational issues that could jeopardize the success of the study:
Recruitment, Retention, and Adherence
The message of the study is not adequately developed and may be misleading. The committee suggests that NIH and the clinical centers develop an overall message for the study. That message should pay particular attention to long-term recruitment strategies for older and minority participants and should not emphasize the WHI as a breast cancer prevention trial. In addition, investigators should set higher standards than currently appear to exist for studywide materials, including introductory brochures, consent forms, and videotape information. This information should be available in conversational language. NIH has made overly optimistic assumptions about recruitment, retention, and adherence. This is especially true for recruitment plans that cover many years, and for subgroups with which researchers have less clinical trial experience, such as older women, minority women, and women across the spectrum of socioeconomic status (SES). Nevertheless, the committee encourages NIH to seek diversity within the sample and suggests that attempts be made to include the entire SES range in this study. The acceptability of the various branches of the CT to women is unclear at this stage, especially since the interventions are difficult and have potential side effects. To maintain adequate statistical power, the CT must have funds available to boost recruitment efforts if, as the committee expects, recruitment rates are lower than anticipated.
Secular Trends
Secular trends toward a decreasing fat content in the U.S. diet may continue, and there may be appreciable nonadherence in the DM treatment group. If both occur, the difference between the treatment and control diets is likely to be too small to show a treatment effect.
Provision of Health Care Services to Participants
The current protocol includes only a referral to a regular source of care. This does not adequately meet the study's responsibility to participants.
The committee suggests that the clinical centers continue to develop adequate links with reliable community providers to ensure that adequate follow-up care is available. It may become essential for the project to pay for some kinds of follow-up for some poor or uninsured women. Research staff need to spend considerable time discussing side effects with participants and dealing with associated apprehension, both in the clinic and on the telephone. Failing to do so risks unethical behavior and increased study dropout. The current budget may not include adequate staff time for these activities.
Cost
The committee believes that the total costs of the CT will be greater than the $625 million provided by NIH. NIH and Vanguard Center representatives have indicated that the institutions at which the Vanguard Centers are based will cover the additional funds necessary for successful completion of the trial. This reliance on institutional support may be reasonable in the case of the Vanguard Centers. The committee felt it unlikely, however, that an additional 29 institutions can be identified that have both the experience to carry out high quality research and the ability to provide additional resources. Potential sources of budget shortfall include:
- lagging participant recruitment, which could require increased staff resources;
- staff turnover, which could require training and travel resources and might delay recruitment, threaten adherence, and therefore affect study validity; and
- cross-over of participants between study intervention regimens and control status.
The CT funding per person per year is less than half that for other recent NIH studies of women's health, including, specifically, those that use similar drug regimens and approaches. There does not seem to be a budget adjustment plan for unanticipated changes in either the scope of work or medical technology during the course of the trial. In addition to its concerns about initial funding levels, the committee was concerned about long-term funding. It suggested that NIH clarify what the contract requires financially in terms of anticipated or unanticipated changes throughout the duration of the study.
RECOMMENDATIONS
Finally, the committee was charged to begin with the existing WHI design and to consider threats to its successful completion, whether design, financial, or ethical. It was also charged to consider whether the design would yield reliable results. The committee recommends that the dietary modification-breast cancer hypothesis be considered a subsidiary rather than a primary hypothesis. This would shift the emphasis to the effect of dietary modification on coronary heart disease outcomes, making those the primary hypotheses.
The committee recommends that the consent process be outlined more carefully, be conscientiously implemented and monitored across all centers, and be evaluated and updated as needed.
The committee recommends that the CT be scheduled to end in mid-2002, rather than close out the interventions by April 2005. It further recommends that the findings of an Objective Prescheduled Reassessment (OPR) be available by April 2002 (see Figure 2-4). The OPR, managed through an internal or external review board, would consider whether continuation or modification of the CT could be justified. Recruitment for the CT began in September 1993, so the project would run unimpeded for more than eight years. The exception would be if the Data and Safety Monitoring Board moves to stop the trial sooner based on external or interim data. Data analysis would begin in October 2001 and conclude with a recommendation by April 2002. Between October 2001 and the decision to extend, modify, or terminate, the CT would continue in its active mode. Sufficient time would be provided for closeout or redesign and data analyses. This recommendation addresses the primary concerns of the committee in the following ways:
Data from a mean follow-up time of nearly six years would be available for the OPR. According to NIH power calculations (see Appendix J), this timeframe would allow the hypotheses regarding the stronger expected associations to be tested and findings disseminated in a timely manner. These are HRT and coronary heart disease, and HRT and combined fractures. If the intervention effect is strong, this timeframe also allows the hypotheses regarding the weaker expected associations to be tested: DM and CHD, CaD and hip fractures, and HRT and hip fractures. This timeframe does not allow adequate follow-up for the DM and breast cancer hypothesis, the DM and colorectal cancer hypothesis, or the HRT and breast cancer hypothesis.
However, the committee feels that, as currently designed, the CT does not have a high probability of yielding statistically significant results for the DM and breast cancer hypothesis or the HRT and breast cancer hypothesis, even after more prolonged follow-up. The committee would therefore prefer to see the other hypotheses analyzed in an appropriate timeframe. While the DM and colorectal cancer hypothesis is reasonable, it alone does not justify continuing the CT.
This recommendation allows an assessment informed by recruitment, retention, adherence, and incidence experience; if any of these estimates have not been or are not being met, the problem can be addressed. For example, suppose HRT is demonstrated to be favorable compared with control. The CT could then reassign the control participants (with their permission) to ERT or PERT, increasing statistical power for that direct comparison, which is not adequate as currently designed. If there is evidence that the DM-breast cancer investigation should continue, justifications should be offered at the same time. If recruitment or adherence experience is so poor that an adequate test of a hypothesis would not be possible in any reasonable time frame, the CT or a branch of it could terminate. If, on the other hand, recruitment or adherence problems are discretely identifiable, the study could be redesigned for the remaining duration to compensate for these problems.
Any clinically beneficial findings of the CT can be made available to participants. Clinical knowledge resulting from other studies can also be applied to participants in both the intervention and control arms of the CT. Therefore, WHI investigators would not be pressured to deny benefits to women in the CT in order to keep its overlapping studies intact.

FIGURE 2-4 Objective Prescheduled Reassessment Timeline.