The Selection of Endpoints in
Evaluative Research
JOHN P. BUNKER
Having repeatedly urged greater investment in the evaluation of medical technologies, I find it fitting to discuss the endpoints one should address during the various stages of the development process, and when one might rely on intermediate endpoints as surrogates for clinical endpoints. I will consider condition-specific mortality versus all-cause mortality and, where mortality is not a central issue, condition-specific outcomes versus all-cause outcomes. I will also address the underlying issue of risks and benefits, that is, the trade-offs involved in evaluating therapeutic technology.
SURROGATE VERSUS CLINICAL ENDPOINTS
The surrogate-versus-clinical endpoints battle is particularly prominent now in the drug arena. The issue is when new drugs should be released for clinical use. Under most circumstances, the Food and Drug Administration (FDA) has required evidence of clinical improvement and has rejected surrogate endpoints in making such decisions. The resultant delay has brought continuing opprobrium on the FDA. In a highly controversial and well-publicized decision, the FDA initially withheld approval of tissue-type plasminogen activator (t-PA), although evidence had been presented that t-PA lysed coronary thrombi and that arterial patency was achieved more frequently with t-PA than with streptokinase. But there was no evidence at the time that t-PA increased survival over that obtainable with streptokinase. Among the outraged critics of the decision was the Wall Street Journal, which, under the headline "Human Sacrifice," mounted one of its many attacks on the regulatory bureaucracy of the FDA.
The FDA later did approve t-PA, which for a brief period appeared to be the treatment of choice. Now second thoughts are beginning to emerge. A New England Journal of Medicine report from New Zealand (1) showed no difference in ventricular function, coronary artery patency rates, or reinfarction (again, incidentally, surrogate endpoints), suggesting that there is no difference between these two drugs other than cost. One should bear in mind that the sample was small, with 130 and 135 patients receiving streptokinase and t-PA, respectively. Of course, one major advantage of surrogate endpoints is that smaller sample sizes may be adequate. Even with such a small sample, it is interesting to note that after 30 days there were 10 deaths among patients receiving streptokinase and 5 among those receiving t-PA; after 9 months there were 12 and 8 deaths, respectively. While the differences in mortality may appear suggestive, the p-values, 0.2 and 0.34 for the two time periods, did not reach the conventional level of statistical significance.
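The statistical point can be checked with a few lines of arithmetic. The Python sketch below is illustrative only and is not the analysis used in the report: it applies a two-sided Fisher exact test to the death counts quoted above (an assumption; the investigators may have used a different test, so the p-values need not match the published 0.2 and 0.34 exactly), and then uses the standard normal-approximation formula for comparing two proportions to estimate how many patients per arm would be needed to detect a mortality difference of the observed size. The function names, the 5 percent significance level, and the 80 percent power target are hypothetical choices made for illustration.

```python
from math import ceil, comb
from statistics import NormalDist


def fisher_exact_two_sided(deaths_a: int, n_a: int, deaths_b: int, n_b: int) -> float:
    """Two-sided Fisher exact p-value for deaths vs. survivors in two arms.

    With all margins fixed, the number of deaths falling in arm A follows a
    hypergeometric distribution; the p-value sums the probabilities of every
    table that is no more likely than the observed one.
    """
    total_deaths = deaths_a + deaths_b
    denom = comb(n_a + n_b, total_deaths)

    def prob(k: int) -> float:
        # Probability that exactly k of the deaths occur in arm A.
        return comb(n_a, k) * comb(n_b, total_deaths - k) / denom

    observed = prob(deaths_a)
    lo = max(0, total_deaths - n_b)
    hi = min(n_a, total_deaths)
    # A tiny relative tolerance keeps floating-point ties from being dropped.
    p = sum(q for q in (prob(k) for k in range(lo, hi + 1))
            if q <= observed * (1 + 1e-7))
    return min(1.0, p)


def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per arm to distinguish event rates p1 and p2
    (two-sided test, normal approximation, no continuity correction)."""
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha / 2), z(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Death counts from the text: streptokinase (130 patients) vs. t-PA (135).
p_30_days = fisher_exact_two_sided(10, 130, 5, 135)
p_9_months = fisher_exact_two_sided(12, 130, 8, 135)
print(f"30 days: p = {p_30_days:.2f}; 9 months: p = {p_9_months:.2f}")

# Hypothetically treat the observed 30-day death rates as the true rates.
print("patients needed per arm:", n_per_arm(10 / 130, 5 / 135))
```

Both p-values come out well above 0.05, and the sample-size estimate is on the order of 500 patients per arm, roughly four times the number actually enrolled. This is the arithmetic behind the observation that trials judged on mortality must be far larger than those relying on surrogate endpoints.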
The other major surrogate-versus-clinical endpoints battle has been fought over cancer chemotherapy. Again, the FDA has been denounced by the Wall Street Journal for foot-dragging. The question under debate is whether, and under what circumstances, we should expedite the introduction of drugs. It has always seemed to me that for most drugs the public is better served by the relatively measured and cautious policy adopted by the FDA. My personal view reflects a concern for the risks, both known and unknown, of hastily introduced technology. I believe it was Harold Green, chair of the 1973 Artificial Heart Assessment Panel, who suggested that a delay in introducing a new therapy means only that the public has to live with the status quo, while the widespread use of inadequately tested treatments can expose the public to substantial harm. The views of potential recipients of treatment may be quite different, depending on the severity of the condition. While most of medicine is concerned with conditions that are not life-threatening, it is entirely appropriate that we adopt different attitudes and policies for introducing drugs that treat life-threatening conditions than for those used in routine medicine.
However well or cautiously we evaluate drugs in Phases I, II, and III, a major shortcoming in how we introduce drugs in this country is in follow-up. Once a drug has been introduced, we have no systematic and comprehensive way to detect or control long-term risks and benefits. It has been observed that Great Britain is willing to introduce drugs at an earlier stage in their development because the British system of post-marketing surveillance (PMS) may be more effective than the United States post-marketing system; see, for instance, the contribution of Inman in this volume. It is a source of considerable chagrin that our country failed to act on the recommendations of the President's Commission on Post-Marketing Surveillance that would have established a reliable system of PMS a decade ago.
The problem of post-marketing surveillance is at least as great for medical devices and procedures as for drugs. Surgeons in particular do not have good data on long-term outcomes. Note, for example, the incredulity of urologists who learned from John Wennberg's research about the number of patients who die within a year after transurethral resection of the prostate (see Chapter 4).
CLINICAL ENDPOINTS
I will return to the question of how good a surrogate for clinical endpoints an intermediate endpoint may be, but first it will be useful to examine the clinical endpoint itself: How reliable are clinical endpoints? Are they adequate gold standards themselves? The debate over condition-specific versus all-cause mortality is particularly interesting and sobering. It is well recognized that all-cause mortality is a purer endpoint than disease-specific mortality, because it helps avoid such problems as bias in patient selection, missing data, and changes in classification over time.
Proponents of new therapies understandably would prefer to judge their results on the basis of the specific condition the treatment is intended to relieve. An investigator might well ask why death from a completely unrelated cause should count against the proposed therapy. But it is not always clear that the "unrelated" cause is really unrelated. The latest example to come to my attention is a report from Scotland, in the British Medical Journal, in which the authors report an observational study correlating blood cholesterol levels with cardiac deaths and other endpoints, cancer in particular (2). The investigators found the predicted association between cholesterol level and cardiac deaths, but the reduction in cardiac deaths associated with lower cholesterol was offset by an equal increase in cancer deaths.
You may be familiar with the unpleasant fact that in three major lipid drug trials, the fall in cardiac mortality associated with lower blood cholesterol was offset by increased accidental deaths in the experimental groups, and total mortality was unchanged (3,4,5). Investigators are still trying to determine whether this awkward relationship between lower cholesterol and accidental deaths is causal.
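The arithmetic of offsetting deaths is simple but easy to overlook when only the condition-specific endpoint is reported. The Python sketch below uses entirely hypothetical counts (they are not taken from the trials cited) to show how a therapy can reduce cardiac deaths by a quarter while leaving all-cause mortality unchanged. The `Arm` class and `risk_ratio` helper are illustrative names, not part of any published analysis.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """Deaths by cause in one arm of a hypothetical trial (invented numbers)."""
    n: int
    cardiac: int
    accidental: int
    other: int

    @property
    def all_cause(self) -> int:
        # All-cause deaths are simply the sum over every recorded cause.
        return self.cardiac + self.accidental + self.other


def risk_ratio(events_treated: int, n_treated: int,
               events_control: int, n_control: int) -> float:
    """Risk in the treated arm divided by risk in the control arm."""
    return (events_treated / n_treated) / (events_control / n_control)


# Hypothetical counts chosen so that the cardiac benefit is exactly offset.
control = Arm(n=2000, cardiac=40, accidental=10, other=30)
treated = Arm(n=2000, cardiac=30, accidental=20, other=30)

rr_cardiac = risk_ratio(treated.cardiac, treated.n, control.cardiac, control.n)
rr_all = risk_ratio(treated.all_cause, treated.n, control.all_cause, control.n)
print(f"cardiac RR = {rr_cardiac:.2f}, all-cause RR = {rr_all:.2f}")
# prints: cardiac RR = 0.75, all-cause RR = 1.00
```

Judged by the cardiac endpoint alone, the hypothetical treatment looks clearly beneficial; judged by all-cause mortality, it does nothing.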
Offsetting mortalities can, of course, go the other way. In studying the possible condition-specific mortality risk of a therapy, it is equally important to examine the possibility that the therapy produces an offsetting fall in total mortality. For example, in the National Halothane Study, we were concerned that some patients receiving the anesthetic halothane would die of liver failure. We were also aware of the possibility that halothane, because of its superior clinical properties, might produce offsetting decreases in mortality from other causes. As it turned out, there were but a handful of deaths from liver necrosis, and these were more than offset by a fall in all-cause mortality for patients receiving halothane.
From the foregoing considerations, I posit that the phenomenon of offsetting risks is important and perhaps not adequately appreciated. It is by no means limited to mortality. Mortality is what we tend to study, not only because it is important, but also because it is easier to measure than many other things we would like to know and that are also important. When a technology intended to improve quality of life has both benefits and risks, they are likely to be very difficult to compare. It is the old apples-versus-oranges problem, but even worse, since there may be several baskets of different fruits to be balanced in the equation.
Improvement in quality of life is not only an important outcome of medical care; it is the only intended outcome of most of what we do in medicine. In commenting on the failure of cholesterol-lowering drugs to reduce total mortality, Fries, Green, and Levine point out that "the primary purpose of most health promotion activities . . . is to improve quality of life" and that, by implication, it may be unrealistic to expect or demand that length of life be extended (6). More important, they suggest, is the decreased morbidity and improved quality of life that accompany a decrease in risk factors and improved cardiovascular function.
There are any number of therapies intended to improve quality of life that have offsetting adverse effects. I will mention a few: thalidomide causing phocomelia, diethylstilbestrol (DES) causing vaginal cancer in the offspring of women receiving it, and swine flu vaccine followed by Guillain-Barré syndrome. The last is of particular interest because a very large clinical trial was not large enough to pick up this rare but extremely serious syndrome. The ever-present risk of side effects, many of them unknown, with everyday treatment is part of the price we must pay when therapy is effective. It is not quite so easy to accept the inevitable complications and ill effects of other therapies, such as the severe malabsorption problems that followed gastric bypass surgery, metabolic imbalances that could easily have been predicted.
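Why even a very large trial can miss a rare adverse effect follows from elementary probability. If an event strikes each recipient independently with probability r, a trial of n people sees at least one case with probability 1 - (1 - r)^n; requiring that to reach 95 percent gives the familiar "rule of three," n of roughly 3/r. The Python sketch below assumes independent events and a hypothetical rate of 1 in 100,000; neither the rate nor the trial sizes are drawn from the swine flu experience, and the function names are illustrative.

```python
from math import ceil, log


def prob_at_least_one(rate: float, n: int) -> float:
    """Chance that a trial of n people observes at least one case of an
    adverse event occurring independently with the given per-person rate."""
    return 1.0 - (1.0 - rate) ** n


def n_needed(rate: float, confidence: float = 0.95) -> int:
    """Smallest trial size giving at least `confidence` probability of
    observing one or more cases (solves (1 - rate)**n <= 1 - confidence)."""
    return ceil(log(1.0 - confidence) / log(1.0 - rate))


rate = 1e-5  # hypothetical: 1 case per 100,000 recipients
print(f"P(>=1 case among 10,000) = {prob_at_least_one(rate, 10_000):.3f}")
print("participants needed for a 95% chance of >=1 case:", n_needed(rate))
```

With the hypothetical rate of 1 in 100,000, a trial of 10,000 people has only about a 10 percent chance of observing even one case, and roughly 300,000 participants would be needed to reach 95 percent. Detecting an excess over a background rate would require far more still.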
Two common operations that are performed to improve quality of life may have the opposite effect. As Wennberg et al. remind us, prostatectomy is often followed by impotence and incontinence (7). Hysterectomy may be followed by depression and an increase in urinary tract infections. Recent data suggest, however, that the improvements in quality of life for many or most patients undergoing elective hysterectomy or prostatectomy may more than offset the potential ill effects of the procedures. To balance the quality-of-life benefits and risks of such procedures, we must consider the values of the patient. These depend heavily on how individual patients perceive the benefits and risks of the procedures. Unfortunately, we do not yet know how to present the issue of risk to patients in a meaningful way; nor, I suspect, do those of us in the profession fully understand these risks. It is clear, however, that different patients have different values, and that patients' values may differ widely from those ascribed to them by their physicians (8).
There is another difficulty. A quality-of-life therapy may have as its goal a single condition-specific benefit that is easily measurable, but we don't have any single all-cause index to identify and measure possible offsetting negative effects. Indeed, we may not even know what side effects to look for when a therapy is introduced. An observational study before introduction may, however, give us some clues as to potential side effects to look for in long-term surveillance.
As Chapter 4 indicates, all outcomes that are relevant to a patient should be included in evaluative research: mortality and morbidity, complications, symptom reduction, and functional status improvement, as well as the standard physiologic and biochemical surrogates. For this purpose, Fries and Spitz, in a recently published book, have proposed a hierarchy of quality-of-life assessment indices for surveillance: death, disability, discomfort, drug side effects, and dollar costs (9), each of the latter four subdivided into relevant components (e.g., pain, fatigue, depression, anxiety).
While it is of course desirable to obtain a definitive evaluation of a new product or treatment as soon as possible, haste can create serious problems, as we saw with t-PA. Another example of the importance of timing is the use of injected chymopapain as an alternative treatment for the relief of ruptured intervertebral disks. In clinical trials it appeared safe and effective. But serious complications (transverse myelitis and anaphylactoid reactions) were reported shortly after the FDA released chymopapain for general use (10). A third interesting and sobering example is the recent report that, in randomized clinical trials comparing mastectomy with mastectomy plus radiation to the chest wall for breast cancer, there was a late increase in serious cardiac events, coronary artery disease, and unrelated malignancies in the group receiving radiation (11). With t-PA and chymopapain the adverse effects were detected very quickly. With DES and with radiation in the foregoing example, they occurred much later. It may even be necessary to wait years.
Radiation techniques for breast cancer have changed and presumably improved considerably, so that patients undergoing lumpectomy and radiation now may be spared these complications, but we simply do not know if that is so. As Chapter 12 points out, devices and procedures are generally subject to incremental innovation. Not only will an operation or procedure differ among physicians, but the procedure or device itself will be modified over time. It may be difficult ever to know when to evaluate devices and procedures, and we may therefore need to follow patients for long periods. We need to claim the right, and accept the responsibility, to review the effects of treatment continually and to revise our clinical decisions as new evidence becomes available.
SURROGATE ENDPOINTS
Returning to surrogate endpoints: they have all the problems of clinical endpoints plus a good many of their own. They can be related only to condition-specific outcomes, and their relationship to hoped-for clinical outcomes may not be a strong one. I might point out that this is analogous to the well-known problem central to the medical audit, the process-versus-outcome relationship that we have all worried about.
The potential usefulness of surrogate endpoints during the early stages of development appears to be strongest in the cardiac area, but we have already seen the problems experienced with t-PA. The use of surrogate endpoints has also been explored with some enthusiasm for cancer chemotherapy, with shrinkage of tumor size the usual proposed surrogate for increased life expectancy. However strong the association between such surrogates and their intended effects may prove to be, a serious limitation of surrogates as a basis for evaluation is that none of the offsetting adverse effects can be determined when surrogate outcomes are used.
I would like to call your attention to the April 1989 issue of Statistics in Medicine, the first four articles of which are devoted to discussion of surrogate endpoints. They explore in depth three conditions that one might consider as having the greatest potential: cancer (12), cardiovascular disease (13), and ophthalmologic disorders (14). One advantage that is emphasized is that surrogates provide earlier answers. But it is of interest that, in its attempt to expedite the availability of drugs to treat AIDS and cancer, the FDA has not moved to allow surrogate endpoints; the enhanced speed is achieved by collapsing Phases II and III and giving such drugs priority treatment (J. Goyan, personal communication, 1989).
In conclusion, I will make four points. First, when dealing with mortality as an endpoint of treatment, all-cause mortality is ignored at the peril of the investigators and the public. Second, when dealing with quality of life, multiple or hierarchical endpoints must be considered; their identity may not be known in advance; and they cannot be summarized in a single number, for there is no all-cause quality-of-life equivalent of all-cause mortality. Third, a more systematic and comprehensive method of long-term monitoring or surveillance is needed. If one is established, a greater reliance on surrogate endpoints might be justified. Finally, we must be concerned with the complex issue of an informed public's wants and values.
REFERENCES
1. White HD, Rivers JT, Maslowski AH, et al. Effect of intravenous streptokinase as
compared with that of tissue plasminogen activator on left ventricular function after
first myocardial infarction. New England Journal of Medicine 1989;320:817-821.
2. Isles CG, Hole DJ, Gillis CR, Hawthorne VM, Lever AF. Plasma cholesterol, coronary
heart disease, and cancer in Renfrew and Paisley survey. British Medical
Journal 1989;298:920-924.
3. Frick MH, Elo O, Haapa K, Heinonen OP. Helsinki Heart Study: Primary-prevention
trial with gemfibrozil in middle-aged men with dyslipidemia. New England
Journal of Medicine 1987;317:1237-1245.
4. Coronary heart disease death, nonfatal acute myocardial infarction and other clinical
outcomes in the Multiple Risk Factor Intervention Trial Research Group.
American Journal of Cardiology 1986;58:1-13.
5. The Lipid Research Clinics Coronary Primary Prevention Trial results. I.
Reduction of incidence of coronary heart disease. Journal of the American Medical
Association 1984;251:351-364.
6. Fries JF, Green LW, Levine S. Health promotion and the compression of morbidity.
Lancet 1989;1:481-483.
7. Wennberg JE, Roos N, Sola L, Schori A, Jaffe R. Use of claims data systems to
evaluate health care outcomes: Morbidity and reoperation following prostatectomy.
Journal of the American Medical Association 1987;257:933-936.
8. McNeil BJ, Weichselbaum R, Pauker SG. Fallacy of the five-year survival in lung
cancer. New England Journal of Medicine 1978;299:1397.
9. Fries JF, Spitz PW. The hierarchy of patient outcomes. In Spilker B (ed.) Quality of
Life Assessments in Clinical Trials. New York: Raven Press, 1990:25-35.
10. Blue Shield of California, Medical Policy Committee (March 4, 1987).
11. Houghton J, Baum M. Adjuvant radiotherapy in breast cancer: Considerations of
cost-benefits in relation to the CRC (King's/Cambridge) trial. International Journal
of Technology Assessment in Health Care 1989;5:415-422.
12. Ellenberg SS, Hamilton JM. Surrogate endpoints in clinical trials: Cancer.
Statistics in Medicine 1989;8:405-413.
13. Wittes J, Lakatos E. Surrogate endpoints in clinical trials: Cardiovascular diseases.
Statistics in Medicine 1989;8:415-426.
14. Hillis A, Seigel D. Surrogate endpoints in clinical trials: Ophthalmologic disorders.
Statistics in Medicine 1989;8:427-430.