3
Evidence and Decision-Making
In Chapter 2, the committee recommends a framework for the US Food and
Drug Administration (FDA) regulatory decision-making process in which scientific
evidence plays a critical role, together with other factors including ethical
considerations and the perspectives of patients and other stakeholders. This chapter
focuses on the evaluation of the scientific evidence and on how FDA should
use evidence in its decisions. Just as courts determine when evidence is admissible
and which standard of proof to apply in a given case, scientific evidence
must be evaluated for its quality and applicability to the public health question
that is the focus of regulatory decision-making. FDA needs to base its decisions
on the best available scientific evidence related to that question. Different people,
however, can interpret and judge scientific evidence in various ways. Decisions
in which there is disagreement among experts about what decisions are best supported
by a given body of evidence are among the most difficult that FDA must
make. For these decisions to properly incorporate all the relevant uncertainties
and values, the regulators need to understand the bases of the various judgments
that the experts are making. As has been shown in many difficult cases that FDA
has had to decide, evidence does not speak for itself.
This chapter will categorize and discuss the sources of technical disagreements
between experts about the kinds of data that FDA typically deals with.
It will start with a short primer on approaches to statistical inference, with an
introduction to Bayesian methods, followed by a discussion of the distinctions
between scientific data and evidence. It then discusses why scientists sometimes
disagree about the evidence of a drug’s benefits and risks and how their disagreements
may affect regulatory decision-making.
122 STUDYING THE SAFETY OF APPROVED DRUGS
STATISTICAL INFERENCE AND DECISION-MAKING
Evidence
Although the terms data and evidence are often used interchangeably, data
is not a synonym for evidence. The Compact Oxford English Dictionary defines
data as “facts and statistics collected together for reference or analysis” and evidence
as “the available body of facts or information indicating whether a belief
or proposition is true” (Oxford Dictionaries, 2011). The difference is whether or
not the information is being used to draw scientific conclusions about a specific
proposition. In the context of a drug study, the “proposition” is a hypothesis about
a drug effect, often stated in the form of a scientific question, such as “Do broad-spectrum
antibiotics increase the risk of colitis?” In the broader context of FDA’s
regulatory decisions, the proposition may be implicit in the public health question
that prompts the need for a regulatory decision, such as “Does the risk of colitis
caused by broad-spectrum antibiotics outweigh their benefits to the public’s
health?” In this way, evidence is defined with respect to the questions developed
in the first step of the decision-making framework described in Chapter 2.
Statistical methods help to ascertain the “strength of the evidence” supporting
a given hypothesis by measuring the degree to which the data support one
hypothesis rather than the other. The evidence in turn affects the likelihood that
either hypothesis is true. The most common scientific hypothesis in the realm of
drug evaluation is the “null hypothesis”—that in a given treated population, the
drug has no effect relative to a comparator treatment. For the concept of evidence
to have meaning, however, there must be at least one other hypothesis under
consideration, such as that the drug has some effect.
A small change in the scientific hypotheses being compared can change
the strength of the evidence provided by a given set of data. For example, if the
question above changed from whether broad-spectrum antibiotics produce any
increase in the risk of colitis to whether broad-spectrum antibiotics produce a
clinically important increase in the risk of colitis—say, an increase of more
than 10 percent—the strength of the evidence provided by the same data could
change. Where one observer might see a four percent increase in risk as strong
evidence of some excess risk, another could regard it as strong evidence against a
10 percent increase in risk.1 Agreement on the strength of the evidence therefore
requires agreement on the hypotheses being contrasted and on the public health
questions that give rise to them.
1 Confusion can result from use of the word significant to describe an effect that is both statistically
significant and clinically relevant; the latter is often termed clinically significant. The two uses should
remain separate.
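The four percent example can be made concrete with a small sketch. It assumes, purely for illustration, that the estimated risk increase is approximately normally distributed with a standard error of 1.5 percentage points (a hypothetical value that does not come from this chapter), and it compares how well each hypothesized increase predicts the observed estimate. The function names are likewise illustrative.

```python
import math

def normal_likelihood(observed, hypothesized, std_error):
    """Normal density of the observed estimate if the true effect equals `hypothesized`."""
    z = (observed - hypothesized) / std_error
    return math.exp(-0.5 * z * z) / (std_error * math.sqrt(2 * math.pi))

def likelihood_ratio(observed, hyp_a, hyp_b, std_error):
    """How many times better hyp_a predicts the observed estimate than hyp_b."""
    return (normal_likelihood(observed, hyp_a, std_error)
            / normal_likelihood(observed, hyp_b, std_error))

OBSERVED = 4.0    # observed increase in risk, percent (the text's example)
STD_ERROR = 1.5   # hypothetical standard error, assumed for illustration

# Observer 1 contrasts "a 4 percent increase" with "no increase".
lr_some_vs_none = likelihood_ratio(OBSERVED, 4.0, 0.0, STD_ERROR)
# Observer 2 contrasts "a 4 percent increase" with "a 10 percent increase".
lr_small_vs_large = likelihood_ratio(OBSERVED, 4.0, 10.0, STD_ERROR)

print(f"4% vs 0%:  likelihood ratio = {lr_some_vs_none:.0f}")
print(f"4% vs 10%: likelihood ratio = {lr_small_vs_large:.0f}")
```

Under these assumptions the same estimate counts as evidence for some excess risk and, at the same time, as evidence against a 10 percent increase. Which reading matters depends on which pair of hypotheses the public health question puts in play.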
Inference
Good science, together with proper statistics, has a dual role. The first role is
to decrease uncertainty about which hypotheses are true; the second is to properly
measure the remaining uncertainty. These are carried out in part through a process
called statistical inference. Statistical inference involves the process of summarizing
data, estimating the uncertainty around the summary, and using the summary
to reach conclusions about the underlying truth that gave rise to the data.
The two main approaches to statistical inference are the standard “frequentist”
approach and the Bayesian approach. Each has distinctive strengths and
weaknesses when used as bases for decision-making; including both approaches
in the technical and conceptual toolbox can be extraordinarily important in making
proper decisions in the face of complex evidence and substantial uncertainty.
The frequentist approach to statistical inference is familiar to medical researchers
and is the basis for most FDA rules and guidance. The Bayesian approach is
less widely used and understood; however, it has many attractive properties that
can both elucidate the reasons for disagreements and provide an analytic model
for decision-making. This model allows decision-makers to combine the chance
of being wrong about risks and benefits, together with the seriousness of those
errors, to support optimal decisions.
The frequentist approach employs such measures as P values, confidence
intervals, and type I and II errors, as well as practices such as hypothesis-testing.
Evidence against a specified hypothesis is measured with a P value. P values are
typically used within a hypothesis-testing paradigm that declares results “statistically
significant” or “not significant”, with the threshold for significance usually
being a P value less than 0.05. By convention, type I (false-positive) error rates
in individual studies are set in the design stage at 5 percent or lower, and type II
(false-negative) rates at 20 percent or below (Gordis, 2004).
In the colitis example, if the null hypothesis posits that broad-spectrum
antibiotics do not increase the risk of colitis, a P value less than 0.05 would lead
one to reject that null hypothesis and conclude that broad-spectrum antibiotics
do increase the risk of colitis. The range of that elevation statistically consistent
with the evidence would be captured by the confidence interval. If the P value
exceeded 0.05, several conclusions could be supported, depending on the location
and width of the confidence interval: either that a clinically negligible effect
is likely, or that the study cannot rule out either a null or clinically important
effect and thus is inconclusive. In the drug-approval setting, the FDA regulatory
threshold of “substantial evidence”2 for effectiveness is generally defined as two
well-controlled trials that have achieved statistical significance on an agreed-upon
endpoint, although there can be exceptions (Carpenter, 2010; Garrison et
al., 2010).
2 21 USC § 355(d) (2010).
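The frequentist reading of the colitis example can be sketched as follows. The event counts, the threshold for a "clinically important" difference, and the function names are all hypothetical, chosen only to show how a P value and a confidence interval map onto the three conclusions described above. The sketch uses a large-sample z-test for a difference in risk, which is one common choice rather than a method prescribed by the chapter.

```python
import math
from statistics import NormalDist

def compare_risks(events_drug, n_drug, events_ctrl, n_ctrl):
    """Two-sided z-test and 95% confidence interval for a difference in risk."""
    risk_drug = events_drug / n_drug
    risk_ctrl = events_ctrl / n_ctrl
    diff = risk_drug - risk_ctrl
    # Pooled standard error: used for the test of the null (no difference).
    pooled = (events_drug + events_ctrl) / (n_drug + n_ctrl)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n_drug + 1 / n_ctrl))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se_null)))
    # Unpooled standard error: used for the interval around the estimate.
    se_est = math.sqrt(risk_drug * (1 - risk_drug) / n_drug
                       + risk_ctrl * (1 - risk_ctrl) / n_ctrl)
    z95 = NormalDist().inv_cdf(0.975)
    return p_value, (diff - z95 * se_est, diff + z95 * se_est)

def interpret(p_value, ci, important_diff, alpha=0.05):
    """Map a P value and interval onto the three readings described in the text."""
    low, high = ci
    if p_value < alpha:
        return "reject the null hypothesis"
    if -important_diff < low and high < important_diff:
        return "clinically negligible effect is likely"
    return "inconclusive"

# Hypothetical colitis counts, invented for illustration only.
p, ci = compare_risks(events_drug=30, n_drug=1000, events_ctrl=15, n_ctrl=1000)
print(f"P = {p:.3f}, 95% CI for risk difference = ({ci[0]:.4f}, {ci[1]:.4f})")
print(interpret(p, ci, important_diff=0.02))
```

Note that a non-significant result is sorted by the interval rather than by the P value: the same P value can be "negligible" or "inconclusive" depending on whether the interval excludes the clinically important difference.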
Hypothesis-testing provides a yes-or-no verdict that is useful for regulatory
purposes, and its value has been demonstrated over time, both procedurally and
inferentially. Its emphasis on pre-specification of endpoints, study procedures
and analytic plans has regulatory and often inferential benefits. But hypothesis
tests, P values, and confidence intervals do not provide decision-makers with
an important measure—the probability that a hypothesis is right or wrong. In
settings where a difficult balancing of various decisional consequences must be
made in the face of uncertainty about both the presence and magnitude of benefits
and risks, the probability that a given hypothesis is true plays a central role.
The failure to assign a degree of certainty to a conclusion is a weakness of the
frequentist approach when it is used for regulatory decisions (Berry et al., 1992;
Etzioni and Kadane, 1995; IOM, 2008; Parmigiani, 2002).
In contrast, the Bayesian approach to inference allows a calculation on the
basis of results from an experiment of how likely a hypothesis is to be true or
false. However, this calculation is premised on an estimated probability that a
hypothesis is true prior to the conduct of the experiment, a probability that is
not uniquely scientifically defined and about which scientists can differ. Both in
spite of this and because of this, Bayesian approaches can be very useful complements
to traditional frequentist analyses, and can yield insights into the reasons
why scientists disagree, a topic that will be discussed in more depth later in this
chapter.
The use of Bayesian approaches is not new to FDA. FDA’s Center for
Devices and Radiological Health (CDRH) has published guidance for the use of
Bayesian statistics in medical device clinical trials (FDA, 2010a) and FDA has
used Bayesian approaches in regulatory decisions. A 2004 FDA workshop on
the use of Bayesian methods for regulatory decision-making included extensive
discussion by FDA scientists, as well as Center for Drug Evaluation and Research
(CDER) and CDRH leadership, of ways in which Bayesian approaches could
enhance the science of premarketing approval.3 Campbell (2011), director of the
CDRH Biostatistics division, discussed the uses of Bayesian methods for FDA
decision-making, and presented 17 requests for premarketing approval submitted
to and approved by the CDRH for medical devices that used Bayesian methods.
Although Bayesian methods have been little used by CDER, Berry (2006) discusses
how a Bayesian meta-analysis served as the basis for a CDER approval of
Pravigard™ Pac (co-packaged pravastatin and buffered aspirin) to lower the risk of
cardiovascular events. Bayesian sensitivity analyses were used to help evaluate
the literature investigating the possible association between antidepressants and
suicidal outcomes (Laughren, 2006; Levenson and Holland, 2006), elaborated
later in Kaizar (2006). Finally, FDA staff has recently proposed Bayesian methodology
for analysis of safety endpoints in clinical trials (McEvoy et al., 2012).
3 Published papers from the workshop are available in the August 2005 issue of Clinical Trials
(2:271-378).
The Bayesian approach does not use a P value to measure evidence; rather, it
uses an index called the Bayes factor (Goodman, 1999; Kass and Raftery, 1995).
The Bayes factor encodes mathematically the principle presented earlier—that
the role of evidence is to help adjudicate between two or more competing hypotheses.
The Bayes factor modifies the probability of whether a hypothesis is true.
Decision-makers can then use that probability to characterize the likelihood that
their decisions will be wrong. In its simplest form, Bayes theorem can be defined
in the following equation (Goodman, 1999; Kass and Raftery, 1995):
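\[
\left(\begin{array}{c}\text{the odds that a}\\ \text{hypothesis is true}\\ \text{after new evidence}\end{array}\right)
=
\left(\begin{array}{c}\text{the odds that a}\\ \text{hypothesis is true}\\ \text{before new evidence}\end{array}\right)
\times
\left(\begin{array}{c}\text{the strength of}\\ \text{new evidence}\\ \text{(the Bayes factor)}\end{array}\right)
\]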
The Bayes factor is sometimes regarded as the “weight of the evidence”
comparing how strongly the data support one hypothesis (or combination of
hypotheses) to another (Good, 1950; Kass and Raftery, 1995). Most important is
the role that the Bayes factor plays in Bayes theorem; it modifies the probability
that a given hypothesis is true. This concept that a hypothesis has a certain “truth
probability” has no counterpart in standard frequentist approaches.
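The odds form of Bayes theorem is short enough to write out directly. The sketch below is a minimal illustration with hypothetical inputs (a 25 percent prior probability and a Bayes factor of 10 in favor of the hypothesis); the function names are not from the chapter.

```python
def prob_to_odds(p):
    """Convert a probability (0 < p < 1) to odds."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)

def update(prior_prob, bayes_factor):
    """Posterior probability of a hypothesis, given its prior probability and
    the Bayes factor in favor of that hypothesis (values above 1 support it)."""
    return odds_to_prob(prob_to_odds(prior_prob) * bayes_factor)

# Hypothetical inputs, chosen only to show the mechanics.
posterior = update(prior_prob=0.25, bayes_factor=10)
print(f"prior 25%, Bayes factor 10 -> posterior {posterior:.1%}")
```

A Bayes factor is always stated for one hypothesis relative to another. If it is reported for the null hypothesis (values below 1 count against the null), its reciprocal is the Bayes factor in favor of the alternative.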
There is not a one-to-one relationship between P values and Bayes factors,
because the magnitude of an observed effect and the prior probabilities of hypotheses
also can affect the Bayes factor calculation itself. But in most common
statistical situations, there exists a strongest possible Bayes factor, and that can
be defined as a function of the observed P value. That relationship can be used to
calculate the maximum chance that the non-null hypothesis is true as a function
of the P value and a prior probability (Goodman, 2001; Royall, 1997).
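One widely used form of that relationship, described in Goodman (1999), is the minimum Bayes factor exp(-z²/2), where z is the normal deviate corresponding to a two-sided P value. The sketch below assumes that form and a normal approximation; whether every entry of the chapter's table was computed this way is an assumption, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def strongest_bayes_factor(p_value):
    """Smallest Bayes factor for the null (the strongest evidence against it)
    compatible with a two-sided P value, using exp(-z**2 / 2)."""
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return math.exp(-z * z / 2)

def max_posterior_of_effect(p_value, prior_prob_effect):
    """Upper bound on the probability of a non-null effect after the study."""
    bf_null = strongest_bayes_factor(p_value)
    prior_odds_effect = prior_prob_effect / (1 - prior_prob_effect)
    post_odds_effect = prior_odds_effect / bf_null   # dividing favors the effect
    return post_odds_effect / (1 + post_odds_effect)

for p in (0.10, 0.05, 0.03, 0.01, 0.001):
    print(f"P = {p:<5}  strongest Bayes factor = {strongest_bayes_factor(p):.3f}")
```

Because this is the strongest Bayes factor any alternative hypothesis could achieve, the posterior it yields is an upper bound: a realistic alternative gives a probability of effect no higher than `max_posterior_of_effect` returns.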
Assume that the null hypothesis is that a given drug does not cause a given
harm, and that the alternative hypothesis is that it does elevate the risk of that
harm. Table 3-1 shows how a given P value (translated into the strongest Bayes
factor) alters the probability of the hypothesis of harm. For example, if a new randomized
controlled trial (RCT) yields a P value of 0.03 for a newly reported adverse effect
of a drug and there was deemed to be only a 1 percent chance before the RCT
of that unsuspected adverse effect being caused by the drug, the new evidence
increases the chance of the causal relationship to at most 10 percent (see Table
3-1). A regulatory decision predicated on the harm being real would therefore be
wrong at least 90 percent of the time.
Without a formal Bayesian interpretation, that high probability of error
would not be apparent from any standard analysis. Using conventional measures,
such a study might report that “a previously unreported association of tinnitus
was observed with the drug, OR [odds ratio] = 3.5, 95% CI [confidence interval]
1.1 to 11.1. P = 0.03”. This statement does not actually indicate how likely it is
that the drug actually raises the risk of tinnitus. For that, a prior probability is
needed, and the Bayes factor. If the mechanism or some preliminary observations
justified a 25 percent prior chance of a harmful effect, the same evidence
would raise that to at most a 78 percent chance of harm; that is, at least a 22
percent chance that the drug does not cause that harm. Table 3-1 shows that after
observing P = 0.03 for an elevated risk of harm, in order to be 95 percent certain
that this elevation was true, the prior probability of a risk elevation would have
to have been at least 67 percent before the study. That might be the case if there
was an established mechanism for the adverse effect, if other drugs in the same
class were known to produce this effect, or if a prior study showed the same
effect.
TABLE 3-1 Maximum Change in the Probability of a Drug Effect as a Function
of P Value and Bayes Factor, Calculated by Using Bayes’ Theorem
P Value in   Strongest      Strength of         Prior Probability     Maximum Probability
New Study    Bayes Factor   Evidence[a]         of an Effect, %[b]    After the New Study, %
0.10         0.26           Weak                 1                     2.5
                                                25                    46
                                                50                    79
                                                83*                   95
0.05         0.15           Moderate             1                     6
                                                25                    69
                                                50                    87
                                                76*                   95
0.03         0.10           Moderately Strong    1                    10
                                                25                    78
                                                50                    81
                                                67*                   95
0.01         0.04           Strong               1                    21
                                                25                    90
                                                40*                   95
                                                50                    96.5
0.001        0.005          Very Strong          1                    75
                                                 8*                   95
                                                25                    99
                                                50                    99.5
[a] The qualitative descriptor of the strength of the evidence is made on the basis of the quantitative
change in the probability of truth of a non-null drug effect.
[b] The prior truth probabilities of 1%, 25%, or 50% are arbitrarily chosen to span a wide range of
strength of prior evidence. The prior probability marked with an asterisk (shaded in the original
table) illustrates the minimum prior probability required to provide a 95% probability of a drug
effect after observing a result with the reported P value.
SOURCE: Modified from Goodman (1999).
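The tinnitus report can be traced through the same reasoning in a few lines. The sketch below recovers the z statistic from the reported odds ratio and confidence interval, assuming the interval is symmetric on the log scale, and then asks how large the prior probability must be to reach a target level of certainty, assuming the strongest Bayes factor takes the form exp(-z²/2). Both assumptions and all function names are illustrative rather than taken from the chapter.

```python
import math
from statistics import NormalDist

def z_from_odds_ratio(odds_ratio, ci_low, ci_high):
    """Recover the z statistic from an odds ratio and its 95% interval,
    assuming the interval is symmetric on the log scale."""
    z95 = NormalDist().inv_cdf(0.975)
    std_error = (math.log(ci_high) - math.log(ci_low)) / (2 * z95)
    return math.log(odds_ratio) / std_error

def min_prior_for_target(z, target_posterior):
    """Smallest prior probability of an effect that can reach the target
    posterior when the evidence is as strong as it can possibly be."""
    bf_null = math.exp(-z * z / 2)      # strongest Bayes factor (assumed form)
    target_odds = target_posterior / (1 - target_posterior)
    prior_odds = target_odds * bf_null
    return prior_odds / (1 + prior_odds)

z = z_from_odds_ratio(3.5, 1.1, 11.1)   # the tinnitus report quoted in the text
p_value = 2 * (1 - NormalDist().cdf(z))
min_prior = min_prior_for_target(z, 0.95)
print(f"z = {z:.2f}, two-sided P = {p_value:.3f}")
print(f"prior needed for 95% certainty: {min_prior:.0%}")
```

The result should fall close to the minimum prior probability that Table 3-1 lists for P = 0.03. Small differences from the printed entries are to be expected, since the table works from rounded P values and Bayes factors.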
In practice, however, there exist no conventions or empirical data to determine
exactly how to assign such prior probabilities, although the elicitation of
prior probabilities from experts has been much studied (Chaloner, 1996; Kadane
and Wolfson, 1998). FDA informally incorporated the notion of a prior when it
included “biologic plausibility” among the considerations for deciding how to respond
to drug safety signals that arise in the course of pharmacovigilance, in its March
2012 draft guidance (FDA, 2012):
CDER will consider whether there is a biologically plausible explanation for
the association of the drug and the safety signal, based on what is known from
systems biology and the drug’s pharmacology. The more biologically plausible
a risk is, the greater consideration will be made to classifying a safety issue as
a priority.
As the above passage shows, biologic plausibility and other
forms of external evidence are currently accommodated qualitatively; Bayesian
approaches allow that to be done quantitatively, providing a formal structure
by which both prior evidence and other sources of information (for example, on
common mechanisms underlying different harms, or their relationship to disease
processes) should affect decisions.
This discussion illustrates a number of important issues:
• Given new evidence, the probability that a drug will be harmful can vary
widely depending on the strength of the prior or external information,
represented as a prior probability distribution.
• The chance that a drug will be harmful, based on P values for a harmful
effect in the borderline significant range (0.01–0.05), is often far lower
than is suspected, unless there are fairly strong reasons to believe in the
harm before the study.
• The Bayesian approach allows the calculation of intermediate levels of
certainty (for example, less than 95 percent) that might be sufficient for
regulatory action, particularly for drug harms.
• Without agreed-upon conventions or empirical bases for assigning prior
probabilities, the prior probabilities derived from a given body of evidence
will differ among scientists, resulting in different conclusions from
the same data.
The probability that a given harm will be caused by a drug is a key attribute
in regulatory decision-making. How sure regulators must be to take a given action
varies according to the consequences of decisions. In some cases, 95 percent
certainty might be needed, in others 75 percent, and in still others less than 50
percent. The Bayesian approach provides numbers that feed into that judgment
(Kadane, 2005).
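One standard way to connect the required level of certainty to the consequences of a decision is to compare expected losses. The sketch below is a simplified two-action illustration, not the committee's method: the loss values are hypothetical, expressed in arbitrary units, and the scenario labels are invented to reproduce the 95, 75, and below-50 percent levels mentioned above.

```python
def action_threshold(loss_unneeded_action, loss_missed_harm):
    """Probability of harm above which acting has the lower expected loss."""
    return loss_unneeded_action / (loss_unneeded_action + loss_missed_harm)

def should_act(prob_harm, loss_unneeded_action, loss_missed_harm):
    """Compare the expected loss of acting with the expected loss of waiting."""
    expected_loss_act = (1 - prob_harm) * loss_unneeded_action
    expected_loss_wait = prob_harm * loss_missed_harm
    return expected_loss_act < expected_loss_wait

# Hypothetical scenarios: (loss if action proves unneeded, loss if harm is missed).
scenarios = {
    "withdrawing a uniquely effective drug": (19, 1),
    "adding a strong warning": (3, 1),
    "requesting a further safety study": (1, 3),
}
for label, (loss_act, loss_miss) in scenarios.items():
    threshold = action_threshold(loss_act, loss_miss)
    print(f"{label}: act if P(harm) > {threshold:.0%}")
```

The threshold depends only on the ratio of the two losses, which is why an action that is cheap to reverse can be justified at a probability of harm well below 50 percent.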
Despite these advantages, one of the weaknesses of Bayesian calculations is
that there is no unique way to assign a prior probability to the strength of external
evidence, particularly if that evidence is difficult to quantify, such as biologic
plausibility. Although it may be impossible to assess subtle differences in prior
probability, even crude distinctions can be helpful, such as whether the prior evidence
justifies probability ranges of 1–5 percent, 15–50 percent, 60–80 percent,
or 90+ percent. Such categorizations often provide fine enough discrimination to
be useful for decision-making. In the absence of agreement on prior probabilities,
“non-informative” prior distributions can be used that rely almost exclusively
on the observed data, and sensitivity analyses with different kinds of prior probabilities
from different decision-makers can be conducted (Emerson et al., 2007;
Greenhouse and Wasserman, 1995). At a minimum, these prior probabilities
should be elicited and their evidential bases made explicit so that this potential
source of disagreement can be better understood, and perhaps diminished.
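A sensitivity analysis over crude prior categories takes only a few lines. The sketch below is illustrative: the Bayes factor of 10 in favor of an effect is a hypothetical value, and the 99 percent upper bound for the "90+ percent" category is an assumption made so that the range has two ends.

```python
def posterior(prior_prob, bf_for_effect):
    """Posterior probability of an effect from its prior probability and the
    Bayes factor in favor of the effect."""
    odds = prior_prob / (1 - prior_prob) * bf_for_effect
    return odds / (1 + odds)

BF_FOR_EFFECT = 10.0   # hypothetical strength of evidence, assumed for illustration

# Crude prior categories from the text; the 0.99 upper bound is an assumption.
prior_ranges = {
    "1-5%": (0.01, 0.05),
    "15-50%": (0.15, 0.50),
    "60-80%": (0.60, 0.80),
    "90+%": (0.90, 0.99),
}
for label, (low, high) in prior_ranges.items():
    lo_post = posterior(low, BF_FOR_EFFECT)
    hi_post = posterior(high, BF_FOR_EFFECT)
    print(f"prior {label:>7}: posterior {lo_post:.0%} to {hi_post:.0%}")
```

If the posterior ranges for two neighboring categories lead to the same regulatory action, disagreement about which category applies does not need to be resolved before deciding.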
The difference between Bayesian and frequentist approaches can go well
beyond the incorporation of prior evidence, extending to more complex aspects
of how the analytic problem is structured and analyzed. Madigan et al. (2010)
provide a comprehensive suite of Bayesian methods to analyze safety signals
arising from a broad range of study designs likely to be employed in the postmarketing
setting.
WHY SCIENTISTS DISAGREE
When new information arises that puts into question a drug’s benefits and
risks, FDA’s decision-makers often face sharp disagreements among scientists
over how to interpret that information in the context of pre-existing information
and over what regulatory action, if any, should be taken in response to the new
information. Such disagreements are often unavoidable, and moving forward
with appropriate decision-making is difficult if the underlying reasons for them
are unknown or misunderstood. The committee identified a number of reasons
for the disagreements about scientific evidence that occur among scientists. Those
reasons, which are listed in Box 3-1, are discussed below.
Different Prior Beliefs About the Existence of an Effect
People’s beliefs about the plausibility of an effect of a drug are determined,
in part, by their knowledge and interpretation of prior evidence about the drug’s
benefits and risks (Eraker et al., 1984). That knowledge shapes their responses
to new evidence. Prior evidence can come directly from earlier clinical studies
of the drug’s effects, from studies of drugs in the same class that demonstrate
the effect, and from information about the drug’s mechanism of action. Newly
observed evidence might be interpreted as resulting in a higher chance that a drug
is harmful if earlier studies have also demonstrated the harm. If other drugs in the
same class have been associated with a particular adverse effect, the drug has a
higher prior probability of causing that effect than a drug in a class whose members
have not produced such an effect. If a drug has a mechanism of action that
has been implicated in a particular adverse effect, it has a higher prior probability
of causing that effect than a drug for which such a mechanism is implausible.
For example, the prior probability that a topical steroid would produce significant
internal injury would be very low because what is known about the absorption,
metabolism, and physiologic actions of topical steroids makes it difficult to
imagine how such an injury could occur, but the prior probability of an adverse
dermatologic effect would be much higher.
BOX 3-1
Why Scientists Disagree About the Strength of Evidence Supporting Drug Safety
Prior Evidence
  1. Different weights given to pre-existing mechanistic or empirical evidence
     supporting a given benefit or risk.
Quality of the New Study
  2. Different views about the reliability of the data sources.
  3. Different confidence in the design’s ability to eliminate the effect of
     factors unrelated to drug exposure.
  4. Different views on the appropriateness of statistical models.
Relevance of the New Evidence to the Public Health Question
  5. Different views of the hypotheses needing evaluation.
  6. Different assessments of the transportability of results.
Synthesizing the Evidence
  7. Different ideas about how to weigh and combine all the available evidence
     from disparate sources relevant to the public health question.
Appropriate Regulatory Response to the Body of Evidence
  8. Different opinions among scientists regarding the thresholds of certainty
     to justify concern or regulatory action, which can affect how they
     view the evidence.
Evidential bases of prior probability can take two forms: an assessment of
the evidence supporting the mechanistic explanation of a proposed effect and the
cumulative weight of previous empirical studies. Marciniak, in the FDA Office
of New Drugs (OND) Division of Cardiovascular and Renal Products, discussed
mechanism directly in a letter that was provided for a July 2010 FDA Advisory
Committee meeting related to Avandia (Marciniak, 2010):
Others have speculated that rosiglitazone could increase MI [myocardial infarction]
rates through its effects upon lipids or by the same mechanism whereby it
increases HF [heart failure] rates. There are no clinical studies establishing these
mechanisms. We propose that there is a third mechanism for which there is some
evidence from clinical studies. The third possible mechanism is the following:
The Avandia label states that “In vitro data demonstrate that rosiglitazone is
predominantly metabolized by Cytochrome® P450 (CYP) isoenzyme 2C8, with
CYP2C9 contributing as a minor pathway.” The published literature suggests that
rosiglitazone may also function as an inhibitor of CYP2C8 . . . . Allelic variants of
the CYP2C9 gene have been associated in epidemiological studies with increased
risk of myocardial infarction and atherosclerosis. . . . Recently, CYP2C8 variants
has also been associated with increased risk of MI. . . . CYP2C9 and 2C8
catalyze the metabolism of arachidonic acid to vasoactive substances, providing
one potential mechanism for affecting cardiac disease. Interference with cigarette
toxin metabolism is another. . . . Rosiglitazone effects upon CYP2C8 and
CYP2C9 could be the mechanism for its CV adverse effects. Regardless, there
are several possible mechanisms for CV toxicity of rosiglitazone.
The above paragraph describes a mechanism that is fairly speculative, as
labeled. There is no suggestion or claim that such a mechanism would definitely
or even probably produce adverse cardiovascular effects. Rather, this particular
exposition is exploratory and aimed at establishing that such an effect is possible
rather than probable. Those who have a good understanding of this particular set
of pathways might interpret the explanation differently and establish a different
starting point for the probability of such an effect. It is unlikely, though, that on
the basis of such evidence general consensus could be garnered for a high prior
probability of effect.
Mechanistic explanations generally provide weak evidence when they are
offered post hoc to support an observed result. They carry more weight when they
are proposed before such an effect is observed. Misbin (2007) raised questions
about the safety of rosiglitazone on the basis of its effects on body weight and
lipids—both well-established risk factors for cardiovascular disease—long before
any risk of myocardial infarction (MI) was seen in any studies.
Another, more subtle way in which mechanistic considerations can affect
inferences is in the choice of endpoints, as illustrated in discussions by Marciniak,
from the FDA Office of New Drugs (OND) Division of Cardiovascular and
Renal Products, of the wisdom of combining silent and clinical MIs into a single
endpoint (Marciniak, 2010):
There is additional evidence from RECORD [the Rosiglitazone Evaluated for
Cardiac Outcomes and Regulation of Glycemia in Diabetes trial] that the MI
risk for rosiglitazone is real rather than a random variation:
We prospectively excluded silent MIs from our primary analysis
because we had concerns that silent MIs might represent a different
disease mechanism than symptomatic MIs, e.g., could they represent
gradual necrosis from diabetic microvascular disease rather than an
acute event with coronary thrombosis in an epicardial coronary artery?
Whether or not silent and clinical MIs should be combined, a critical decision
in assessing the evidence, is framed here as contingent on whether or not
they represent different manifestations of the same pathophysiologic process.
What is important to recognize is that the numbers arising from an analysis that
excludes silent MIs are only as credible as the underlying mechanistic explanation.
This example shows how a mechanistic explanation can affect the analyses,
especially exploratory analysis, even if it is not explicitly invoked as an evidential
basis of a claim.
Even if two scientists agree about what evidence new data provides, if they
have different assessments of the strength of prior evidence they might disagree
about the probability of a higher drug risk. Such a disagreement might appear
outwardly to be about the new evidence when in fact the disagreement is about
the prior probability. That phenomenon is captured quantitatively by Bayes theorem,
as previously noted (Fisher, 1999), which can use sensitivity analyses with
different priors to illustrate the plausible range of chances that the drug induces
unacceptable safety risks.
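The point that such a disagreement is really about the prior can be shown in one line. In the sketch below, the symbols are notation introduced here for illustration: $O_A$ and $O_B$ denote the odds that scientists A and B assign to the hypothesis of harm, and $\mathrm{BF}$ is a Bayes factor on which both agree.

```latex
\[
\frac{O_A^{\text{post}}}{O_B^{\text{post}}}
= \frac{O_A^{\text{prior}} \times \mathrm{BF}}{O_B^{\text{prior}} \times \mathrm{BF}}
= \frac{O_A^{\text{prior}}}{O_B^{\text{prior}}}
\]
% Illustration with prior probabilities of 1 percent and 25 percent:
\[
O_A^{\text{prior}} = \frac{0.01}{0.99} = \frac{1}{99}, \qquad
O_B^{\text{prior}} = \frac{0.25}{0.75} = \frac{1}{3}, \qquad
\frac{O_B^{\text{prior}}}{O_A^{\text{prior}}} = 33
\]
```

Because the shared Bayes factor cancels, the two scientists' odds of harm differ by the same factor of 33 after the study as before it. New evidence moves both of them in the same direction without closing the gap between them; only agreement about the prior can do that.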
Quality of the New Study
Standard approaches to evaluating evidence rely on the use of evidence
hierarchies, which traditionally emphasize the type of study design as the main
determinant of evidential quality; an example is the US Preventive Services Task
Force guidance (AHRQ, 2008). Many scientists judge a study on the basis of its
type of design above all other considerations. The type of study design, however,
is only one of the factors that should be taken into account in assessing the quality
of a study and thereby the quality of the evidence from the study. In addition
to the type of study, such other aspects as the source and reliability of the data,
study conduct, whether there are missing or misclassified data, and data analyses
influence the quality of the evidence generated by a study. Some of these are reflected
in the Grading of Recommendations Assessment, Development and Evaluation
(GRADE) approach to evidence assessment (Guyatt et al., 2008). Those factors
and their role in disagreements among scientists are discussed below.
Different Views about the Reliability of the Data Source
Most evidence hierarchies assume that data in a study are generated for
research purposes and that outcome measures are specified in advance. Much
postmarketing research about a drug’s benefits and risks, however, whether an
RCT or an observational study, depends at least in part on data gathered with
systems developed for other purposes. For example, billing data that happen to
[. . .]
statistical code, and information about how decisions were made to produce the
analytic dataset from the raw measured data. Optimally, it involves some form
of data-sharing. Such data sharing permitted the reanalysis of the RECORD trial
that was presented to FDA in the rosiglitazone case. The review revealed that
discrepancies and judgment calls occurred frequently in the original
study—from defining a clinical event to the choice of analytic method—and those
discrepancies and judgments affected the weight that the results were given in the
regulatory decision-making process. For critical research that is to be the basis of
regulatory decisions, which can be primary studies like RECORD or can be meta-
analyses, standards should be developed within FDA to adhere to reproducible
research principles so that the basis of the many judgments can be examined and
adjudicated by scientists and regulators when disputes over data interpretation
and its implications arise.
Going a step beyond reproducibility, FDA is well positioned to help ensure
the accurate public reporting of risk information submitted to it as part of the
premarketing approval process. Such information is often, but not always, published
after approval and included in postmarketing safety assessments. FDA scientists
themselves have identified discordance between published data and the data
submitted to FDA as a problem for the validity of postmarketing safety meta-analyses
(Hammad et al., 2011), and there are numerous examples of underreporting or delayed
reporting of harms that had previously been reported to regulatory authorities (for
example, Carragee et al., 2011; Lee et al., 2008; Melander et al., 2003; Vedula
et al., 2009). FDAAA addressed this problem by requiring that all clinical trials
submitted for new drug approval or for new labeling be registered at inception
at ClinicalTrials.gov, and that the summary results of all pre-specified outcomes
be posted within one year of drug approval for new drugs, or three years for new
indications (Miller, 2010; Wood, 2009). However, recently reported evidence
has shown that compliance with this aspect of FDAAA has been low (Law et al.,
2011). In addition, the FDA policy on the reporting of studies submitted for non-
approved drugs has not been settled (Miller, 2010). Finally, publishing summary
results is not equivalent to sharing primary data, which allows for re-analyses.
New approaches are needed to facilitate the publication of safety data submitted
to FDA for approved drugs, and to find ways to release similar data for drugs
that were not approved but whose information might be extremely valuable for the
interpretation of safety information from approved drugs in the same class.
FINDINGS AND RECOMMENDATIONS
Finding 3.1
Some of FDA’s most difficult decisions are those in which experts disagree about
how compelling the evidence that informs the public health question is. Under-
standing the nature and sources of those disagreements and their implications for
FDA’s decisions is key to improving the agency’s decision-making process. For
example, experts can disagree about the plausibility of a new risk (or decreased
benefit) on the basis of different assessments of prior evidence, the quality of
new data, the adequacy of confounding control in the relevant studies, the
transportability of results, the appropriateness of the statistical analysis, the relevance
of the new evidence to the public health question, how the evidence should be
weighed and synthesized, or the threshold for regulatory actions.
Recommendation 3.1
FDA should use the framework for decision-making proposed in Recom-
mendation 2.1 to ensure a thorough discussion and clear understanding of the
sources of disagreement about the available evidence among all participants
in the regulatory decision-making process. In the interest of transparency,
FDA should use the BRAMP document proposed in Recommendation 2.2 to
ensure that such disagreements and how they were resolved are documented
and made public.
Finding 3.2
Such methods as Bayesian analyses or other approaches to integrating external
relevant information with newly emerging information could provide decision-
makers with useful quantitative assessments of evidence. An example would be
sensitivity analyses of clinical-trial data that illustrate the influence of prior
probabilities on estimates of probabilities that an intervention has unacceptable safety
risks. These approaches can inform judgments, allow more rational decision-
making, and permit input from multiple stakeholders and experts.
Recommendation 3.2
FDA should ensure that it has adequate expertise in Bayesian approaches, in
combination with expertise in relevant frequentist and causal inference
methods, to assess the probability that observed associations reflect actual causal
effects, to incorporate multiple sources of uncertainty into the
decision-making process, and to evaluate the sensitivity of those conclusions to
different representations of external evidence. To facilitate the use of Bayesian
approaches, FDA should develop a guidance document for the use of
Bayesian methods for assessing a drug’s benefits, risks, and benefit–risk profile.
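The kind of sensitivity analysis described in Finding 3.2 can be illustrated with a
minimal sketch. It assumes a normal approximation for the log relative risk and a
normal prior, which is a common simplification and not a method prescribed in this
chapter. The observed estimate, the three priors, the threshold of 1.2, and the
function name posterior_prob_harm are all hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def posterior_prob_harm(log_rr_hat, se, prior_mean, prior_sd, threshold_rr=1.0):
    """Posterior P(RR > threshold_rr) for a normal prior and normal likelihood on log RR."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / se ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    # Posterior mean is the precision-weighted average of prior mean and estimate.
    post_mean = post_var * (prior_prec * prior_mean + data_prec * log_rr_hat)
    z = (math.log(threshold_rr) - post_mean) / math.sqrt(post_var)
    return 1.0 - normal_cdf(z)


# Hypothetical trial result: RR = 1.4 with a 95% CI of roughly 0.95 to 2.06.
log_rr_hat = math.log(1.4)
se = (math.log(2.06) - math.log(0.95)) / (2 * 1.96)

# Hypothetical priors on log RR, given as (mean, standard deviation).
priors = {
    "skeptical (centered on no effect, tight)": (0.0, 0.2),
    "vague": (0.0, 10.0),
    "concerned (prior evidence of harm)": (math.log(1.3), 0.3),
}

for label, (mean, sd) in priors.items():
    p = posterior_prob_harm(log_rr_hat, se, mean, sd, threshold_rr=1.2)
    print(f"{label}: P(RR > 1.2 | data) = {p:.2f}")
```

The trial data are identical in every run; only the prior changes. Reporting the
resulting probabilities side by side is one way to show how much of a disagreement
among experts stems from different assessments of prior evidence.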
Finding 3.3
Traditionally, the main criteria for evaluating a study are ones that contribute to
its internal validity. A well-conducted RCT typically has higher internal
validity than a well-conducted observational study. Results of observational studies,
however, can have greater transportability if their participants are more similar
to the target clinical population than are the participants in a clinical trial. In some
circumstances, such as an evaluation of the association between a drug and an
uncommon unexpected adverse event, observational studies may produce
estimates closer to the actual risk in the general population than can be achieved in
clinical trials. In assessing the relevance of study findings to a public health ques-
tion, the transportability of the study results is as important as the determinants
of its internal validity.
Recommendation 3.3
In assessing the benefits and risks associated with a drug in the postmarketing
context, FDA should develop guidance and review processes that ensure that
observational studies with high internal validity are given appropriate weight
in the evaluation of drug harms and that transportability is given emphasis
similar to that given bias and other errors in assessing the weight of evidence
that a study provides to inform a public health question.
Finding 3.4
The principles of reproducible research are important for ensuring the integrity
of postmarketing research used by FDA. Those principles include providing
information on the provenance of data (from measurement to analytic dataset)
and, when possible, making available properly annotated analytic datasets, study
protocols (including the statistical analysis plan) and their amendments, and
statistical code.
Recommendation 3.4
All analyses, whether conducted independently of FDA or by FDA staff,
whose results are relied on for postmarketing regulatory decisions should use
the principles of reproducible research when possible, subject to legal con-
straints. To that end, FDA should present data and analyses in a fashion that
allows independent analysts either to reproduce the findings or to understand
how FDA generated the results in sufficient detail to understand the strengths,
weaknesses, and assumptions of the relevant analyses.
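One way to record the provenance called for in Finding 3.4 is to store
cryptographic fingerprints of the raw and analytic datasets alongside the
documented derivation decisions. The following is a minimal sketch under that
assumption. The field names, the toy datasets, the derivation steps, and the
version label are hypothetical and do not represent an FDA format.

```python
import hashlib
import json


def sha256_of(data: bytes) -> str:
    """Return a hex digest that changes if any byte of the dataset changes."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(raw: bytes, analytic: bytes, steps: list, code_version: str) -> dict:
    """Bundle dataset fingerprints with the documented derivation decisions."""
    return {
        "raw_sha256": sha256_of(raw),
        "analytic_sha256": sha256_of(analytic),
        "derivation_steps": list(steps),
        "code_version": code_version,
    }


# Hypothetical toy datasets standing in for the raw and analytic files.
raw_data = b"id,event_date,event_code\n1,2009-03-01,I21\n2,,I50\n"
analytic_data = b"id,mi_event\n1,1\n2,0\n"

record = provenance_record(
    raw_data,
    analytic_data,
    steps=[
        "Classified records with a missing event date as non-events",
        "Defined myocardial infarction as event code I21",
    ],
    code_version="analysis-v1 (hypothetical label)",
)
print(json.dumps(record, indent=2))
```

An independent analyst who receives the same files can recompute the digests to
confirm that the data match what was analyzed, and can read the listed steps to
see which judgment calls produced the analytic dataset.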
Finding 3.5
The ability of researchers in and outside FDA to analyze new information about
the benefits and risks associated with a marketed drug and to design appropriate
postmarketing research, including conducting individual-patient meta-analyses,
is enhanced by access to data and analyses from all studies of the drug
and others in the same drug class that were reported in the preapproval process.
Although disclosure of such information is likely to advance the public’s health,
such disclosures raise concerns about the privacy of participants in the research
that generated the information and may threaten industry interest in maintaining
proprietary information, which is deemed important for innovation. New
approaches to resolving this tension are needed.
Recommendation 3.5
FDA should establish and coordinate a working group, including industry and
patient and consumer representatives, to find ways that appropriately balance
public health, privacy, and proprietary interests to facilitate disclosure of data
for trials and studies relevant to postmarketing research decisions.
Finding 3.6
The elements of the benefit–risk profile of a drug are best estimated by using all
the available high-quality data, and meta-analysis is a useful tool for summarizing
such data and evaluating heterogeneity. However, because the reporting of harms
in published RCTs and observational studies is often poor or inconsistent and
because there is often substantial publication bias in studies of drug risk, steps
are needed to improve both the reporting of harms and the design of studies of
harm. That can be done through prospective planning for selected meta-analyses
and by monitoring compliance with the FDAAA requirement that summary trial
results for all primary and secondary outcomes be published at ClinicalTrials.gov.
Recommendation 3.6
For drugs that are likely to have required postmarketing observational
studies or trials, FDA should use the BRAMP to specify potential public health
questions of interest as early as possible; should prospectively recommend
standards for uniform definition of key variables and complete ascertainment
of events among studies or convene researchers in the field to suggest such
standards and promote data-sharing; should prospectively plan meta-analyses
of the data with reference to specified exposures, outcomes, comparators, and
covariates; should conduct the meta-analyses of the data; and should make
appropriate regulatory decisions in a timely fashion. FDA can also improve
the validity of meta-analyses by monitoring and encouraging compliance
with FDAAA requirements for reporting to ClinicalTrials.gov.
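The role of meta-analysis in summarizing data and evaluating heterogeneity,
noted in Finding 3.6, can be illustrated with a minimal sketch. It assumes a
fixed-effect, inverse-variance model with Cochran's Q and the I-squared
statistic, which is one common choice and not a method specified in this
chapter. The study estimates and standard errors are hypothetical.

```python
import math


def fixed_effect_meta(log_estimates, standard_errors):
    """Inverse-variance pooled estimate with Cochran's Q and I-squared."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Q measures how far the study estimates scatter around the pooled value.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_estimates))
    df = len(log_estimates) - 1
    # I-squared is the share of variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


# Hypothetical log odds ratios and standard errors from three studies.
log_ors = [math.log(1.3), math.log(1.1), math.log(1.6)]
ses = [0.20, 0.15, 0.30]

pooled, pooled_se, q, i_squared = fixed_effect_meta(log_ors, ses)
lower = math.exp(pooled - 1.96 * pooled_se)
upper = math.exp(pooled + 1.96 * pooled_se)
print(f"Pooled OR = {math.exp(pooled):.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"Q = {q:.2f}, I-squared = {i_squared:.0%}")
```

A calculation of this kind is only as good as its inputs. If harms are
selectively reported or outcomes are defined inconsistently across studies, the
pooled estimate inherits those problems, which is why the recommendation stresses
prospective planning and uniform definitions.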
Finding 3.7
FDA produced a high-quality guidance document on the use of the noninferior-
ity design for the study of efficacy. Increasingly, FDA is using the noninferiority
design to evaluate drug-safety endpoints as the primary outcomes in randomized
trials. The use of noninferiority analyses to establish the acceptability of the
benefit–risk profile of a drug can take the decision about how to balance the risks
and benefits of two drugs out of the hands of regulators. Noninferiority trials also
have the disadvantage of being biased toward equivalence when trial design or
conduct is suboptimal; this is of particular concern when such trials are used to
estimate risks.
Recommendation 3.7.1
FDA should develop a guidance document on the design and conduct of
noninferiority postmarketing trials for the study of safety of a drug. The guid-
ance should include discussion of criteria for choosing the standard therapy
to be used in the active-treatment control arm; of methods for selecting a
noninferiority margin in safety trials and ensuring high-quality trial conduct;
of the optimal analytic methods, including Bayesian approaches; and of the
interpretation of the findings in terms of the drug’s benefit–risk profile.
Recommendation 3.7.2
FDA should closely scrutinize the design and conduct of any noninferiority
safety studies for aspects that may inappropriately make the arms appear
similar. FDA should use the observed-effect estimate and confidence interval
as a basis for decision-making, not the binary noninferiority verdict.
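The distinction drawn in Recommendation 3.7.2 between the observed-effect
estimate and the binary verdict can be illustrated with a minimal sketch. It
assumes a risk-difference scale with a Wald 95% confidence interval, chosen here
for simplicity. The event counts, the two trials, and the margin of 2 percentage
points are hypothetical.

```python
import math


def risk_difference_ci(events_new, n_new, events_ctrl, n_ctrl, z=1.96):
    """Risk difference (new minus control) with a Wald 95% confidence interval."""
    p_new = events_new / n_new
    p_ctrl = events_ctrl / n_ctrl
    diff = p_new - p_ctrl
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ctrl * (1 - p_ctrl) / n_ctrl)
    return diff, diff - z * se, diff + z * se


def summarize(label, events_new, n_new, events_ctrl, n_ctrl, margin):
    """Report the estimate and interval together with the binary verdict."""
    diff, lower, upper = risk_difference_ci(events_new, n_new, events_ctrl, n_ctrl)
    return {
        "trial": label,
        "risk_difference": diff,
        "ci_lower": lower,
        "ci_upper": upper,
        # Binary verdict: the upper confidence limit lies below the margin.
        "noninferior": upper < margin,
    }


MARGIN = 0.02  # hypothetical margin: 2 percentage points of excess risk

trial_a = summarize("A", 150, 5000, 150, 5000, MARGIN)
trial_b = summarize("B", 840, 20000, 600, 20000, MARGIN)

for t in (trial_a, trial_b):
    print(
        f"Trial {t['trial']}: difference = {t['risk_difference']:.4f}, "
        f"95% CI ({t['ci_lower']:.4f}, {t['ci_upper']:.4f}), "
        f"noninferior = {t['noninferior']}"
    )
```

Both hypothetical trials satisfy the margin, yet in trial B the entire interval
lies above zero, which indicates an excess risk that the verdict alone would
hide.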
REFERENCES
AHRQ (Agency for Healthcare Research and Quality). 2008. U.S. Preventive Services Task Force
procedure manual. Washington, DC: Department of Health and Human Services.
Baggerly, K. 2010. Disclose all data in publications. Nature 467(7314):401.
Baigent, C., A. Keech, P. M. Kearney, and L. Blackwell. 2005. Efficacy and safety of cholesterol-
lowering treatment: Prospective meta-analysis of data from 90,056 participants in 14 randomised
trials of statins. Lancet 366(9493):1267-1278.
Barton, M. B., T. Miller, T. Wolff, D. Petitti, M. LeFevre, G. Sawaya, B. Yawn, J. Guirguis-Blake, N.
Calonge, R. Harris, and U.S. Preventive Services Task Force. 2007. How to read the new
recommendation statement: Methods update from the U.S. Preventive Services Task Force. Annals of
Internal Medicine 147(2):123-127.
Becker, M. C., T. H. Wang, L. Wisniewski, K. Wolski, P. Libby, T. F. Lüscher, J. S. Borer, A. M.
Mascette, M. E. Husni, D. H. Solomon, D. Y. Graham, N. D. Yeomans, H. Krum, F. Ruschitzka,
A. M. Lincoff, and S. E. Nissen. 2009. Rationale, design, and governance of Prospective
Randomized Evaluation of Celecoxib Integrated Safety versus Ibuprofen or Naproxen
(PRECISION), a cardiovascular end point trial of nonsteroidal antiinflammatory agents in
patients with arthritis. American Heart Journal 157(4):606-612.
Bent, S., A. Padula, and A. L. Avins. 2006. Brief communication: Better ways to question patients
about adverse medical events: A randomized, controlled trial. Annals of Internal Medicine
144(4):257-261.
Berry, D. A. 2006. Bayesian clinical trials. Nature Reviews Drug Discovery 5(1):27-36.
Berry, D. A., M. C. Wolff, and D. Sack. 1992. Public health decision making: A sequential vaccine
trial. In Bayesian statistics, edited by J. Bernardo, J. Berger, A. Dawid and A. Smith. Oxford,
UK: Oxford University Press. Pp. 79-96.
Camm, A. J., A. Capucci, S. H. Hohnloser, C. Torp-Pedersen, I. C. Van Gelder, B. Mangal, and
G. Beatch. 2011. A randomized active-controlled study comparing the efficacy and safety of
vernakalant to amiodarone in recent-onset atrial fibrillation. Journal of the American College
of Cardiology 57(3):313-321.
Campbell, G. 2011. Bayesian statistics in medical devices: Innovation sparked by the FDA. Journal
of Biopharmaceutical Statistics 21(5):871-887.
Carey, V. J., and V. Stodden. 2010. Reproducible research concepts and tools for cancer
bioinformatics. In Biomedical informatics for cancer research, edited by M. F. Ochs, J. T. Casagrande and
R. V. Davuluri. Springer US. Pp. 149-175.
Carpenter, D. 2010. Reputation and power institutionalized: Scientific networks, congressional hear-
ings, and judicial affirmation, 1963-1986. In Reputation and power: Organizational image and
pharmaceutical regulation at the FDA. Princeton, NJ: Princeton University Press. Pp. 298-392.
Carragee, E. J., E. L. Hurwitz, and B. K. Weiner. 2011. A critical review of recombinant human bone
morphogenetic protein-2 trials in spinal surgery: Emerging safety concerns and lessons learned.
Spine Journal 11(6):471-491.
Chaloner, K. 1996. Elicitation of prior distributions. In Bayesian biostatistics, edited by D. A. Berry
and D. K. Stangl. New York: Marcel Dekker.
Chan, A.-W., A. Hróbjartsson, M. T. Haahr, P. C. Gøtzsche, and D. G. Altman. 2004. Empirical
evidence for selective reporting of outcomes in randomized trials. JAMA 291(20):2457-2465.
Chowdhury, B. A., and G. Dal Pan. 2010. The FDA and safe use of long-acting beta-agonists in the
treatment of asthma. New England Journal of Medicine 362(13):1169-1171.
Chowdhury, B. A., S. M. Seymour, and M. S. Levenson. 2011. Assessing the safety of adding
LABAs to inhaled corticosteroids for treating asthma. New England Journal of Medicine
364(26):2473-2475.
Claxton, K., J. T. Cohen, and P. J. Neumann. 2005. When is evidence sufficient? Health Affairs
24(1):93-101.
Cooper, H., and E. A. Patall. 2009. The relative benefits of meta-analysis conducted with individual
participant data versus aggregated data. Psychological Methods 14(2):165-176.
Dal Pan, G. J. 2010. Memorandum from Gerald Dal Pan to Janet Woodcock (dated September
12, 2010). Re: Recommendations for regulatory action for rosiglitazone and rosiglitazone-
containing products (NDA 21-071, supplement 035, incoming submission dated August 25,
2009). Washington, DC: Department of Health and Human Services.
Darby, S., P. McGale, C. Correa, C. Taylor, R. Arriagada, M. Clarke, D. Cutter, C. Davies, M.
Ewertz, J. Godwin, R. Gray, L. Pierce, T. Whelan, Y. Wang, and R. Peto. 2011. Effect of
radiotherapy after breast-conserving surgery on 10-year recurrence and 15-year breast cancer death:
Meta-analysis of individual patient data for 10,801 women in 17 randomised trials. Lancet
378(9804):1707-1716.
Davies, C., J. Godwin, R. Gray, M. Clarke, D. Cutter, S. Darby, P. McGale, H. C. Pan, C. Taylor, Y. C.
Wang, M. Dowsett, J. Ingle, and R. Peto. 2011. Relevance of breast cancer hormone receptors
and other factors to the efficacy of adjuvant tamoxifen: Patient-level meta-analysis of randomised
trials. Lancet 378(9793):771-784.
Emerson, S. S., J. M. Kittelson, and D. L. Gillen. 2007. Bayesian evaluation of group sequential
clinical trial designs. Statistics in Medicine 26(7):1431-1449.
Eraker, S. A., J. P. Kirscht, and M. H. Becker. 1984. Understanding and improving patient compliance.
Annals of Internal Medicine 100(2):258.
Christensen, E. 2007. Methodology of superiority vs. equivalence trials and non-inferiority trials. Journal
of Hepatology 46(5):947-954.
Etzioni, R. D., and J. B. Kadane. 1995. Bayesian statistical methods in public health and medicine.
Annual Review of Public Health 16(1):23-41.
FDA (US Food and Drug Administration). 2008. Guidance for industry. Diabetes mellitus—evaluat-
ing cardiovascular risk in new antidiabetic therapies to treat type 2 diabetes. Washington, DC:
Department of Health and Human Services.
FDA. 2010a. Guidance for industry and FDA staff: Guidance for the use of Bayesian statistics in
medical device clinical trials. Rockville, MD: Department of Health and Human Services.
FDA. 2010b. Guidance for industry: Non-inferiority clinical trials, draft guidance. Washington, DC:
Department of Health and Human Services.
FDA. 2010c. FDA briefing document. Advisory committee meeting for NDA 21071: Avandia
(rosiglitazone maleate tablet). Silver Spring, MD: Department of Health and Human Services.
FDA. 2012. Classifying significant postmarketing drug safety issues: Draft guidance. Washington,
DC: Department of Health and Human Services.
Fisher, D. J., A. J. Copas, J. F. Tierney, and M. K. B. Parmar. 2011. A critical review of methods for
the assessment of patient-level interactions in individual participant data meta-analysis of
randomized trials, and guidance for practitioners. Journal of Clinical Epidemiology 64(9):949-967.
Fisher, L. D. 1999. Carvedilol and the Food and Drug Administration (FDA) approval process: The
FDA paradigm and reflections on hypothesis testing. Controlled Clinical Trials 20(1):16-39.
Fleming, T. R. 2008. Current issues in non-inferiority trials. Statistics in Medicine 27(3):317-332.
Fleming, T. R., K. Odem-Davis, M. Rothmann, and Y. Li Shen. 2011. Some essential considerations
in the design and conduct of non-inferiority trials. Clinical Trials 8:432-439.
Frank, E., G. B. Cassano, P. Rucci, A. Fagiolini, L. Maggi, H. C. Kraemer, D. J. Kupfer, B. Pollock, R.
Bies, V. Nimgaonkar, P. Pilkonis, M. K. Shear, W. K. Thompson, V. J. Grochocinski, P. Scocco,
J. Buttenfield, and R. N. Forgione. 2008. Addressing the challenges of a cross-national
investigation: Lessons from the Pittsburgh-PISA study of treatment-relevant phenotypes of unipolar
depression. Clinical Trials 5(3):253-261.
Furberg, C. D., and B. Pitt. 2001. Commentary: Withdrawal of cerivastatin from the world market.
Current Controlled Trials in Cardiovascular Medicine 2(5):205-207.
GAO (Government Accountability Office). 2010a. Drug safety: FDA has conducted more foreign
inspections and begun to improve its information on foreign establishments, but more progress
is needed. Washington, DC: Government Accountability Office.
GAO. 2010b. Food and Drug Administration: Overseas offices have taken steps to help ensure import
safety, but more long-term planning is needed. Washington, DC: Government Accountability
Office.
GAO. 2010c. New drug approval: FDA’s consideration of evidence from certain clinical trials.
Washington, DC: Government Accountability Office.
Garrison, L. P., Jr., P. J. Neumann, P. Radensky, and S. D. Walcoff. 2010. A flexible approach to
evidentiary standards for comparative effectiveness research. Health Affairs 29(10):1812-1817.
Gelfand, A. E., and B. K. Mallick. 1995. Bayesian analysis of proportional hazards models built from
monotone functions. Biometrics 51(3):843-852.
Golder, S., Y. K. Loke, and M. Bland. 2011. Meta-analyses of adverse effects data derived from
randomised controlled trials as compared to observational studies: Methodological overview.
PLoS Medicine 8(5):e1001026.
Good, I. J. 1950. Probability and the weighting of evidence. London, UK: Charles Griffin & Co.
Goodman, S. N. 1999. Toward evidence-based medical statistics. 2: The Bayes factor. Annals of
Internal Medicine 130(12):1005-1013.
Goodman, S. N. 2001. Of P-values and Bayes: A modest proposal. Epidemiology 12(3):295-297.
Gordis, L. 2004. Epidemiology. Third ed. Philadelphia, PA: Elsevier Inc.
Graham, D. J., and K. Gelperin. 2010a. Memorandum to Mary Parks regarding comments on
RECORD, TIDE, and the benefit-risk assessment of rosiglitazone vs. pioglitazone. In FDA
Briefing Document Advisory Committee Meeting for NDA 21071: Avandia (rosiglitazone
maleate) tablet: July 13 and 14, 2010. Washington, DC: Department of Health and Human Services.
Graham, D. J., and K. Gelperin. 2010b. TIDE and benefit-risk considerations.
http://www.fda.gov/downloads/AdvisoryCommittees/CommitteesMeetingMaterials/Drugs/EndocrinologicandMetabolicDrugsAdvisoryCommittee/UCM224732.pdf
(accessed October 11, 2011).
Greene, B. M., A. M. Geiger, E. L. Harris, A. Altschuler, L. Nekhlyudov, M. B. Barton, S. J. Rolnick,
J. G. Elmore, and S. Fletcher. 2006. Impact of IRB requirements on a multicenter survey of
prophylactic mastectomy outcomes. Annals of Epidemiology 16(4):275-278.
Greenhouse, J. B., and L. Wasserman. 1995. Robust Bayesian methods for monitoring clinical trials.
Statistics in Medicine 14(12):1379-1391.
Guyatt, G. H., A. D. Oxman, G. E. Vist, R. Kunz, Y. Falck-Ytter, P. Alonso-Coello, and H. J.
Schünemann. 2008. GRADE: An emerging consensus on rating quality of evidence and strength
of recommendations. BMJ 336(7650):924-926.
Hamburg, M. A. 2011. Commentary: The growing role of epidemiology in drug safety regulation.
Epidemiology 22(5):622-624.
Hammad, T. A., S. P. Pinheiro, and G. A. Neyarapally. 2011b. Secondary use of randomized
controlled trials to evaluate drug safety: A review of methodological considerations. Clinical Trials
8(5):559-570.
Hernán, M. A., and S. Hernandez-Diaz. 2012. Beyond the intention-to-treat in comparative
effectiveness research. Clinical Trials 9(1):48-55.
Hernán, M. A., and J. M. Robins. 2012. Causal inference. New York: Chapman & Hall/CRC.
Hernán, M. A., and J. M. Robins. 2006. Instruments for causal inference: An epidemiologist’s dream?
Epidemiology 17(4):360-372.
Ioannidis, J. P. A., and J. Lau. 2001. Completeness of safety reporting in randomized trials: An
evaluation of 7 medical areas. JAMA 285(4):437-443.
Ioannidis, J. P. A., S. J. W. Evans, P. C. Gøtzsche, R. T. O’Neill, D. G. Altman, K. Schulz, and D.
Moher. 2004. Better reporting of harms in randomized trials: An extension of the CONSORT
statement. Annals of Internal Medicine 141(10):781-788.
Ioannidis, J. P., C. D. Mulrow, and S. N. Goodman. 2006. Adverse events: The more you search, the
more you find. Annals of Internal Medicine 144(4):298-300.
IOM (Institute of Medicine). 2008. Improving the presumptive disability decision-making process for
veterans. Washington, DC: The National Academies Press.
Ives, D. G., A. L. Fitzpatrick, D. E. Bild, B. M. Psaty, L. H. Kuller, P. M. Crowley, R. G. Cruise, and
S. Theroux. 1995. Surveillance and ascertainment of cardiovascular events: The Cardiovascular
Health Study. Annals of Epidemiology 5(4):278-285.
Ives, D. G., P. Samuel, B. M. Psaty, and L. H. Kuller. 2009. Agreement between nosologist and car-
diovascular health study review of deaths: Implications of coding differences. Journal of the
American Geriatrics Society 57(1):133-139.
Jencks, S. F., D. K. Williams, and T. L. Kay. 1988. Assessing hospital-associated deaths from discharge
data. JAMA 260(15):2240-2246.
Jenkins, J. K. 2010. Memorandum from John Jenkins to Janet Woodcock (dated September, 2010).
Re: Recommendations for regulatory actions—Rosiglitazone. Washington, DC: US Food and
Drug Administration.
Jones, A. P., R. D. Riley, P. R. Williamson, and A. Whitehead. 2009. Meta-analysis of individual
patient data versus aggregate data from longitudinal clinical trials. Clinical Trials 6(1):16-27.
Juurlink, D. N. 2010. Rosiglitazone and the case for safety over certainty. JAMA 304(4):469-471.
Kadane, J. B. 2005. Bayesian methods for health-related decision making. Statistics in Medicine
24(4):563-567.
Kadane, J., and L. J. Wolfson. 1998. Experiences in elicitation. Journal of the Royal Statistical Society:
Series D (The Statistician) 47(1):3-19.
Kaizar, E. E., J. B. Greenhouse, H. Seltman, and K. Kelleher. 2006. Do antidepressants cause
suicidality in children? A Bayesian meta-analysis. Clinical Trials 3(2):73-90; discussion 91-98.
Kass, R. E., and A. E. Raftery. 1995. Bayes factors. Journal of the American Statistical Association
90(430):773-795.
Kaul, S., and G. A. Diamond. 2006. Good enough: A primer on the analysis and interpretation of
noninferiority trials. Annals of Internal Medicine 145(1):62-69.
Kaul, S., and G. A. Diamond. 2007. Making sense of noninferiority: A clinical and statistical per-
spective on its application to cardiovascular clinical trials. Progress in Cardiovascular Diseases
49(4):284-299.
Kufner, S., A. de Waha, F. Tomai, S.-W. Park, S.-W. Lee, D.-S. Lim, M. H. Kim, A. M. Galloe, M.
Maeng, C. Briguori, A. Dibra, A. Schömig, and A. Kastrati. 2011. A meta-analysis of specifically
designed randomized trials of sirolimus-eluting versus paclitaxel-eluting stents in diabetic
patients with coronary artery disease. American Heart Journal 162(4):740-747.
Laine, C., S. N. Goodman, M. E. Griswold, and H. C. Sox. 2007. Reproducible research: Moving
toward research the public can really trust. Annals of Internal Medicine 146(6):450-453.
Lanctot, K. L., and C. A. Naranjo. 1995. Comparison of the Bayesian approach and a simple algorithm
for assessment of adverse drug events. Clinical Pharmacology & Therapeutics 58(6):692-698.
Lau, H. S., A. de Boer, K. S. Beuning, and A. Porsius. 1997. Validation of pharmacy records in drug
exposure assessment. Journal of Clinical Epidemiology 50(5):619-625.
Laughren, T. P. 2006. Overview for December 13 meeting of psychopharmacologic drugs advisory
committee (PDAC).
Law, M. R., Y. Kawasumi, and S. G. Morgan. 2011. Despite law, fewer than one in eight com-
pleted studies of drugs and biologics are reported on time on ClinicalTrials.gov. Health Affairs
30(12):2338-2345.
Lee, K., P. Bacchetti, and I. Sim. 2008. Publication of clinical trials supporting successful new drug
applications: A literature analysis. PLoS Medicine 5(9):e191.
Lesaffre, E. 2008. Superiority, equivalence, and non-inferiority trials. Bulletin of the NYU Hospital
for Joint Diseases 66(2):150-154.
Levenson, M., and C. Holland. 2006. Slide presentation: Antidepressants and suicidality in adults:
Statistical evaluation. http://www.fda.gov/ohrms/dockets/ac/06/slides/2006-4272s1-04-FDA_files/frame.htm
(accessed April 6, 2012).
Lilford, R. J., M. A. Mohammed, D. Braunholtz, and T. P. Hofer. 2003. The measurement of active
errors: Methodological issues. Quality and Safety in Health Care 12(Suppl 2):ii8-ii12.
Lin, D. Y., and D. Zeng. 2010. Meta-analysis of genome-wide association studies: No efficiency gain
in using individual participant data. Genetic Epidemiology 34(1):60-66.
Madigan, D., P. Ryan, S. E. Simpson, and I. Zorych. 2010. Bayesian methods in pharmacovigilance.
In Bayesian statistics 9, edited by J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D.
Heckerman, A. F. M. Smith and M. West. Oxford, UK: Oxford University Press.
Manion, F., R. Robbins, W. Weems, and R. Crowley. 2009. Security and privacy requirements for a
multi-institutional cancer research data grid: An interview-based study. BMC Medical Informat-
ics and Decision Making 9(1):31.
Marciniak, T. A. 2010. Memorandum from Thomas Marciniak to Jena Weber (dated June 14, 2010)
regarding cardiovascular events in RECORD, NDA 21-071/s-035. In FDA Briefing Document
Advisory Committee Meeting for NDA 21071: Avandia (rosiglitazone maleate) tablet: July 13
and 14, 2010. Washington, DC: Department of Health and Human Services.
McEvoy, B., R. R. Nandy, and R. C. Tiwari. 2012. Applications of Bayesian model selection criteria for
clinical safety data (abstract, ASA joint statistical meetings).
http://www.amstat.org/meetings/jsm/2012/onlineprogram/AbstractDetails.cfm?abstractid=305627
(accessed April 5, 2012).
Melander, H., J. Ahlqvist-Rastad, G. Meijer, and B. Beermann. 2003. Evidence b(i)ased medicine—
selective reporting from studies sponsored by pharmaceutical industry: Review of studies in new
drug applications. BMJ 326(7400):1171-1173.
Miller, J. D. 2010. Registering clinical trial results: The next step. JAMA 303(8):773-774.
Misbin, R. I. 2007. Lessons from the Avandia controversy: A new paradigm for the development of
drugs to treat type 2 diabetes. Diabetes Care 30(12):3141-3144.
Nissen, S. E., and K. Wolski. 2007. Effect of rosiglitazone on the risk of myocardial infarction and
death from cardiovascular causes. New England Journal of Medicine 356(24):2457-2471.
NRC (National Research Council). 2010. The prevention and treatment of missing data in clinical
trials. Panel on handling missing data in clinical trials. Washington, DC: The National Acad-
emies Press.
Owens, D. K., K. N. Lohr, D. Atkins, J. R. Treadwell, J. T. Reston, E. B. Bass, S. Chang, and M.
Helfand. 2010. AHRQ series paper 5: Grading the strength of a body of evidence when
comparing medical interventions-Agency for Healthcare Research and Quality and the Effective
Health-care Program. Journal of Clinical Epidemiology 63(5):513-523.
Oxford Dictionaries. 2011. Oxford English Dictionary online. Oxford University Press.
Parks, M. H. 2010. Memorandum from Mary Parks to Curtis Rosebraugh (dated August 19, 2010).
Re: Recommendations on marketing status of Avandia (rosiglitazone maleate) and the required
post-marketing trial, Thiazolidinedione Intervention and Vitamin D Evaluation (TIDE) following
the July 13 and 14, 2010 Public Advisory Committee Meeting. Silver Spring, MD: US Food
and Drug Administration.
Parmigiani, G. 2002. Modeling in medical decision making: A Bayesian approach (statistics in
practice). New York: Wiley.
Peng, R. D., F. Dominici, and S. L. Zeger. 2006. Reproducible epidemiologic research. American
Journal of Epidemiology 163(9):783-789.
PMA (Cochrane Prospective Meta-Analysis Methods Group). 2010. Welcome: The prospective meta-
analysis methods group. http://pma.cochrane.org/ (accessed December 12, 2011).
Psaty, B. M., and D. S. Siscovick. 2010. Minimizing bias due to confounding by indication in
comparative effectiveness research: The importance of restriction. JAMA 304(8):897-898.
Psaty, B. M., R. Boineau, L. H. Kuller, and R. V. Luepker. 1999. The potential costs of upcoding for
heart failure in the United States. The American Journal of Cardiology 84(1):108-109.
Psaty, B. M., C. J. O’Donnell, V. Gudnason, K. L. Lunetta, A. R. Folsom, J. I. Rotter, A. G. Uitterlinden,
T. B. Harris, J. C. M. Witteman, and E. Boerwinkle, on behalf of the CHARGE Consortium.
2009. Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium.
Circulation: Cardiovascular Genetics 2(1):73-80.
Reade, M., A. Delaney, M. Bailey, D. Harrison, D. Yealy, P. Jones, K. Rowan, R. Bellomo, and D.
Angus. 2010. Prospective meta-analysis using individual patient data in intensive care medicine.
Intensive Care Medicine 36(1):11-21.
Royall, R. M. 1997. Statistical evidence: A likelihood paradigm. London, UK: Chapman & Hall.
Saunders, K., K. Dunn, J. Merrill, M. Sullivan, C. Weisner, J. Braden, B. Psaty, and M. Von Korff.
2010. Relationship of opioid use and dosage levels to fractures in older chronic pain patients.
Journal of General Internal Medicine 25(4):310-315.
Staffa, J. A., J. Chang, and L. Green. 2002. Cerivastatin and reports of fatal rhabdomyolysis. New
England Journal of Medicine 346(7):539-540.
Talbot, J. C. C., and P. Walker. 2004. Stephens’ detection of new adverse drug reactions. 5th ed. West
Sussex, England: John Wiley & Sons Ltd.
Temple, R., and S. S. Ellenberg. 2000. Placebo-controlled trials and active-control trials in the
evaluation of new treatments. Part 1: Ethical and scientific issues. Annals of Internal Medicine
133(6):455-463.
Ten Have, T. R., S. L. Normand, S. M. Marcus, C. H. Brown, P. Lavori, and N. Duan. 2008.
Intent-to-treat vs. non-intent-to-treat analyses under treatment non-adherence in mental health
randomized trials. Psychiatric Annals 38(12):772-783.
Thomas, E. J., and L. A. Petersen. 2003. Measuring errors and adverse events in health care. Journal
of General Internal Medicine 18(1):61-67.
Thompson, S. G., and S. J. Sharp. 1999. Explaining heterogeneity in meta-analysis: A comparison of
methods. Statistics in Medicine 18:2693-2708.
Toh, S., and M. A. Hernán. 2008. Causal inference from longitudinal studies with baseline
randomization. International Journal of Biostatistics 4(1):Article 22.
Turner, E. H., A. M. Matthews, E. Linardatos, R. A. Tell, and R. Rosenthal. 2008. Selective
publication of antidepressant trials and its influence on apparent efficacy. New England Journal of
Medicine 358(3):252-260.
Unger, E. 2010. Memorandum to the file regarding NDA: 21-071; suppl 35, 36, 37 Avandia
(rosiglitazone). In FDA Briefing Document: Advisory Committee Meeting for NDA 21071: Avandia
(rosiglitazone maleate) tablet: July 13 and 14, 2010. Washington, DC: Department of Health
and Human Services.
Vandenbroucke, J. P. 2006. What is the best evidence for determining harms of medical treatment?
Canadian Medical Association Journal 174(5):645-646.
Vandenbroucke, J. P., and B. M. Psaty. 2008. Benefits and risks of drug treatments: How to combine
the best evidence on benefits with the best data about adverse effects. JAMA 300(20):2417-2419.
Vedula, S. S., L. Bero, R. W. Scherer, and K. Dickersin. 2009. Outcome reporting in industry-sponsored
trials of gabapentin for off-label use. New England Journal of Medicine 361(20):1963-1971.
Weiss, N. S., T. D. Koepsell, and B. M. Psaty. 2008. Generalizability of the results of randomized
trials. Archives of Internal Medicine 168(2):133-135.
Wood, A. J. J. 2009. Progress and deficiencies in the registration of clinical trials. New England
Journal of Medicine 360(8):824-830.
Yap, J. S. 2010. Statistical review and evaluation: Clinical studies NDA 21-071/35 and 21-073. In FDA
Briefing Document: Advisory Committee Meeting for NDA 21071: Avandia (rosiglitazone
maleate) tablet: July 13 and 14, 2010. Washington, DC: Department of Health and Human Services.