3
Evidence and Decision-Making

In Chapter 2, the committee recommends a framework for the US Food and Drug Administration (FDA) regulatory decision-making process in which scientific evidence plays a critical role, together with other factors including ethical considerations and the perspectives of patients and other stakeholders. This chapter focuses on the evaluation of the scientific evidence and on how FDA should use evidence in its decisions. Just as courts determine when evidence is admissible and which standard of proof to apply in a given case, scientific evidence must be evaluated for its quality and applicability to the public health question that is the focus of regulatory decision-making. FDA needs to base its decisions on the best available scientific evidence related to that question. Different people, however, can interpret and judge scientific evidence in various ways. Decisions in which there is disagreement among experts about what decisions are best supported by a given body of evidence are among the most difficult that FDA must make. For these decisions to properly incorporate all the relevant uncertainties and values, the regulators need to understand the bases of the various judgments that the experts are making. As has been shown in many difficult cases that FDA has had to decide, evidence does not speak for itself.

This chapter will categorize and discuss the sources of technical disagreements between experts about the kinds of data that FDA typically deals with. It will start with a short primer on approaches to statistical inference, with an introduction to Bayesian methods, followed by a discussion of the distinctions between scientific data and evidence. It then discusses why scientists sometimes disagree about the evidence of a drug’s benefits and risks and how their disagreements may affect regulatory decision-making.



STUDYING THE SAFETY OF APPROVED DRUGS

STATISTICAL INFERENCE AND DECISION-MAKING

Evidence

Although the terms data and evidence are often used interchangeably, data is not a synonym for evidence. The Compact Oxford English Dictionary defines data as “facts and statistics collected together for reference or analysis” and evidence as “the available body of facts or information indicating whether a belief or proposition is true” (Oxford Dictionaries, 2011). The difference is whether or not the information is being used to draw scientific conclusions about a specific proposition. In the context of a drug study, the “proposition” is a hypothesis about a drug effect, often stated in the form of a scientific question, such as “Do broad-spectrum antibiotics increase the risk of colitis?” In the broader context of FDA’s regulatory decisions, the proposition may be implicit in the public health question that prompts the need for a regulatory decision, such as “Does the risk of colitis caused by broad-spectrum antibiotics outweigh their benefits to the public’s health?” In this way, evidence is defined with respect to the questions developed in the first step of the decision-making framework described in Chapter 2.

Statistical methods help to ascertain the “strength of the evidence” supporting a given hypothesis by measuring the degree to which the data support one hypothesis rather than the other. The evidence in turn affects the likelihood that either hypothesis is true. The most common scientific hypothesis in the realm of drug evaluation is the “null hypothesis”: that in a given treated population, the drug has no effect relative to a comparator treatment. For the concept of evidence to have meaning, however, there must be at least one other hypothesis under consideration, such as that the drug has some effect. A small change in the scientific hypotheses being compared can change the strength of the evidence provided by a given set of data. For example, if the question above changed from whether broad-spectrum antibiotics produce any increase in the risk of colitis to whether broad-spectrum antibiotics produce a clinically important increase in the risk of colitis (say, an increase of more than 10 percent), the strength of the evidence provided by the same data could change. Where one observer might see a four percent increase in risk as strong evidence of some excess risk, another could regard it as strong evidence against a 10 percent increase in risk.[1] Agreement on the strength of the evidence therefore requires agreement on the hypotheses being contrasted and on the public health questions that give rise to them.

[1] Confusion can result from use of the word significant to describe an effect that is both statistically significant and clinically relevant; the latter is often termed clinically significant. The two uses should remain separate.
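The dependence of evidential strength on the hypotheses being contrasted can be made concrete with a likelihood ratio. The sketch below is illustrative only: it assumes the observed four percent increase is approximately normally distributed, and the standard error of 1.5 percentage points and the function name `likelihood_ratio` are hypothetical values chosen here, not figures from the chapter.

```python
import math

def likelihood_ratio(estimate, se, h_a, h_b):
    """Likelihood of hypothesis h_a relative to h_b, given a normally
    distributed estimate with standard error se (normal approximation)."""
    def log_lik(h):
        z = (estimate - h) / se
        return -0.5 * z * z
    return math.exp(log_lik(h_a) - log_lik(h_b))

observed_increase = 4.0  # percent; the increase used in the text's example
standard_error = 1.5     # percent; hypothetical, assumed for illustration

# Same data, two different contrasts.
some_vs_none = likelihood_ratio(observed_increase, standard_error, 4.0, 0.0)
small_vs_large = likelihood_ratio(observed_increase, standard_error, 4.0, 10.0)

print(f"4% increase versus no increase:  {some_vs_none:.0f} to 1")
print(f"4% increase versus 10% increase: {small_vs_large:.0f} to 1")
```

Under these assumptions the same data favor a four percent increase over no increase, and favor it far more strongly over a 10 percent increase, which is how two observers can both describe the evidence as "strong" while answering different questions.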

Inference

Good science, together with proper statistics, has a dual role. The first role is to decrease uncertainty about which hypotheses are true; the second is to properly measure the remaining uncertainty. These are carried out in part through a process called statistical inference. Statistical inference involves the process of summarizing data, estimating the uncertainty around the summary, and using the summary to reach conclusions about the underlying truth that gave rise to the data.

The two main approaches to statistical inference are the standard “frequentist” approach and the Bayesian approach. Each has distinctive strengths and weaknesses when used as bases for decision-making; including both approaches in the technical and conceptual toolbox can be extraordinarily important in making proper decisions in the face of complex evidence and substantial uncertainty. The frequentist approach to statistical inference is familiar to medical researchers and is the basis for most FDA rules and guidance. The Bayesian approach is less widely used and understood; however, it has many attractive properties that can both elucidate the reasons for disagreements and provide an analytic model for decision-making. This model allows decision-makers to combine the chance of being wrong about risks and benefits, together with the seriousness of those errors, to support optimal decisions.

The frequentist approach employs such measures as P values, confidence intervals, and type I and II errors, as well as practices such as hypothesis-testing. Evidence against a specified hypothesis is measured with a P value. P values are typically used within a hypothesis-testing paradigm that declares results “statistically significant” or “not significant,” with the threshold for significance usually being a P value less than 0.05. By convention, type I (false-positive) error rates in individual studies are set in the design stage at 5 percent or lower, and type II (false-negative) rates at 20 percent or below (Gordis, 2004). In the colitis example, if the null hypothesis posits that broad-spectrum antibiotics do not increase the risk of colitis, a P value less than 0.05 would lead one to reject that null hypothesis and conclude that broad-spectrum antibiotics do increase the risk of colitis. The range of that elevation statistically consistent with the evidence would be captured by the confidence interval. If the P value exceeded 0.05, several conclusions could be supported, depending on the location and width of the confidence interval: either that a clinically negligible effect is likely, or that the study cannot rule out either a null or clinically important effect and thus is inconclusive. In the drug-approval setting, the FDA regulatory threshold of “substantial evidence”[2] for effectiveness is generally defined as two well-controlled trials that have achieved statistical significance on an agreed-upon endpoint, although there can be exceptions (Carpenter, 2010; Garrison et al., 2010).

[2] 21 USC § 355(d) (2010).
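The reading of a result described above depends on where the confidence interval sits relative to both the null value and a clinically important effect. The following is a minimal sketch of that logic for a ratio measure such as a relative risk, considering only the direction of harm. The function name `interpret_interval` and the clinically important threshold of 1.5 are hypothetical choices made for illustration; they are not FDA conventions.

```python
def interpret_interval(lower, upper, null_value=1.0, important=1.5):
    """Classify a confidence interval for a ratio measure of harm.

    null_value: the ratio implying no effect (1.0 for a relative risk).
    important:  smallest ratio deemed clinically important (hypothetical).
    """
    if lower > null_value or upper < null_value:
        # The interval excludes the null value: conventionally "significant".
        return "null excluded"
    if upper < important:
        # The interval includes the null but rules out an important elevation.
        return "clinically negligible effect likely"
    # The interval includes both the null and clinically important elevations.
    return "inconclusive"

print(interpret_interval(1.2, 2.4))  # interval lies entirely above 1.0
print(interpret_interval(0.9, 1.2))  # narrow interval around 1.0
print(interpret_interval(0.7, 3.0))  # wide interval spanning 1.0 and 1.5
```

The second and third calls would both be reported as "not significant," yet they support quite different conclusions, which is the distinction the paragraph draws.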

Hypothesis-testing provides a yes-or-no verdict that is useful for regulatory purposes, and its value has been demonstrated over time, both procedurally and inferentially. Its emphasis on pre-specification of endpoints, study procedures, and analytic plans has regulatory and often inferential benefits. But hypothesis tests, P values, and confidence intervals do not provide decision-makers with an important measure: the probability that a hypothesis is right or wrong. In settings where a difficult balancing of various decisional consequences must be made in the face of uncertainty about both the presence and magnitude of benefits and risks, the probability that a given hypothesis is true plays a central role. The failure to assign a degree of certainty to a conclusion is a weakness of the frequentist approach when it is used for regulatory decisions (Berry et al., 1992; Etzioni and Kadane, 1995; IOM, 2008; Parmigiani, 2002).

In contrast, the Bayesian approach to inference allows a calculation, on the basis of results from an experiment, of how likely a hypothesis is to be true or false. However, this calculation is premised on an estimated probability that a hypothesis is true prior to the conduct of the experiment, a probability that is not uniquely scientifically defined and about which scientists can differ. Both in spite of this and because of this, Bayesian approaches can be very useful complements to traditional frequentist analyses, and can yield insights into the reasons why scientists disagree, a topic that will be discussed in more depth later in this chapter.

The use of Bayesian approaches is not new to FDA. FDA’s Center for Devices and Radiological Health (CDRH) has published guidance for the use of Bayesian statistics in medical device clinical trials (FDA, 2010a), and FDA has used Bayesian approaches in regulatory decisions. A 2004 FDA workshop on the use of Bayesian methods for regulatory decision-making included extensive discussion by FDA scientists, as well as Center for Drug Evaluation and Research (CDER) and CDRH leadership, of ways in which Bayesian approaches could enhance the science of premarketing approval.[3] Campbell (2011), director of the CDRH Biostatistics division, discussed the uses of Bayesian methods for FDA decision-making, and presented 17 requests for premarketing approval submitted to and approved by the CDRH for medical devices that used Bayesian methods. Although Bayesian methods have been little used by CDER, Berry (2006) discusses how a Bayesian meta-analysis served as the basis for a CDER approval of Pravigard™ Pac (co-packaged pravastatin and buffered aspirin) to lower the risk of cardiovascular events. Bayesian sensitivity analyses were used to help evaluate the literature investigating the possible association between antidepressants and suicidal outcomes (Laughren, 2006; Levenson and Holland, 2006), elaborated later in Kaizar (2006). Finally, FDA staff has recently proposed Bayesian methodology for analysis of safety endpoints in clinical trials (McEvoy et al., 2012).

[3] Published papers from the workshop are available in the August 2005 issue of Clinical Trials (2:271-378).

The Bayesian approach does not use a P value to measure evidence; rather, it uses an index called the Bayes factor (Goodman, 1999; Kass and Raftery, 1995). The Bayes factor encodes mathematically the principle presented earlier: that the role of evidence is to help adjudicate between two or more competing hypotheses. The Bayes factor modifies the probability of whether a hypothesis is true. Decision-makers can then use that probability to characterize the likelihood that their decisions will be wrong. In its simplest form, Bayes’ theorem can be defined in the following equation (Goodman, 1999; Kass and Raftery, 1995):

    The odds that a hypothesis is true after new evidence = The odds that a hypothesis is true before new evidence × The strength of new evidence (the Bayes factor)

The Bayes factor is sometimes regarded as the “weight of the evidence” comparing how strongly the data support one hypothesis (or combination of hypotheses) to another (Good, 1950; Kass and Raftery, 1995). Most important is the role that the Bayes factor plays in Bayes’ theorem; it modifies the probability that a given hypothesis is true. This concept that a hypothesis has a certain “truth probability” has no counterpart in standard frequentist approaches.

There is not a one-to-one relationship between P values and Bayes factors, because the magnitude of an observed effect and the prior probabilities of hypotheses also can affect the Bayes factor calculation itself. But in most common statistical situations, there exists a strongest possible Bayes factor, and that can be defined as a function of the observed P value. That relationship can be used to calculate the maximum chance that the non-null hypothesis is true as a function of the P value and a prior probability (Goodman, 2001; Royall, 1997). Assume that the null hypothesis is that a given drug does not cause a given harm, and that the alternative hypothesis is that it does elevate the risk of that harm.
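In symbols, the odds form of Bayes' theorem stated above can be sketched as follows. The notation is an assumption introduced here for illustration rather than the committee's own: H_0 denotes the null hypothesis, H_1 the alternative, and BF_01 the Bayes factor for the null relative to the alternative, so that values below 1 count against the null. The conversion from odds to probability on the right assumes that H_0 and H_1 are the only two hypotheses under consideration.

```latex
\[
\underbrace{\frac{P(H_0 \mid \text{data})}{P(H_1 \mid \text{data})}}_{\text{posterior odds of } H_0}
=
\underbrace{\frac{P(H_0)}{P(H_1)}}_{\text{prior odds of } H_0}
\times
\underbrace{\frac{P(\text{data} \mid H_0)}{P(\text{data} \mid H_1)}}_{\mathrm{BF}_{01}},
\qquad
P(H_1 \mid \text{data}) = \frac{1}{1 + \text{posterior odds of } H_0}.
\]
```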
Table 3-1 shows how a given P value (translated into the strongest Bayes factor) alters the probability of the hypothesis of harm, where the null hypothesis states that a given drug does not cause a given harm and the alternative hypothesis states that it does elevate the risk of that harm. For example, if a new randomized controlled trial (RCT) yields a P value of 0.03 for a newly reported adverse effect of a drug and there was deemed to be only a 1 percent chance before the RCT of that unsuspected adverse effect being caused by the drug, the new evidence increases the chance of the causal relationship to at most 10 percent (see Table 3-1). A regulatory decision predicated on the harm being real would therefore be wrong at least 90 percent of the time. Without a formal Bayesian interpretation, that high probability of error would not be apparent from any standard analysis. Using conventional measures, such a study might report that “a previously unreported association of tinnitus was observed with the drug, OR [odds ratio] = 3.5, 95% CI [confidence interval] 1.1 to 11.1, P = 0.03.” This statement does not actually indicate how likely it is that the drug actually raises the risk of tinnitus. For that, a prior probability is needed, along with the Bayes factor. If the mechanism or some preliminary observations justified a 25 percent prior chance of a harmful effect, the same evidence would raise that to at most a 78 percent chance of harm; that is, there would remain at least a 22 percent chance that the drug does not cause that harm. Table 3-1 shows that after observing P = 0.03 for an elevated risk of harm, in order to be 95 percent certain that this elevation was true, the prior probability of a risk elevation would have to have been at least 67 percent before the study. That might be the case if there was an established mechanism for the adverse effect, if other drugs in the same class were known to produce this effect, or if a prior study showed the same effect.

TABLE 3-1 Maximum Change in the Probability of a Drug Effect as a Function of P Value and Bayes Factor, Calculated by Using Bayes’ Theorem

P Value in   Strongest      Strength of         Prior Probability     Maximum Probability
New Study    Bayes Factor   the Evidence (a)    of an Effect, % (b)   After the New Study, %
0.10         0.26           Weak                1                     2.5
                                                25                    46
                                                50                    79
                                                83*                   95
0.05         0.15           Moderate            1                     6
                                                25                    69
                                                50                    87
                                                76*                   95
0.03         0.10           Moderately Strong   1                     10
                                                25                    78
                                                50                    81
                                                67*                   95
0.01         0.04           Strong              1                     21
                                                25                    90
                                                40*                   95
                                                50                    96.5
0.001        0.005          Very Strong         1                     75
                                                8*                    95
                                                25                    99
                                                50                    99.5

(a) The qualitative descriptor of the strength of the evidence is made on the basis of the quantitative change in the probability of truth of a non-null drug effect.
(b) The prior truth probabilities of 1%, 25%, or 50% are arbitrarily chosen to span a wide range of strength of prior evidence. The shaded prior probability (marked with an asterisk here) illustrates the minimum prior probability required to provide a 95% probability of a drug effect after observing a result with the reported P value.
SOURCE: Modified from Goodman (1999).
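The calculation behind Table 3-1 can be sketched in a few lines. The sketch assumes that the strongest Bayes factor is the minimum Bayes factor for a normally distributed test statistic, exp(-z²/2), where z corresponds to the two-sided P value; this appears consistent with the Bayes factor column of the table, but the choice of formula and the function names are assumptions made here. It approximately reproduces the entries the text relies on (a 1 percent prior rising to about 10 percent, and a 25 percent prior to about 78 percent, at P = 0.03); it is not claimed to match every printed cell.

```python
import math
from statistics import NormalDist

def min_bayes_factor(p_value):
    """Strongest Bayes factor against the null for a two-sided P value,
    assuming a normal test statistic: exp(-z^2 / 2)."""
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return math.exp(-z * z / 2)

def max_posterior(prior, p_value):
    """Largest probability of a drug effect after the study, given the
    prior probability of an effect and the observed P value."""
    null_odds = (1 - prior) / prior * min_bayes_factor(p_value)
    return 1 / (1 + null_odds)

def min_prior_for(target, p_value):
    """Smallest prior probability of an effect that could yield the
    target posterior probability after the study."""
    prior_null_odds = (1 - target) / target / min_bayes_factor(p_value)
    return 1 / (1 + prior_null_odds)

for prior in (0.01, 0.25):
    posterior = max_posterior(prior, 0.03)
    print(f"prior {prior:.0%} -> at most {posterior:.0%} after P = 0.03")

needed = min_prior_for(0.95, 0.03)
print(f"prior needed for 95% certainty after P = 0.03: about {needed:.0%}")
```

The computed minimum prior for 95 percent certainty comes out a few points below the printed 67 percent, so the sketch should be read as an approximation of the table's logic rather than a reconstruction of its exact method.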
In practice, however, there exist no conventions or empirical data to determine exactly how to assign such prior probabilities, although the elicitation of prior probabilities from experts has been much studied (Chaloner, 1996; Kadane and Wolfson, 1998). FDA informally incorporated the notion of a prior when it included “biologic plausibility” in its March 2012 draft guidance on how to respond to drug safety signals that arise in the course of pharmacovigilance (FDA, 2012):

    CDER will consider whether there is a biologically plausible explanation for the association of the drug and the safety signal, based on what is known from systems biology and the drug’s pharmacology. The more biologically plausible a risk is, the greater consideration will be made to classifying a safety issue as a priority.

As demonstrated in the above paragraph, biologic plausibility and other forms of external evidence are currently accommodated qualitatively; Bayesian approaches allow that to be done quantitatively, providing a formal structure by which both prior evidence and other sources of information (for example, on common mechanisms underlying different harms, or their relationship to disease processes) should affect decisions.

This discussion illustrates a number of important issues:

• Given new evidence, the probability that a drug will be harmful can vary widely depending on the strength of the prior or external information, represented as a prior probability distribution.
• The chance that a drug will be harmful, based on P values for a harmful effect in the borderline significant range (0.01–0.05), is often far lower than is suspected, unless there are fairly strong reasons to believe in the harm before the study.
• The Bayesian approach allows the calculation of intermediate levels of certainty (for example, less than 95 percent) that might be sufficient for regulatory action, particularly for drug harms.
• Without agreed-upon conventions or empirical bases for assigning prior probabilities, the prior probabilities derived from a given body of evidence will differ among scientists, resulting in different conclusions from the same data.
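The points in the list above can be illustrated with a small sensitivity analysis: the same Bayes factor is applied to several prior probabilities, and each resulting posterior is compared with several levels of certainty. The sketch takes the Bayes factor of 0.10 that Table 3-1 lists for P = 0.03; the reviewer labels, their priors, and the certainty thresholds are hypothetical values chosen for illustration.

```python
def posterior_probability(prior, bayes_factor):
    """Probability of harm after the study. bayes_factor is for the null
    relative to harm, so values below 1 favor harm."""
    null_odds = (1 - prior) / prior * bayes_factor
    return 1 / (1 + null_odds)

bayes_factor = 0.10  # strongest Bayes factor listed in Table 3-1 for P = 0.03

# Hypothetical reviewers who read the prior evidence differently.
priors = {"skeptical": 0.01, "moderate": 0.25, "convinced": 0.67}

# Hypothetical levels of certainty that might be required for action.
thresholds = (0.50, 0.75, 0.95)

results = {}
for reviewer, prior in priors.items():
    posterior = posterior_probability(prior, bayes_factor)
    cleared = [t for t in thresholds if posterior >= t]
    results[reviewer] = (posterior, cleared)
    print(f"{reviewer:>9}: prior {prior:.0%} -> posterior {posterior:.0%}, "
          f"clears thresholds {cleared}")
```

All three reviewers here agree on what the new study shows, yet under these assumptions they reach posteriors that clear none, some, or all of the thresholds, which is the pattern of disagreement the last bullet describes.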
The probability that a given harm will be caused by a drug is a key attribute in regulatory decision-making. How sure regulators must be to take a given action varies according to the consequences of decisions. In some cases, 95 percent certainty might be needed, in others 75 percent, and in still others less than 50 percent. The Bayesian approach provides numbers that feed into that judgment (Kadane, 2005).

Despite these advantages, one of the weaknesses of Bayesian calculations is that there is no unique way to assign a prior probability to the strength of external evidence, particularly if that evidence is difficult to quantify, such as biologic plausibility. Although it may be impossible to assess subtle differences in prior probability, even crude distinctions can be helpful, such as whether the prior evidence justifies probability ranges of 1–5 percent, 15–50 percent, 60–80 percent, or 90+ percent. Such categorizations often provide fine enough discrimination to be useful for decision-making. In the absence of agreement on prior probabilities, “non-informative” prior distributions can be used that rely almost exclusively on the observed data, and sensitivity analyses with different kinds of prior probabilities from different decision-makers can be conducted (Emerson et al., 2007; Greenhouse and Wasserman, 1995). At a minimum, these prior probabilities should be elicited and their evidential bases made explicit so that this potential source of disagreement can be better understood, and perhaps diminished.

The difference between Bayesian and frequentist approaches can go well beyond the incorporation of prior evidence, extending to more complex aspects of how the analytic problem is structured and analyzed. Madigan et al. (2010) provide a comprehensive suite of Bayesian methods to analyze safety signals arising from a broad range of study designs likely to be employed in the postmarketing setting.

WHY SCIENTISTS DISAGREE

When new information arises that puts into question a drug’s benefits and risks, FDA’s decision-makers often face sharp disagreements among scientists over how to interpret that information in the context of pre-existing information and over what regulatory action, if any, should be taken in response to the new information. Such disagreements are often unavoidable, and moving forward with appropriate decision-making is difficult if the underlying reasons for them are unknown or misunderstood. The committee identified a number of reasons for the disagreements about scientific evidence that occur among scientists.
Those reasons, which are listed in Box 3-1, are discussed below.

BOX 3-1
Why Scientists Disagree About the Strength of Evidence Supporting Drug Safety

Prior Evidence
1. Different weights given to pre-existing mechanistic or empirical evidence supporting a given benefit or risk.

Quality of the New Study
2. Different views about the reliability of the data sources.
3. Different confidence in the design’s ability to eliminate the effect of factors unrelated to drug exposure.
4. Different views on the appropriateness of statistical models.

Relevance of the New Evidence to the Public Health Question
5. Different views of the hypotheses needing evaluation.
6. Different assessments of the transportability of results.

Synthesizing the Evidence
7. Different ideas about how to weigh and combine all the available evidence from disparate sources relevant to the public health question.

Appropriate Regulatory Response to the Body of Evidence
8. Different opinions among scientists regarding the thresholds of certainty to justify concern or regulatory action, which can affect how they view the evidence.

Different Prior Beliefs About the Existence of an Effect

People’s beliefs about the plausibility of an effect of a drug are determined, in part, by their knowledge and interpretation of prior evidence about the drug’s benefits and risks (Eraker et al., 1984). That knowledge shapes their responses to new evidence. Prior evidence can come directly from earlier clinical studies of the drug’s effects, from studies of drugs in the same class that demonstrate the effect, and from information about the drug’s mechanism of action. Newly observed evidence might be interpreted as resulting in a higher chance that a drug is harmful if earlier studies have also demonstrated the harm. If other drugs in the same class have been associated with a particular adverse effect, the drug has a higher prior probability of causing that effect than a drug in a class whose members have not produced such an effect. If a drug has a mechanism of action that has been implicated in a particular adverse effect, it has a higher prior probability of causing that effect than a drug for which such a mechanism is implausible. For example, the prior probability that a topical steroid would produce significant internal injury would be very low because what is known about the absorption, metabolism, and physiologic actions of topical steroids makes it difficult to imagine how such an injury could occur, but the prior probability of an adverse dermatologic effect would be much higher.

Evidential bases of prior probability can take two forms: an assessment of the evidence supporting the mechanistic explanation of a proposed effect and the cumulative weight of previous empirical studies.
Marciniak, in the FDA Office of New Drugs (OND) Division of Cardiovascular and Renal Products, discussed mechanism directly in a letter that was provided for a July 2010 FDA Advisory Committee meeting related to Avandia (Marciniak, 2010):

    Others have speculated that rosiglitazone could increase MI [myocardial infarction] rates through its effects upon lipids or by the same mechanism whereby it increases HF [heart failure] rates. There are no clinical studies establishing these mechanisms. We propose that there is a third mechanism for which there is some evidence from clinical studies.

    The third possible mechanism is the following: The Avandia label states that “In vitro data demonstrate that rosiglitazone is predominantly metabolized by Cytochrome P450 (CYP) isoenzyme 2C8, with CYP2C9 contributing as a minor pathway.” The published literature suggests that rosiglitazone may also function as an inhibitor of CYP2C8 . . . . Allelic variants of the CYP2C9 gene have been associated in epidemiological studies with increased risk of myocardial infarction and atherosclerosis. . . . Recently, CYP2C8 variants has also been associated with increased risk of MI. . . . CYP2C9 and 2C8 catalyze the metabolism of arachidonic acid to vasoactive substances, providing one potential mechanism for affecting cardiac disease. Interference with cigarette toxin metabolism is another. . . . Rosiglitazone effects upon CYP2C8 and CYP2C9 could be the mechanism for its CV adverse effects. Regardless, there are several possible mechanisms for CV toxicity of rosiglitazone.

The above paragraph describes a mechanism that is fairly speculative, as labeled. There is no suggestion or claim that such a mechanism would definitely or even probably produce adverse cardiovascular effects. Rather, this particular exposition is exploratory and aimed at establishing that such an effect is possible rather than probable. Those who have a good understanding of this particular set of pathways might interpret the explanation differently and establish a different starting point for the probability of such an effect.
It is unlikely, though, that on the basis of such evidence general consensus could be garnered for a high prior probability of effect. Mechanistic explanations generally provide weak evidence when they are offered post hoc to support an observed result. They carry more weight when they are proposed before such an effect is observed. Misbin (2007) raised questions about the safety of rosiglitazone on the basis of its effects on body weight and lipids, both well-established risk factors for cardiovascular disease, long before any risk of myocardial infarction (MI) was seen in any studies.

Another, more subtle way in which mechanistic considerations can affect inferences is in the choice of endpoints, as illustrated in discussions by Marciniak, from the FDA Office of New Drugs (OND) Division of Cardiovascular and Renal Products, of the wisdom of combining silent and clinical MIs into a single endpoint (Marciniak, 2010):

    There is additional evidence from RECORD [the Rosiglitazone Evaluated for Cardiac Outcomes and Regulation of Glycemia in Diabetes trial] that the MI risk for rosiglitazone is real rather than a random variation: We prospectively excluded silent MIs from our primary analysis because we had concerns that silent MIs might represent a different disease mechanism than symptomatic MIs, e.g., could they represent gradual necrosis from diabetic microvascular disease rather than an acute event with coronary thrombosis in an epicardial coronary artery?

Whether or not silent and clinical MIs should be combined, a critical decision in assessing the evidence, is framed here as contingent on whether or not they represent different manifestations of the same pathophysiologic process. What is important to recognize is that the numbers arising from an analysis that excludes silent MIs are only as credible as the underlying mechanistic explanation. This example shows how a mechanistic explanation can affect the analyses, especially exploratory analysis, even if it is not explicitly invoked as an evidential basis of a claim.

Even if two scientists agree about what evidence new data provide, if they have different assessments of the strength of prior evidence they might disagree about the probability of a higher drug risk. Such a disagreement might appear outwardly to be about the new evidence when in fact the disagreement is about the prior probability. That phenomenon is captured quantitatively by Bayes’ theorem, as previously noted (Fisher, 1999), which can use sensitivity analyses with different priors to illustrate the plausible range of chances that the drug induces unacceptable safety risks.

Quality of the New Study

Standard approaches to evaluating evidence rely on the use of evidence hierarchies, which traditionally emphasize the type of study design as the main determinant of evidential quality; an example is the US Preventive Services Task Force guidance (AHRQ, 2008). Many scientists judge a study on the basis of its type of design above all other considerations. The type of study design, however, is only one of the factors that should be taken into account in assessing the quality of a study and thereby the quality of the evidence from the study.
In addition to the type of study, such other aspects as the source and reliability of the data, study conduct, whether there are missing or misclassified data, and data analyses influence the quality of the evidence generated by a study. Some of these are reflected in the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach to evidence assessment (Guyatt et al., 2008). Those factors and their role in disagreements among scientists are discussed below.

Different Views about the Reliability of the Data Source

Most evidence hierarchies assume that data in a study are generated for research purposes and that outcome measures are specified in advance. Much postmarketing research about a drug’s benefits and risks, however, whether an RCT or an observational study, depends at least in part on data gathered with systems developed for other purposes. For example, billing data that happen to

statistical code, and information about how decisions were made to produce the analytic dataset from the raw measured data. Optimally, it involves some form of data-sharing. Such data-sharing permitted the reanalysis of the RECORD trial that was presented to FDA in the rosiglitazone case. The review revealed that innumerable discrepancies and judgment calls occurred in the original study—from defining a clinical event to the choice of analytic method—and those discrepancies and judgments affected the weight that the results were given in the regulatory decision-making process. For critical research that is to be the basis of regulatory decisions, which can be primary studies like RECORD or can be meta-analyses, standards should be developed within FDA to adhere to reproducible research principles so that the basis of the many judgments can be examined and adjudicated by scientists and regulators when disputes over data interpretation and its implications arise.

Going a step beyond reproducibility, FDA is well positioned to help assure the accurate public reporting of risk information submitted to it as part of the premarketing approval process. Such information is often, but not always, published after approval and included in postmarketing safety assessments. FDA scientists themselves have identified the discordance between published data and the data submitted to FDA as a problem for the validity of postmarketing safety meta-analyses (Hammad et al., 2011), and there are numerous examples of underreporting or delayed reporting of harms that had been previously reported to regulatory authorities (for example, Carragee et al., 2011; Lee et al., 2008; Melander et al., 2003; Vedula et al., 2009).
FDAAA addressed this problem by requiring that all clinical trials submitted for new drug approval or for new labeling be registered at inception at ClinicalTrials.gov, and that the summary results of all pre-specified outcomes be posted within one year of drug approval for new drugs, or three years for new indications (Miller, 2010; Wood, 2009). However, recently reported evidence has shown that compliance with this aspect of FDAAA has been low (Law et al., 2011). In addition, the FDA policy on the reporting of studies submitted for non-approved drugs has not been settled (Miller, 2010). Finally, publishing summary results is not equivalent to sharing primary data, which allows for reanalyses. New approaches are needed to facilitate the publication of safety data submitted to FDA for approved drugs, and to find ways to release similar data for drugs that were not approved but whose information might be extremely valuable for the interpretation of safety information from approved drugs in the same class.

FINDINGS AND RECOMMENDATIONS

Finding 3.1

Some of FDA’s most difficult decisions are those in which experts disagree about how compelling the evidence that informs the public health question is. Understanding the nature and sources of those disagreements and their implications for FDA’s decisions is key to improving the agency’s decision-making process. For example, experts can disagree about the plausibility of a new risk (or decreased benefit) on the basis of different assessments of prior evidence, the quality of new data, the adequacy of confounding control in the relevant studies, the transportability of results, the appropriateness of the statistical analysis, the relevance of the new evidence to the public health question, how the evidence should be weighed and synthesized, or the threshold for regulatory actions.

Recommendation 3.1

FDA should use the framework for decision-making proposed in Recommendation 2.1 to ensure a thorough discussion and clear understanding of the sources of disagreement about the available evidence among all participants in the regulatory decision-making process. In the interest of transparency, FDA should use the BRAMP document proposed in Recommendation 2.2 to ensure that such disagreements and how they were resolved are documented and made public.

Finding 3.2

Such methods as Bayesian analyses or other approaches to integrating external relevant information with newly emerging information could provide decision-makers with useful quantitative assessments of evidence. An example would be sensitivity analyses of clinical-trial data that illustrate the influence of prior probabilities on estimates of probabilities that an intervention has unacceptable safety risks. These approaches can inform judgments, allow more rational decision-making, and permit input from multiple stakeholders and experts.
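
The kind of sensitivity analysis described in Finding 3.2 can be sketched in a few lines. The Python example below is an illustration only, not a method prescribed in this report: it applies Bayes’ theorem in odds form (posterior odds = prior odds × Bayes factor) to show how the same new evidence leads to different posterior probabilities of an unacceptable safety risk under different priors. The function names, the three prior probabilities, and the Bayes factor of 4 are hypothetical values chosen for illustration and do not come from any study discussed here.

```python
from typing import Dict, Iterable


def posterior_probability(prior_prob: float, bayes_factor: float) -> float:
    """Posterior probability of harm via Bayes' theorem in odds form.

    prior_prob   -- prior probability that the drug carries the risk (0 < p < 1)
    bayes_factor -- strength of the new evidence for harm versus no harm (> 0)
    """
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must be strictly between 0 and 1")
    if bayes_factor <= 0.0:
        raise ValueError("bayes_factor must be positive")
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


def prior_sensitivity(priors: Iterable[float], bayes_factor: float) -> Dict[float, float]:
    """Map each candidate prior to its posterior, holding the new evidence fixed."""
    return {p: posterior_probability(p, bayes_factor) for p in priors}


if __name__ == "__main__":
    # Hypothetical inputs: skeptical, moderate, and neutral priors, and a
    # Bayes factor of 4 standing in for the evidence from a new study.
    for prior, post in prior_sensitivity([0.05, 0.25, 0.50], 4.0).items():
        print(f"prior = {prior:.2f} -> posterior = {post:.3f}")
```

With these assumed inputs, two experts who agree that the new data favor harm by a factor of 4 would still arrive at posterior probabilities of roughly 0.17 and 0.80 if they started from priors of 0.05 and 0.50. Their disagreement would appear to concern the new evidence but would in fact stem from the prior.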
Recommendation 3.2

FDA should ensure that it has adequate expertise in Bayesian approaches, in combination with expertise in relevant frequentist and causal inference methods, to assess the probability that observed associations reflect actual causal effects, to incorporate multiple sources of uncertainty into the decision-making process, and to evaluate the sensitivity of those conclusions to different representations of external evidence. To facilitate the use of Bayesian approaches, FDA should develop a guidance document for the use of Bayesian methods for assessing a drug’s benefits, risks, and benefit–risk profile.

Finding 3.3

Traditionally, the main criteria for evaluating a study are ones that contribute to its internal validity. A well-conducted RCT typically has higher internal validity than a well-conducted observational study. Results of observational studies, however, can have greater transportability if their participants are more similar to the target clinical population than are the participants in a clinical trial. In some circumstances, such as an evaluation of the association between a drug and an uncommon unexpected adverse event, observational studies may produce estimates closer to the actual risk in the general population than can be achieved in clinical trials. In assessing the relevance of study findings to a public health question, the transportability of the study results is as important as the determinants of its internal validity.

Recommendation 3.3

In assessing the benefits and risks associated with a drug in the postmarketing context, FDA should develop guidance and review processes that ensure that observational studies with high internal validity are given appropriate weight in the evaluation of drug harms and that transportability is given emphasis similar to that given bias and other errors in assessing the weight of evidence that a study provides to inform a public health question.

Finding 3.4

The principles of reproducible research are important for ensuring the integrity of postmarketing research used by FDA. Those principles include providing information on the provenance of data (from measurement to analytic dataset) and, when possible, making available properly annotated analytic datasets, study protocols (including the statistical analysis plan) and their amendments, and statistical code.

Recommendation 3.4

All analyses, whether conducted independently of FDA or by FDA staff, whose results are relied on for postmarketing regulatory decisions should use the principles of reproducible research when possible, subject to legal constraints. To that end, FDA should present data and analyses in a fashion that allows independent analysts either to reproduce the findings or to understand how FDA generated the results in sufficient detail to understand the strengths, weaknesses, and assumptions of the relevant analyses.
Finding 3.5

The ability of researchers in and outside FDA to analyze new information about the benefits and risks associated with a marketed drug and to design appropriate postmarketing research—including conducting individual-patient meta-analyses—is enhanced by access to data and analyses from all studies of the drug and others in the same drug class that were reported in the preapproval process. Although disclosure of such information is likely to advance the public’s health, such disclosures raise concerns about the privacy of participants in the research that generated the information and may threaten industry interest in maintaining proprietary information, which is deemed important for innovation. New approaches to resolving this tension are needed.

Recommendation 3.5

FDA should establish and coordinate a working group, including industry and patient and consumer representatives, to find ways that appropriately balance public health, privacy, and proprietary interests to facilitate disclosure of data for trials and studies relevant to postmarketing research decisions.

Finding 3.6

The elements of the benefit–risk profile of a drug are best estimated by using all the available high-quality data, and meta-analysis is a useful tool for summarizing such data and evaluating heterogeneity. However, because the reporting of harms in published RCTs and observational studies is often poor or inconsistent and because there is often substantial publication bias in studies of drug risk, steps are needed to improve both the reporting of harms and the design of studies of harm. That can be done through prospective planning for selected meta-analyses and by monitoring compliance with the FDAAA requirement that summary trial results for all primary and secondary outcomes be published at ClinicalTrials.gov.

Recommendation 3.6

For drugs that are likely to have required postmarketing observational studies or trials, FDA should use the BRAMP to specify potential public health questions of interest as early as possible; should prospectively recommend standards for uniform definition of key variables and complete ascertainment of events among studies or convene researchers in the field to suggest such standards and promote data-sharing; should prospectively plan meta-analyses of the data with reference to specified exposures, outcomes, comparators, and covariates; should conduct the meta-analyses of the data; and should make appropriate regulatory decisions in a timely fashion.
FDA can also improve the validity of meta-analyses by monitoring and encouraging compliance with FDAAA requirements for reporting to ClinicalTrials.gov.

Finding 3.7

FDA produced a high-quality guidance document on the use of the noninferiority design for the study of efficacy. Increasingly, FDA is using the noninferiority design to evaluate drug-safety endpoints as the primary outcomes in randomized trials. The use of noninferiority analyses to establish the acceptability of the benefit–risk profile of a drug can take the decision about how to balance the risks and benefits of two drugs out of the hands of regulators. Noninferiority trials also have the disadvantage of being biased toward equivalence when trial design or conduct is suboptimal; this is of particular concern when such trials are used to estimate risks.

Recommendation 3.7.1

FDA should develop a guidance document on the design and conduct of noninferiority postmarketing trials for the study of safety of a drug. The guidance should include discussion of criteria for choosing the standard therapy to be used in the active-treatment control arm; of methods for selecting a noninferiority margin in safety trials and ensuring high-quality trial conduct; of the optimal analytic methods, including Bayesian approaches; and of the interpretation of the findings in terms of the drug’s benefit–risk profile.

Recommendation 3.7.2

FDA should closely scrutinize the design and conduct of any noninferiority safety studies for aspects that may inappropriately make the arms appear similar. FDA should use the observed-effect estimate and confidence interval as a basis for decision-making, not the binary noninferiority verdict.

REFERENCES

AHRQ (Agency for Healthcare Research and Quality). 2008. U.S. Preventive Services Task Force procedure manual. Washington, DC: Department of Health and Human Services.
Baggerly, K. 2010. Disclose all data in publications. Nature 467(7314):401.
Baigent, C., A. Keech, P. M. Kearney, and L. Blackwell. 2005. Efficacy and safety of cholesterol-lowering treatment: Prospective meta-analysis of data from 90,056 participants in 14 randomised trials of statins. Lancet 366(9493):1267-1278.
Barton, M. B., T. Miller, T. Wolff, D. Petitti, M. LeFevre, G. Sawaya, B. Yawn, J. Guirguis-Blake, N. Calonge, R. Harris, and U.S. Preventive Services Task Force. 2007. How to read the new recommendation statement: Methods update from the U.S. Preventive Services Task Force. Annals of Internal Medicine 147(2):123-127.
Becker, M. C., T. H. Wang, L. Wisniewski, K. Wolski, P. Libby, T. F. Lüscher, J. S. Borer, A. M. Mascette, M. E. Husni, D. H. Solomon, D. Y. Graham, N. D. Yeomans, H. Krum, F. Ruschitzka, A. M. Lincoff, and S. E. Nissen. 2009. Rationale, design, and governance of Prospective Randomized Evaluation of Celecoxib Integrated Safety versus Ibuprofen or Naproxen (PRECISION), a cardiovascular end point trial of nonsteroidal antiinflammatory agents in patients with arthritis. American Heart Journal 157(4):606-612.
Bent, S., A. Padula, and A. L. Avins. 2006. Brief communication: Better ways to question patients about adverse medical events: A randomized, controlled trial. Annals of Internal Medicine 144(4):257-261.
Berry, D. A. 2006. Bayesian clinical trials. Nature Reviews Drug Discovery 5(1):27-36.
Berry, D. A., M. C. Wolff, and D. Sack. 1992. Public health decision making: A sequential vaccine trial. In Bayesian statistics, edited by J. Bernardo, J. Berger, A. Dawid and A. Smith. Oxford, UK: Oxford University Press. Pp. 79-96.
Camm, A. J., A. Capucci, S. H. Hohnloser, C. Torp-Pedersen, I. C. Van Gelder, B. Mangal, and G. Beatch. 2011. A randomized active-controlled study comparing the efficacy and safety of vernakalant to amiodarone in recent-onset atrial fibrillation. Journal of the American College of Cardiology 57(3):313-321.

Campbell, G. 2011. Bayesian statistics in medical devices: Innovation sparked by the FDA. Journal of Biopharmaceutical Statistics 21(5):871-887.
Carey, V. J., and V. Stodden. 2010. Reproducible research concepts and tools for cancer bioinformatics. In Biomedical informatics for cancer research, edited by M. F. Ochs, J. T. Casagrande and R. V. Davuluri. Springer US. Pp. 149-175.
Carpenter, D. 2010. Reputation and power institutionalized: Scientific networks, congressional hearings, and judicial affirmation, 1963-1986. In Reputation and power: Organizational image and pharmaceutical regulation at the FDA. Princeton, NJ: Princeton University Press. Pp. 298-392.
Carragee, E. J., E. L. Hurwitz, and B. K. Weiner. 2011. A critical review of recombinant human bone morphogenetic protein-2 trials in spinal surgery: Emerging safety concerns and lessons learned. Spine Journal 11(6):471-491.
Chaloner, K. 1996. Elicitation of prior distributions. In Bayesian biostatistics, edited by D. A. Berry and D. K. Stangl. New York: Marcel Dekker.
Chan, A.-W., A. Hróbjartsson, M. T. Haahr, P. C. Gøtzsche, and D. G. Altman. 2004. Empirical evidence for selective reporting of outcomes in randomized trials. JAMA 291(20):2457-2465.
Chowdhury, B. A., and G. Dal Pan. 2010. The FDA and safe use of long-acting beta-agonists in the treatment of asthma. New England Journal of Medicine 362(13):1169-1171.
Chowdhury, B. A., S. M. Seymour, and M. S. Levenson. 2011. Assessing the safety of adding LABAs to inhaled corticosteroids for treating asthma. New England Journal of Medicine 364(26):2473-2475.
Claxton, K., J. T. Cohen, and P. J. Neumann. 2005. When is evidence sufficient? Health Affairs 24(1):93-101.
Cooper, H., and E. A. Patall. 2009. The relative benefits of meta-analysis conducted with individual participant data versus aggregated data. Psychological Methods 14(2):165-176.
Dal Pan, G. J. 2010. Memorandum from Gerald Dal Pan to Janet Woodcock (dated September 12, 2010). Re: Recommendations for regulatory action for rosiglitazone and rosiglitazone-containing products (NDA 21-071, supplement 035, incoming submission dated August 25, 2009). Washington, DC: Department of Health and Human Services.
Darby, S., P. McGale, C. Correa, C. Taylor, R. Arriagada, M. Clarke, D. Cutter, C. Davies, M. Ewertz, J. Godwin, R. Gray, L. Pierce, T. Whelan, Y. Wang, and R. Peto. 2011. Effect of radiotherapy after breast-conserving surgery on 10-year recurrence and 15-year breast cancer death: Meta-analysis of individual patient data for 10,801 women in 17 randomised trials. Lancet 378(9804):1707-1716.
Davies, C., J. Godwin, R. Gray, M. Clarke, D. Cutter, S. Darby, P. McGale, H. C. Pan, C. Taylor, Y. C. Wang, M. Dowsett, J. Ingle, and R. Peto. 2011. Relevance of breast cancer hormone receptors and other factors to the efficacy of adjuvant tamoxifen: Patient-level meta-analysis of randomised trials. Lancet 378(9793):771-784.
Emerson, S. S., J. M. Kittelson, and D. L. Gillen. 2007. Bayesian evaluation of group sequential clinical trial designs. Statistics in Medicine 26(7):1431-1449.
Eraker, S. A., J. P. Kirscht, and M. H. Becker. 1984. Understanding and improving patient compliance. Annals of Internal Medicine 100(2):258.
Erik, C. 2007. Methodology of superiority vs. equivalence trials and non-inferiority trials. Journal of Hepatology 46(5):947-954.
Etzioni, R. D., and J. B. Kadane. 1995. Bayesian statistical methods in public health and medicine. Annual Review of Public Health 16(1):23-41.
FDA (US Food and Drug Administration). 2008. Guidance for industry. Diabetes mellitus—evaluating cardiovascular risk in new antidiabetic therapies to treat type 2 diabetes. Washington, DC: Department of Health and Human Services.
FDA. 2010a. Guidance for industry and FDA staff: Guidance for the use of Bayesian statistics in medical device clinical trials. Rockville, MD: Department of Health and Human Services.

FDA. 2010b. Guidance for industry: Non-inferiority clinical trials, draft guidance. Washington, DC: Department of Health and Human Services.
FDA. 2010c. FDA briefing document. Advisory committee meeting for NDA 21071: Avandia (rosiglitazone maleate tablet). Silver Spring, MD: Department of Health and Human Services.
FDA. 2012. Classifying significant postmarketing drug safety issues: Draft guidance. Washington, DC: Department of Health and Human Services.
Fisher, D. J., A. J. Copas, J. F. Tierney, and M. K. B. Parmar. 2011. A critical review of methods for the assessment of patient-level interactions in individual participant data meta-analysis of randomized trials, and guidance for practitioners. Journal of Clinical Epidemiology 64(9):949-967.
Fisher, L. D. 1999. Carvedilol and the Food and Drug Administration (FDA) approval process: The FDA paradigm and reflections on hypothesis testing. Controlled Clinical Trials 20(1):16-39.
Fleming, T. R. 2008. Current issues in non-inferiority trials. Statistics in Medicine 27(3):317-332.
Fleming, T. R., K. Odem-Davis, M. Rothmann, and Y. Li Shen. 2011. Some essential considerations in the design and conduct of non-inferiority trials. Clinical Trials 8:432-439.
Frank, E., G. B. Cassano, P. Rucci, A. Fagiolini, L. Maggi, H. C. Kraemer, D. J. Kupfer, B. Pollock, R. Bies, V. Nimgaonkar, P. Pilkonis, M. K. Shear, W. K. Thompson, V. J. Grochocinski, P. Scocco, J. Buttenfield, and R. N. Forgione. 2008. Addressing the challenges of a cross-national investigation: Lessons from the Pittsburgh-PISA study of treatment-relevant phenotypes of unipolar depression. Clinical Trials 5(3):253-261.
Furberg, C. D., and B. Pitt. 2001. Commentary: Withdrawal of cerivastatin from the world market. Current Controlled Trials in Cardiovascular Medicine 2(5):205-207.
GAO (Government Accountability Office). 2010a. Drug safety: FDA has conducted more foreign inspections and begun to improve its information on foreign establishments, but more progress is needed. Washington, DC: Government Accountability Office.
GAO. 2010b. Food and Drug Administration: Overseas offices have taken steps to help ensure import safety, but more long-term planning is needed. Washington, DC: Government Accountability Office.
GAO. 2010c. New drug approval: FDA’s consideration of evidence from certain clinical trials. Washington, DC: Government Accountability Office.
Garrison, L. P., Jr., P. J. Neumann, P. Radensky, and S. D. Walcoff. 2010. A flexible approach to evidentiary standards for comparative effectiveness research. Health Affairs 29(10):1812-1817.
Gelfand, A. E., and B. K. Mallick. 1995. Bayesian analysis of proportional hazards models built from monotone functions. Biometrics 51(3):843-852.
Golder, S., Y. K. Loke, and M. Bland. 2011. Meta-analyses of adverse effects data derived from randomised controlled trials as compared to observational studies: Methodological overview. PLoS Med 8(5):e1001026.
Good, I. J. 1950. Probability and the weighting of evidence. London, UK: Charles Griffin & Co.
Goodman, S. N. 1999. Toward evidence-based medical statistics. 2: The Bayes factor. Annals of Internal Medicine 130(12):1005-1013.
Goodman, S. N. 2001. Of P-values and Bayes: A modest proposal. Epidemiology 12(3):295-297.
Gordis, L. 2004. Epidemiology. Third ed. Philadelphia, PA: Elsevier Inc.
Graham, D. J., and K. Gelperin. 2010a. Memorandum to Mary Parks regarding comments on RECORD, TIDE, and the benefit-risk assessment of rosiglitazone vs. pioglitazone. In FDA Briefing Document Advisory Committee Meeting for NDA 21071: Avandia (rosiglitazone maleate) tablet: July 13 and 14, 2010. Washington, DC: Department of Health and Human Services.
Graham, D. J., and K. Gelperin. 2010b. TIDE and benefit-risk considerations. http://www.fda.gov/downloads/AdvisoryCommittees/CommitteesMeetingMaterials/Drugs/EndocrinologicandMetabolicDrugsAdvisoryCommittee/UCM224732.pdf (accessed October 11, 2011).
Greene, B. M., A. M. Geiger, E. L. Harris, A. Altschuler, L. Nekhlyudov, M. B. Barton, S. J. Rolnick, J. G. Elmore, and S. Fletcher. 2006. Impact of IRB requirements on a multicenter survey of prophylactic mastectomy outcomes. Annals of Epidemiology 16(4):275-278.

Greenhouse, J. B., and L. Wasserman. 1995. Robust Bayesian methods for monitoring clinical trials. Statistics in Medicine 14(12):1379-1391.
Guyatt, G. H., A. D. Oxman, G. E. Vist, R. Kunz, Y. Falck-Ytter, P. Alonso-Coello, and H. J. Schünemann. 2008. GRADE: An emerging consensus on rating quality of evidence and strength of recommendations. BMJ 336(7650):924-926.
Hamburg, M. A. 2011. Commentary: The growing role of epidemiology in drug safety regulation. Epidemiology 22(5):622-624.
Hammad, T. A., S. P. Pinheiro, and G. A. Neyarapally. 2011b. Secondary use of randomized controlled trials to evaluate drug safety: A review of methodological considerations. Clinical Trials 8(5):559-570.
Hernán, M. A., and S. Hernandez-Diaz. 2012. Beyond the intention-to-treat in comparative effectiveness research. Clinical Trials 9(1):48-55.
Hernán, M. A., and J. M. Robins. 2012. Causal inference. New York: Chapman & Hall/CRC.
Hernán, M. A., and J. M. Robins. 2006. Instruments for causal inference: An epidemiologist’s dream? Epidemiology 17(4):360-372.
Ioannidis, J. P. A., and J. Lau. 2001. Completeness of safety reporting in randomized trials: An evaluation of 7 medical areas. JAMA 285(4):437-443.
Ioannidis, J. P. A., S. J. W. Evans, P. C. Gøtzsche, R. T. O’Neill, D. G. Altman, K. Schulz, and D. Moher. 2004. Better reporting of harms in randomized trials: An extension of the CONSORT statement. Annals of Internal Medicine 141(10):781-788.
Ioannidis, J. P., C. D. Mulrow, and S. N. Goodman. 2006. Adverse events: The more you search, the more you find. Annals of Internal Medicine 144(4):298-300.
IOM (Institute of Medicine). 2008. Improving the presumptive disability decision-making process for veterans. Washington, DC: The National Academies Press.
Ives, D. G., A. L. Fitzpatrick, D. E. Bild, B. M. Psaty, L. H. Kuller, P. M. Crowley, R. G. Cruise, and S. Theroux. 1995. Surveillance and ascertainment of cardiovascular events: The Cardiovascular Health Study. Annals of Epidemiology 5(4):278-285.
Ives, D. G., P. Samuel, B. M. Psaty, and L. H. Kuller. 2009. Agreement between nosologist and cardiovascular health study review of deaths: Implications of coding differences. Journal of the American Geriatrics Society 57(1):133-139.
Jencks, S. F., D. K. Williams, and T. L. Kay. 1988. Assessing hospital-associated deaths from discharge data. JAMA 260(15):2240-2246.
Jenkins, J. K. 2010. Memorandum from John Jenkins to Janet Woodcock (dated September, 2010). Re: Recommendations for regulatory actions—Rosiglitazone. Washington, DC: US Food and Drug Administration.
Jones, A. P., R. D. Riley, P. R. Williamson, and A. Whitehead. 2009. Meta-analysis of individual patient data versus aggregate data from longitudinal clinical trials. Clinical Trials 6(1):16-27.
Juurlink, D. N. 2010. Rosiglitazone and the case for safety over certainty. JAMA 304(4):469-471.
Kadane, J. B. 2005. Bayesian methods for health-related decision making. Statistics in Medicine 24(4):563-567.
Kadane, J., and L. J. Wolfson. 1998. Experiences in elicitation. Journal of the Royal Statistical Society: Series D (The Statistician) 47(1):3-19.
Kaizar, E. E., J. B. Greenhouse, H. Seltman, and K. Kelleher. 2006. Do antidepressants cause suicidality in children? A Bayesian meta-analysis. Clinical Trials 3(2):73-90; discussion 91-98.
Kass, R. E., and A. E. Raftery. 1995. Bayes factors. Journal of the American Statistical Association 90(430):773-795.
Kaul, S., and G. A. Diamond. 2006. Good enough: A primer on the analysis and interpretation of noninferiority trials. Annals of Internal Medicine 145(1):62-69.

Kaul, S., and G. A. Diamond. 2007. Making sense of noninferiority: A clinical and statistical perspective on its application to cardiovascular clinical trials. Progress in Cardiovascular Diseases 49(4):284-299.
Kufner, S., A. de Waha, F. Tomai, S.-W. Park, S.-W. Lee, D.-S. Lim, M. H. Kim, A. M. Galloe, M. Maeng, C. Briguori, A. Dibra, A. Schömig, and A. Kastrati. 2011. A meta-analysis of specifically designed randomized trials of sirolimus-eluting versus paclitaxel-eluting stents in diabetic patients with coronary artery disease. American Heart Journal 162(4):740-747.
Laine, C., S. N. Goodman, M. E. Griswold, and H. C. Sox. 2007. Reproducible research: Moving toward research the public can really trust. Annals of Internal Medicine 146(6):450-453.
Lanctot, K. L., and C. A. Naranjo. 1995. Comparison of the Bayesian approach and a simple algorithm for assessment of adverse drug events. Clinical Pharmacology & Therapeutics 58(6):692-698.
Lau, H. S., A. de Boer, K. S. Beuning, and A. Porsius. 1997. Validation of pharmacy records in drug exposure assessment. Journal of Clinical Epidemiology 50(5):619-625.
Laughren, T. P. 2006. Overview for December 13 meeting of psychopharmacologic drugs advisory committee (PDAC).
Law, M. R., Y. Kawasumi, and S. G. Morgan. 2011. Despite law, fewer than one in eight completed studies of drugs and biologics are reported on time on ClinicalTrials.gov. Health Affairs 30(12):2338-2345.
Lee, K., P. Bacchetti, and I. Sim. 2008. Publication of clinical trials supporting successful new drug applications: A literature analysis. PLoS Medicine 5(9):e191.
Lesaffre, E. 2008. Superiority, equivalence, and non-inferiority trials. Bulletin of the NYU Hospital for Joint Diseases 66(2):150-154.
Levenson, M., and C. Holland. 2006. Slide presentation: Antidepressants and suicidality in adults: Statistical evaluation. http://www.fda.gov/ohrms/dockets/ac/06/slides/2006-4272s1-04-FDA_files/frame.htm (accessed April 6, 2012).
Lilford, R. J., M. A. Mohammed, D. Braunholtz, and T. P. Hofer. 2003. The measurement of active errors: Methodological issues. Quality and Safety in Health Care 12(Suppl 2):ii8-ii12.
Lin, D. Y., and D. Zeng. 2010. Meta-analysis of genome-wide association studies: No efficiency gain in using individual participant data. Genetic Epidemiology 34(1):60-66.
Madigan, D., P. Ryan, S. E. Simpson, and I. Zorych. 2010. Bayesian methods in pharmacovigilance. In Bayesian statistics 9, edited by J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West. Oxford, UK: Oxford University Press.
Manion, F., R. Robbins, W. Weems, and R. Crowley. 2009. Security and privacy requirements for a multi-institutional cancer research data grid: An interview-based study. BMC Medical Informatics and Decision Making 9(1):31.
Marciniak, T. A. 2010. Memorandum from Thomas Marciniak to Jena Weber (dated June 14, 2010) regarding cardiovascular events in RECORD, NDA 21-071/s-035. In FDA Briefing Document Advisory Committee Meeting for NDA 21071: Avandia (rosiglitazone maleate) tablet: July 13 and 14, 2010. Washington, DC: Department of Health and Human Services.
McEvoy, B., R. R. Nandy, and R. C. Tiwari. 2012. Applications of Bayesian model selection criteria for clinical safety data (abstract, ASA joint statistical meetings). http://www.amstat.org/meetings/jsm/2012/onlineprogram/AbstractDetails.cfm?abstractid=305627 (accessed April 5, 2012).
Melander, H., J. Ahlqvist-Rastad, G. Meijer, and B. Beermann. 2003. Evidence b(i)ased medicine—selective reporting from studies sponsored by pharmaceutical industry: Review of studies in new drug applications. BMJ 326(7400):1171-1173.
Miller, J. D. 2010. Registering clinical trial results: The next step. JAMA 303(8):773-774.
Misbin, R. I. 2007. Lessons from the Avandia controversy: A new paradigm for the development of drugs to treat type 2 diabetes. Diabetes Care 30(12):3141-3144.
Nissen, S. E., and K. Wolski. 2007. Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes. New England Journal of Medicine 356(24):2457-2471.

NRC (National Research Council). 2010. The prevention and treatment of missing data in clinical trials. Panel on handling missing data in clinical trials. Washington, DC: The National Academies Press.
Owens, D. K., K. N. Lohr, D. Atkins, J. R. Treadwell, J. T. Reston, E. B. Bass, S. Chang, and M. Helfand. 2010. AHRQ series paper 5: Grading the strength of a body of evidence when comparing medical interventions-Agency for Healthcare Research and Quality and the Effective Health-care Program. Journal of Clinical Epidemiology 63(5):513-523.
Oxford Dictionaries. 2011. Oxford English Dictionary online. Oxford University Press.
Parks, M. H. 2010. Memorandum from Mary Parks to Curtis Rosebraugh (dated August 19, 2010). Re: Recommendations on marketing status of Avandia (rosiglitazone maleate) and the required post-marketing trial, Thiazolidinedione Intervention and Vitamin D Evaluation (TIDE) following the July 13 and 14, 2010 Public Advisory Committee Meeting. Silver Spring, MD: US Food and Drug Administration.
Parmigiani, G. 2002. Modeling in medical decision making: A Bayesian approach (statistics in practice). New York: Wiley.
Peng, R. D., F. Dominici, and S. L. Zeger. 2006. Reproducible epidemiologic research. American Journal of Epidemiology 163(9):783-789.
PMA (Cochrane Prospective Meta-Analysis Methods Group). 2010. Welcome: The prospective meta-analysis methods group. http://pma.cochrane.org/ (accessed December 12, 2011).
Psaty, B. M., and D. S. Siscovick. 2010. Minimizing bias due to confounding by indication in comparative effectiveness research: The importance of restriction. JAMA 304(8):897-898.
Psaty, B. M., R. Boineau, L. H. Kuller, and R. V. Luepker. 1999. The potential costs of upcoding for heart failure in the United States. The American Journal of Cardiology 84(1):108-109.
Psaty, B. M., C. J. O’Donnell, V. Gudnason, K. L. Lunetta, A. R. Folsom, J. I. Rotter, A. G. Uitterlinden, T. B. Harris, J. C. M. Witteman, and E. Boerwinkle (on behalf of the CHARGE Consortium). 2009. Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium. Circulation: Cardiovascular Genetics 2(1):73-80.
Reade, M., A. Delaney, M. Bailey, D. Harrison, D. Yealy, P. Jones, K. Rowan, R. Bellomo, and D. Angus. 2010. Prospective meta-analysis using individual patient data in intensive care medicine. Intensive Care Medicine 36(1):11-21.
Royall, R. M. 1997. Statistical evidence: A likelihood paradigm. London, UK: Chapman & Hall.
Saunders, K., K. Dunn, J. Merrill, M. Sullivan, C. Weisner, J. Braden, B. Psaty, and M. Von Korff. 2010. Relationship of opioid use and dosage levels to fractures in older chronic pain patients. Journal of General Internal Medicine 25(4):310-315.
Staffa, J. A., J. Chang, and L. Green. 2002. Cerivastatin and reports of fatal rhabdomyolysis. New England Journal of Medicine 346(7):539-540.
Talbot, J. C. C., and P. Walker. 2004. Stephens’ detection of new adverse drug reactions. 5th ed. West Sussex, England: John Wiley & Sons Ltd.
Temple, R., and S. S. Ellenberg. 2000. Placebo-controlled trials and active-control trials in the evaluation of new treatments. Part 1: Ethical and scientific issues. Annals of Internal Medicine 133(6):455-463.
Ten Have, T. R., S. L. Normand, S. M. Marcus, C. H. Brown, P. Lavori, and N. Duan. 2008. Intent-to-treat vs. non-intent-to-treat analyses under treatment non-adherence in mental health randomized trials. Psychiatric Annals 38(12):772-783.
Thomas, E. J., and L. A. Petersen. 2003. Measuring errors and adverse events in health care. Journal of General Internal Medicine 18(1):61-67.
Thompson, S. G., and S. J. Sharp. 1999. Explaining heterogeneity in meta-analysis: A comparison of methods. Statistics in Medicine 18:2693-2708.
Toh, S., and M. A. Hernán. 2008. Causal inference from longitudinal studies with baseline randomization. International Journal of Biostatistics 4(1):Article22.

Turner, E. H., A. M. Matthews, E. Linardatos, R. A. Tell, and R. Rosenthal. 2008. Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine 358(3):252-260.
Unger, E. 2010. Memorandum to the file regarding NDA: 21-071; suppl 35, 36, 37 Avandia (rosiglitazone). In FDA Briefing Document: Advisory Committee Meeting for NDA 21071: Avandia (rosiglitazone maleate) tablet: July 13 and 14, 2010. Washington, DC: Department of Health and Human Services.
Vandenbroucke, J. P. 2006. What is the best evidence for determining harms of medical treatment? Canadian Medical Association Journal 174(5):645-646.
Vandenbroucke, J. P., and B. M. Psaty. 2008. Benefits and risks of drug treatments: How to combine the best evidence on benefits with the best data about adverse effects. JAMA 300(20):2417-2419.
Vedula, S. S., L. Bero, R. W. Scherer, and K. Dickersin. 2009. Outcome reporting in industry-sponsored trials of gabapentin for off-label use. New England Journal of Medicine 361(20):1963-1971.
Weiss, N. S., T. D. Koepsell, and B. M. Psaty. 2008. Generalizability of the results of randomized trials. Archives of Internal Medicine 168(2):133-135.
Wood, A. J. J. 2009. Progress and deficiencies in the registration of clinical trials. New England Journal of Medicine 360(8):824-830.
Yap, J. S. 2010. Statistical review and evaluation: Clinical studies NDA 21-071/35 and 21-073. In FDA Briefing Document Advisory Committee Meeting for NDA 21071: Avandia (rosiglitazone maleate) tablet: July 13 and 14, 2010. Washington, DC: Department of Health and Human Services.