ESTABLISHING CAUSAL RELATIONSHIPS IN MEDICINE FOR PUBLIC HEALTH POLICY
Statistician Joel Greenhouse of Carnegie Mellon University presented on how scientific evidence is best marshaled for public health policy decisions. The issue of what types of evidence are needed to establish a cause-and-effect relationship between antidepressants and suicidality faces the same steep hurdles as any other health policy issue. One overarching point is that regardless of the issue at hand, no single study is sufficient. The evidence must be synthesized and interpreted as a whole. To determine if there is a causal relationship, the landmark criteria often used as a guide are those developed by Hill (1965). These criteria remain in force today and are used routinely by numerous committees of the National Academy of Sciences when asked by Congress or any other public policy organization to evaluate the body of evidence to answer a public policy dilemma (Box 3-1). In the case of the meta-analysis supporting the Food and Drug Administration’s (FDA’s) black box warning, the foremost issues are related to the benefits and limitations of a meta-analysis. Several speakers in this session addressed the limitations. They focused primarily on the limitations of meta-analysis as a study design and the multiple alternative explanations for the apparent findings.
BOX 3-1
Evidence for Causal Relationships (Hill's criteria for causation)
SOURCE: Hill, 1965.
BENEFITS AND LIMITATIONS OF META-ANALYSES
The evidence presented by the FDA to support a black box warning for the use of antidepressants in children and adolescents has several limitations, Greenhouse said. In its meta-analysis, the FDA relied on 24 randomized placebo-controlled trials with roughly 4,600 children and adolescents. None of the individual trials reported any completed suicides. The trials were designed to determine the efficacy of drugs for treating various disorders such as depression, anxiety, and attention deficit hyperactivity disorder. Greenhouse pointed out that the study populations consisted of young people with different disorders. The efficacy endpoints were different, as were the antidepressant medications and classes of medications, he said. The outcome of greatest interest to the FDA was suicidality, which in this case included suicidal behavior and ideation. There were 87 cases reporting suicidal behavior or ideation, assessed retrospectively. The graphic depiction used in the FDA’s meta-analysis, the forest plot, revealed that only one study of antidepressants and suicidality found a significant relationship, the TADS (Treatment of Adolescents with Depression Study) (Figure 3-1). Nevertheless, the overall result of the meta-analysis was an odds ratio of about 2 (OR = 2.0, 95% CI 1.3–3.1) using a fixed-effect model. A random effects model generated a similar result.
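The fixed-effect mechanics behind a pooled odds ratio of this kind can be sketched as follows. The 2×2 counts below are invented for illustration (they are not the FDA trial data); each study's log odds ratio is weighted by the inverse of its Woolf variance:

```python
import math

# Hypothetical per-study 2x2 counts -- illustrative only,
# not the actual FDA pediatric trial data.
studies = [
    # (events_drug, n_drug, events_placebo, n_placebo)
    (4, 100, 2, 100),
    (6, 200, 3, 200),
    (2, 150, 1, 150),
]

weights, weighted_logs = [], []
for a, n1, c, n2 in studies:
    b, d = n1 - a, n2 - c           # non-events in each arm
    log_or = math.log((a * d) / (b * c))
    var = 1/a + 1/b + 1/c + 1/d     # Woolf variance of the log odds ratio
    w = 1 / var                     # inverse-variance weight
    weights.append(w)
    weighted_logs.append(w * log_or)

pooled_log = sum(weighted_logs) / sum(weights)
se = math.sqrt(1 / sum(weights))
pooled_or = math.exp(pooled_log)
ci = (math.exp(pooled_log - 1.96 * se), math.exp(pooled_log + 1.96 * se))
print(f"pooled OR = {pooled_or:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")
```

A random-effects model would add an estimated between-study variance component to each study's weight; when heterogeneity across studies is low, as reported here, the two approaches give similar answers.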
The strength of a meta-analysis is that it provides a means of pooling data to study a rare event. No single randomized controlled trial (RCT) could enroll enough subjects to study an outcome as rare as suicidality, according to Greenhouse. Meta-analysis also provided an opportunity to examine heterogeneity across studies using study-level variables. That gave the FDA the opportunity to look at the relationship
between the risk of suicidality and age, which proved to be an important public health finding. Greenhouse reported being surprised that there was little heterogeneity across the different diagnostic categories and different classes of drugs. On the other hand, meta-analysis also has limitations. By definition, meta-analyses are observational studies, a weaker type of study design, even when the individual studies being pooled are RCTs. Observational studies invite alternative explanations for their findings. Alternative explanations abound in the FDA data, according to presentations by Greenhouse and Robert Gibbons of the University of Illinois at Chicago.
SOURCES OF BIAS IN ANTIDEPRESSANT CLINICAL TRIALS
A significant portion of the workshop covered methodological limitations stemming from bias. Several sources of bias were identified by a few speakers as potentially limiting the results of individual RCTs or their pooling through meta-analysis: selection bias (or selection effect), regression toward the mean, natural course of the disease, and confounding by indication. These sources of bias can compromise both the external and internal validity of clinical studies, according to presentations by Greenhouse, Gibbons, Marc Stone (FDA), and Robert Valuck (University of Colorado at Denver).
Several types of selection bias may have threatened the validity of the antidepressant RCT results. Greenhouse agreed with an earlier suggestion from Kelly Posner that one form of selection bias, ascertainment bias, could have occurred in the groups receiving antidepressants as opposed to the placebo groups. Posner said that antidepressant recipients may have been more likely than placebo patients to report suicidality because they were prompted to report any adverse effects.
Another source of bias, sampling bias, jeopardizes the generalizability or external validity of findings. This could have occurred because most RCT studies used for the meta-analysis had strict exclusion criteria specifying that subjects at high risk of suicide be prohibited from participating. Strict exclusion criteria usually leave the RCT population “healthier,” that is, with less severe forms of illness. Said another way, the RCT population is healthier than other patients in the general population being treated with antidepressants and hence not representative of the population of interest.
To underscore problems with generalizability, Greenhouse compared the RCT population of adolescents with the population in a nationally representative database used in the Centers for Disease Control and Prevention’s long-established “Youth Risk Behavior Surveillance System.” He found that rates of suicidality were significantly higher in the nationally representative survey of youth than in the RCTs (Bridge et al., 2008; Greenhouse et al., 2008). Using statistical modeling techniques, another study of the selection effect in antidepressant trials found that more restrictive inclusion/exclusion criteria in an RCT generate greater
potential for inflation of the relative risk (Weisberg et al., 2009). The authors concluded that narrow study eligibility for cautionary reasons might, in the long run, harm “exactly those people whom the study is designed to help.”
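One mechanism by which screening out high-risk subjects can inflate a relative risk is easy to see numerically. The sketch below rests on purely hypothetical numbers and a simplifying assumption: the drug adds a fixed absolute risk of 1 percentage point, while trial exclusion criteria lower the baseline rate:

```python
# Illustrative numbers only: assume the drug adds the same fixed absolute
# excess risk in both populations, on top of different baseline rates.
absolute_excess = 0.01

baseline_general = 0.040  # hypothetical background rate, unscreened patients
baseline_trial   = 0.005  # hypothetical rate after excluding high-risk subjects

rr_general = (baseline_general + absolute_excess) / baseline_general
rr_trial   = (baseline_trial + absolute_excess) / baseline_trial

print(f"RR in general population:  {rr_general:.2f}")  # 1.25
print(f"RR in screened RCT sample: {rr_trial:.2f}")    # 3.00
```

The identical absolute harm appears more than twice as large in relative terms in the screened sample, because the denominator (the baseline risk) has shrunk.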
Building on that argument, Gibbons offered an analogous example: a study of suicidality in thousands of bipolar patients treated with antiepileptic drugs rather than antidepressants. The suicide rate in the placebo group was lower than that among untreated bipolar patients in the general population, suggesting that the placebo patients were healthier (Gibbons et al., 2009). That increases the likelihood of an erroneously elevated relative risk, assuming that the trial excludes the patients most likely to be helped by the medications. Gibbons concluded that lack of generalizability presents the largest threat to RCT validity.
Regression Toward the Mean
Gibbons raised similar limitations of meta-analyses and other types of observational studies. He pointed out that a meta-analysis is an observational study of other studies, and that analysis of observational studies is, in general, fraught with difficulty. He also stressed the possibility of alternative explanations (i.e., confounders) for an apparent association between suicidality and antidepressants. One such confounder was regression toward the mean, the statistical term for the long-standing observation that if an observation period begins at a time of high risk, the risk tends naturally to decrease over time from the index episode (e.g., diagnosis of depression). Applied here, this alternative explanation suggests that decreases in the risk of suicidality after initiation of treatment may be due, at least in part, to the natural decline in the rate of suicidality over time rather than to a protective effect of the medication. To examine this possibility, he presented an illustrative person-time logistic regression analysis of the relationship between antiepileptic drugs and suicide attempts in patients with bipolar illness. This permitted a comparison of suicide attempt rates in treated and untreated patients, adjusting for the natural decay in suicide rates over time from the index episode (Gibbons et al., 2009).
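A minimal numeric sketch of regression toward the mean, assuming a hypothetical exponentially declining monthly risk after the index episode and no treatment effect whatsoever:

```python
import math

# Hypothetical monthly risk after the index episode: declines exponentially
# on its own; the drug has NO effect in this example.
def monthly_risk(t, base=0.10, decay=0.5):
    return base * math.exp(-decay * t)

treatment_start = 2  # suppose treatment begins in month 2

before = [monthly_risk(t) for t in range(0, treatment_start)]
after  = [monthly_risk(t) for t in range(treatment_start, 6)]

mean_before = sum(before) / len(before)
mean_after  = sum(after) / len(after)

print(f"mean monthly risk before treatment: {mean_before:.3f}")
print(f"mean monthly risk after treatment:  {mean_after:.3f}")
```

Even with zero treatment effect, the naive before/after comparison shows roughly a fourfold drop in risk; this is the artifact that a person-time analysis, by modeling time since the index episode, is designed to adjust away.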
Natural Course of Illness
The natural course of the illness can also bias study findings, Gibbons noted. Evidence suggests that the highest rates of suicide attempts occur before people initiate treatment (Gibbons and Mann, 2009); thereafter, suicide attempts decrease exponentially with time. In longitudinal studies, for example, data may appear to show a protective effect of a medication when the effect is instead an artifact of the natural course of the illness, that is, it would have occurred without any intervention.
Confounding by Indication
Confounding by indication refers to the bias introduced when the risk factor (e.g., antidepressant treatment) and the outcome (e.g., suicide) are both related to a third variable (e.g., depression) and therefore appear to be directly related. Depressed patients have an increased risk of suicide and are also more likely to take antidepressants, creating the appearance that antidepressant use increases the risk of suicide. In this example, the relationship between antidepressants and suicide is confounded by the indication of depression. Similarly, in an observational study of depressed patients only, those who receive pharmacological treatment are generally sicker than those who do not. They would therefore be expected to have a higher risk of suicide even if the antidepressant conveyed no protective effect. This is another example of confounding by indication, where the indication for treatment is increased severity, which is also directly related to the outcome—in this case, suicide.
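Confounding by indication can be made concrete with a small stratified calculation. The severity strata and probabilities below are invented for illustration, and within each stratum treatment has, by construction, no effect on the outcome:

```python
# Two hypothetical severity strata. Within each stratum, P(outcome) is the
# same whether or not the patient is treated -- a null treatment effect.
strata = {
    #          P(stratum), P(treated | stratum), P(outcome | stratum)
    "mild":   (0.5, 0.2, 0.01),
    "severe": (0.5, 0.8, 0.05),
}

treated_cases = treated_n = untreated_cases = untreated_n = 0.0
for p_stratum, p_treat, p_outcome in strata.values():
    treated_n       += p_stratum * p_treat
    treated_cases   += p_stratum * p_treat * p_outcome
    untreated_n     += p_stratum * (1 - p_treat)
    untreated_cases += p_stratum * (1 - p_treat) * p_outcome

crude_rr = (treated_cases / treated_n) / (untreated_cases / untreated_n)
print(f"crude relative risk: {crude_rr:.2f}")  # > 2 despite a null effect
```

Within each stratum the relative risk is exactly 1, yet the crude (unstratified) comparison makes treated patients look more than twice as likely to experience the outcome, because severe patients are both more likely to be treated and more likely to attempt suicide. Stratifying or adjusting for severity removes the artifact.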
Gibbons proceeded to discuss the pros and cons of a range of study designs (Box 3-2) that could be used to enhance understanding of the relationship between antidepressants and suicidality. He stressed that no current study design is ideal. The FDA’s Adverse Event Reporting System, or AERS (which captures events occurring after a drug’s release onto the market), is useful but far from ideal because of heightened media reporting and the lack of a denominator (AERS records how many individuals receiving the drug report adverse events, but not the total number of people receiving the drug). The design that most appealed to him was analysis of medical claims data, largely because of their huge sample size. A major detriment of this design is that while the data list prescriptions, they do not ensure that the patient actually took the drug.
Moreover, this design does not ensure that diagnoses are reliable or valid, among other problems.
Other Problems with Meta-Analyses
Marc Stone of the FDA offered his perspective in a presentation on meta-analysis and its broader problems when applied to rare events. He began by defining “suicidality” comprehensively to include any suicide-related phenomena of interest. His first concern was patient withdrawals from RCTs, because withdrawals related to treatment assignment have a high probability of confounding results. He asked the generic question of whether the propensity to withdraw stemmed from susceptibility to drug-related adverse events (or lack of therapeutic effect in placebo subjects), a drug effect on the tolerability of adverse events, and/or the
effect of the drug on willingness to adhere to the protocol in the face of personality, lifestyle, and life events. Any of these propensities can bias study outcomes.
Non-random withdrawals may be a prodrome of suicidality, he noted; it is also conceivable that they are signs of remission with a good prognosis. All of these issues affect adherence to the protocol and the desire to withdraw, which can differentially affect the drug and placebo groups. His overall point was that non-random withdrawals from RCTs, influenced by multiple factors, might reduce or negate the benefits of randomization. He was also concerned about whether the drug effect is constant over time; the length of the study is crucially important. If incidence rates vary over time, there can be significant effects on modeling and regression lines. He demonstrated how the choice of statistical model could yield different estimates of the comparative effect between drug and placebo groups. His bottom line was that both withdrawals from RCTs and the choice of statistical model affect meta-analysis findings, and the impact may be greater given the rarity of completed suicide.
During the workshop’s discussion period devoted to methodological limitations of RCTs and their use in meta-analysis, FDA’s Tom Laughren acknowledged that every method of study has its flaws. Still, he noted that an academic team using a case-control study produced results that were very similar to those of the FDA’s, finding an association between antidepressants and suicidality in the pediatric population but none in the adult population (Olfson et al., 2006). Laughren asserted that he had not “heard anything that convinces me, with as much admitted weakness as there is in the data we have … [that we] reached the wrong conclusion.” He also pointed out that a black box warning is “simply intended to alert clinicians to a potential risk that they need to pay attention to.”
Additional studies and forms of analysis highlighted at the meeting may offer equally important, albeit different, inferences. For example, a study of depressed adolescents receiving psychotherapy—as opposed to pharmacotherapy—showed risks of suicidality similar to those in the FDA trials (Bridge et al., 2005). This finding suggests that the suicidality in the FDA trials is a treatment effect, not specifically a drug effect. Another study found that the benefits of antidepressants outweighed their risks (Bridge et al., 2007). However, several participants described the limitations of meta-analyses and examined potential opportunities in other forms of analysis. Potter pointed out that this workshop would have been unnecessary had better methodologies been available, especially prospective observational studies. However, Stone drew a distinction between the kinds of inferences that can be drawn from a clinical trial versus
observational data from a large dataset. The former allows inferences about causality, the latter about associations that may or may not be causal.
Robert Valuck offered a different approach to study the effects of antidepressants, using epidemiology rather than RCTs. He described a newly linked network of health datasets consisting of nearly 500,000 patients (Pace et al., 2009). That large patient base readily lends itself to many types of observational epidemiological studies. Although such studies are almost universally deemed to be of lesser validity than RCTs, observational studies may be better suited to studying rare clinical outcomes in real-world settings, where patients are more ill and more complicated.
The foremost benefit of observational studies is their large sample size and hence generalizability. Other benefits are greater statistical power for studying rare outcomes and the capacity to compare the effectiveness of different treatments (the FDA requires only that treatments be tested against placebo). The dataset could also support prospective cohort studies designed to measure and confirm outcome measures of interest. But these studies are not without disadvantages; their pros and cons are summarized in Box 3-3.
BOX 3-3
SOURCE: Valuck, 2009.
Tapping into the dataset, Valuck described the Distributed Ambulatory Research in Therapeutics Network (DARTNet), a federally supported network of electronic health data created to promote comparative effectiveness research. The database already covers 500,000 individuals (without identifiers). Because of how the data are acquired and the impossibility of random allocation, it has limitations. But the dataset is highly useful because its patients represent the “real world” of treatment rather than the exacting treatment standards required in randomized controlled trials. It includes primary care, where, Valuck noted, 50 percent of depression care is rendered.
The database has not yet been used to study suicidality, but it does offer potential for such study, especially for conduct of prospective cohort studies. It also facilitates the conduct of retrospective and case-control studies. Within each of its more than 500 sites of clinical practice, it captures a broad mix of patient-level information (e.g., vital signs, social history, family history) from electronic health records, laboratory tests, imaging results, pharmacy use databases, and billing systems (e.g., Medicaid and Department of Veterans Affairs). The data can be used to determine if patients actually fill prescriptions, for example. The system does have drawbacks: lack of severity data, unmeasured covariates, and unvalidated outcomes, among others.
Valuck and colleagues, drawing from another dataset, conducted a large nested case-control study of suicide attempts using claims data from managed care organizations (Valuck et al., 2009). Although claims data are not ideal, the study compared 10,500 suicide attempters over the period 1999 to 2006 against nearly 42,000 controls. After controlling for confounders related to depression severity, antidepressant treatment was found to protect against suicide attempts, while antidepressant discontinuation was a significant risk factor for an attempt. Nevertheless, the study did show that the highest risk of a suicide attempt was associated with the initiation of treatment, a finding consistent with other studies.