Safe Medical Devices for Children

D
Questions and Methods in Surveillance Programs

Stephen Lagakos, Ph.D.* and William DuMouchel, Ph.D.*

This appendix reviews some technical aspects of the strategies that can be used after a medical device is marketed to assess its safety and efficacy or effectiveness. It begins by looking at various measures of interest in medical device surveillance, for example, the probability of an adverse event or the relative risk of experiencing an adverse event under different circumstances. It then discusses how these probabilities or other measures would be estimated with ideal information and how the actual data available to evaluators departs from the ideal. Particular attention is focused on analytic methods based only on numerator data (event counts) because these are applicable to many kinds of surveillance data. The last sections discuss different strategies, including disproportionality analysis, for evaluating device safety.

CHARACTERISTICS OF INTEREST

Probabilities of Outcome Events

An analysis of adverse device events should distinguish between events that are clearly identifiable as being due to a device (e.g., fracture of a cochlear implant) and those that can occur in individuals with the same disease or condition whether or not they have the device (e.g., development of meningitis in someone with a cochlear implant). Ideally, an analysis of adverse event reports would also allow one to identify characteristics of a device user (e.g., gender, age, or disease severity) that might affect the probability of an adverse event.¹

* Members, Institute of Medicine Committee on Postmarket Surveillance of Pediatric Medical Devices.

Quantitatively, for adverse events (AE) that are clearly identifiable as being due to a device, the two probabilities (P) of interest (where Z is an individual characteristic) are (a) P[AE | device] and (b) P[AE | device, Z]. In contrast, for adverse events that could be due to the device or other causes (e.g., stroke in persons with aortic valve replacement; cf. Ionescu et al., 2003), interest would more naturally focus on probabilities indicating the difference in risk for those with the device and those without, that is, P[AE | device] versus P[AE | no device], and P[AE | device, Z] versus P[AE | no device, Z]. As described below, different types of data structures would give rise to different strategies for comparing risks between individuals who do and do not have the device.

A full evaluation of the possible harms posed by a device should put harms in context. Thus, it should include information about the comparative probabilities of desired outcomes or benefit (e.g., reduced mortality, improved hearing, or cessation of tremor).

Associations Between Device Use and Outcomes

While the above probabilities represent the quantities we ideally would like to have to evaluate the safety of a device, the statistical association between use of a device and the probability of an adverse event or other outcome can be quantified as a function of these probabilities.
One frequently used statistic, the odds ratio (OR), is a measure of the association between a device and an outcome, defined by

$$\mathrm{OR} = \frac{P[\mathrm{AE} \mid \text{device}] \,/\, \left(1 - P[\mathrm{AE} \mid \text{device}]\right)}{P[\mathrm{AE} \mid \text{no device}] \,/\, \left(1 - P[\mathrm{AE} \mid \text{no device}]\right)}$$

Thus, an odds ratio of 1 corresponds to no association between use of the device and the probability of an adverse event, while an odds ratio greater than (less than) 1 indicates that use of the device is associated with a higher (lower) probability of the event. As we see below, some types of studies enable the estimation of an odds ratio without being able to estimate the individual probabilities that comprise it.

¹ In practice, it is often also of interest to know the time from use of the device until the occurrence of an event, in which case the probabilities above would be replaced by the distribution function for the time until the event.

When the probability of an adverse event is small, say less than 10 percent, the odds ratio is numerically similar to another statistic, relative risk (RR). This statistic is defined by

$$\mathrm{RR} = \frac{P[\mathrm{AE} \mid \text{device}]}{P[\mathrm{AE} \mid \text{no device}]}$$

As with the odds ratio, a relative risk of 1 corresponds to no association between use of the device and the risk of an adverse event. A relative risk greater than (less than) 1 corresponds to an increased (decreased) risk for those with a device. Thus, studies that permit estimation of the odds ratio can also estimate the relative risk when event rates are small. In some situations, as described below, another assumption can allow relative risk to be estimated without estimates of individual probabilities.

Ideally, each of the quantities described above could be estimated using appropriate numerator and denominator data. Depending on the probability of interest, the denominator might be the number of individuals with a medical condition and the numerator, the number of those in the group that had experienced an adverse device event. For other probabilities, the denominator might be those in the group who had a particular characteristic (e.g., being male) and the numerator, the number in that group who had experienced the adverse event.

Obtaining Numerator and Denominator Data

A variety of data collection strategies exist to obtain numerator and denominator data.
One way of describing data capture mechanisms is based on whether they are “active” or “passive.” Generally speaking, an active data collection system is based on an identified cohort of individuals (e.g., all children who did or did not receive a device at a particular hospital during a defined period of time) that is prospectively followed for the occurrence of the event of interest. Examples of such active strategies include clinical trials and prospective observational studies. Either individuals are evaluated periodically for occurrence of the event (e.g., by checking proper functioning of the device), or a system for real-time reporting of adverse events exists. When well designed and implemented, these kinds of studies allow identification of both numerators and denominators of interest (although some individuals may be lost to follow-up).

In contrast, with passive data collection, a system allows events to be reported, but little or no effort is made to ensure that all events of interest are reported. Usually, passive systems do not provide for reporting of denominator data (e.g., number of people at risk of a particular kind of adverse event). For example, the U.S. Food and Drug Administration (FDA) requires health care facilities to report certain adverse device events to device manufacturers (and sometimes to the FDA as well), but no agency procedures, incentives, or other mechanisms exist to ensure that such reporting occurs. Even with perfect reporting, that system would not include denominator data, although such data might be collected or estimated in certain cases (e.g., if a manufacturer could provide reasonably complete information on patients implanted with a particular kind of device or if data from other sources allow reasonable estimates of the number of people with the relevant medical condition).

Such additional information can change the perspective on adverse event reports. In 2003, based on a number of adverse event reports, FDA issued a notice advising physicians that patients who had had a certain kind of drug-eluting stent implanted might experience a higher incidence of subacute thromboses and hypersensitivity reactions than those with bare-metal stents (FDA, 2003). A year later, based on follow-up data from a required manufacturer postmarket registry and review of reports involving bare-metal stents, the FDA issued an update that concluded that the stent was not associated with a higher rate of subacute thromboses or hypersensitivity reactions (FDA, 2004). It also concluded that the reported rate of thromboses from the registry was not greater than the rate during the premarket clinical trials and that it was within the expected rate for any stent.

In some special instances, specific designs may be set into place for capturing certain events.
For example, if there is a registry of all individuals that are diagnosed with a medical condition, what is called a case-control design (see below) can be used to match each individual reported to have an adverse event with a control. Then, data on characteristics (including whether a device was used) for both groups can be retrospectively evaluated from the registry to determine whether having the device is associated with the risk of the event. Alternatively, a case-cohort design could be employed in which characteristics of all individuals who have an event as well as characteristics of a random subset of those from the entire population are determined. The specific design will determine which aspects of the risk can be estimated. In a case-control design, the odds ratio can be estimated even though the individual probability of a device user (or person without a device) having an adverse event cannot be estimated.

Sources of Bias

The ultimate value of any surveillance system will depend critically on the bias and precision of the estimates it provides of the probabilities discussed above. The underreporting of events (numerator data) leads to underestimates of absolute risks and often to distorted comparisons, especially when the extent of underreporting varies depending on individual characteristics, for example, source of medical treatment.

Bias may also occur when events that are not really related to a medical device are reported as adverse device events. For example, an incident may be reported as an adverse event for a particular device when the event was related to the patient’s medical condition. This kind of “false positive” can result in overestimates of absolute risks and, again, distorted comparisons of groups. Such bias can be avoided by confirmation procedures for reported events. However, such confirmation can be difficult if the events are not being reported and confirmed in real time or, at least, reasonably close in time.

Biased estimates of risk can also arise from under- or overestimation of those potentially at risk of an event (denominator data). This is primarily a concern with passive data collection systems. Sometimes, however, information is being actively collected on people with a medical device but not on people without a device, so that both the number and characteristics of this group are unknown.

SPECIFIC STUDY DESIGNS AND INFORMATION RESOURCES

This section discusses several specific strategies for evaluating the safety and effectiveness of medical devices. Most of these strategies are employed to evaluate a device for purposes of securing FDA approval or clearance for marketing. They can also be used for surveillance (even if the purpose is not so described), and the discussion below cites examples of such use. The discussion begins with the randomized trial, which is widely regarded as the “gold standard” for evaluating clinical interventions, and then considers the others in general order of their perceived methodological soundness.
For each design, we discuss (1) biases with respect to identification of target population, (2) biases from comparisons of individuals with and without device, (3) biases with respect to complete and accurate ascertainment of events, (4) biases with respect to duration of follow-up, (5) precision, (6) available measures of efficacy and safety, and (7) feasibility. Useful information can be obtained in many ways, and for any study, potential sources of biases related both to the study design and its execution in practice should be considered. If FDA approval of medical devices were dependent on the types of trials typically offered in support of FDA approval of drugs, the hurdles to approval would be very high for both practical and ethical reasons.

Randomized Controlled Trials and Other Interventional Studies

Randomized comparative clinical trials, typically called Phase III trials in the world of pharmaceutical studies, are considered to be the most reliable scientific design for evaluating the safety and efficacy of an intervention. Ideally, a random sample of subjects is selected from a well-defined target population, and then subjects are randomly assigned to one of several interventions. Subjects are followed for the development of one or more desired outcomes (benefits) and for unwanted side effects (harms or adverse events). The key feature that sets randomized trials apart from other studies is the randomization, which is intended to eliminate bias in the selection of study subjects in the intervention and control groups. By chance, however, groups based on random assignment can differ in important ways. Therefore, researchers typically compare the groups to see whether they differ in significant ways on potentially relevant characteristics. With appropriate planning, the size of the trial can be determined to yield the desired precision for estimating event probabilities and comparisons of intervention arms.

As with any study, problems can arise in randomized trials that affect the validity or interpretability of the results. One is that the participants may have been properly randomized to control and intervention groups, but the participants themselves do not represent the target population. This could occur, for example, if the source of participants is not typical of the population, or if a substantial proportion of eligible subjects decline to participate. This would not affect the internal validity of the trial because patients that do participate are randomized. However, it could affect the degree to which the results can be generalized to the target population.

Biases can arise in randomized trials in several ways.
Chief among these are ascertainment bias, bias due to nonadherence, and missing data/losses to follow-up. Ascertainment bias refers to differential criteria being applied in the evaluation of outcome variables, usually due to the knowledge of a subject’s treatment group by the individual who evaluates the subject. This can often be prevented either by conducting a double-blind trial (in which neither the evaluator nor the subject is aware of the treatment arm) or by blinding the evaluation process. Nonadherence refers to subjects not receiving the interventions to which they are assigned. This can sometimes be a problem in trials that evaluate a drug that may be difficult to tolerate. In device trials, the main concern is that subjects receive the intervention (e.g., device or no device) to which they were assigned. Losses to follow-up and other types of missing data can diminish the power of a clinical trial and, more importantly, introduce bias in the comparisons of interventions. The latter can arise, for example, if subjects at greater (or lesser) risk of imminent failure leave the trial before their failure is observed. Yet another limitation of randomized clinical trials in some settings is that there might be ethical concerns about use of an untreated control group when there is some promising information available about the efficacy of a device.

Most randomized clinical trials allocate subjects among the intervention arms in a predetermined frequency, usually in equal proportions. However, allocation of subjects can also be done in an adaptive way, in which the allocation of future subjects depends on the outcomes of previously allocated subjects. One such approach is called randomized play-the-winner, in which assignment to one treatment or the other is randomized, but influenced by the outcomes of all the previous patients in the study. Statistical significance is reached when the difference between the numbers of patients randomized to the two groups becomes sufficiently large that it cannot be readily explained by chance. This feature of tending to allocate more patients to the superior treatment has advantages on ethical grounds.

In the 1980s, such a design was used to assess the efficacy of therapy involving extracorporeal membrane oxygenation (ECMO, also called extracorporeal life support or ECLS), a form of cardiopulmonary life support that involves an array of medical devices (e.g., vascular access catheters, blood pump, tubing, an artificial lung [often called an oxygenator]). The first child was randomly assigned (with 1 chance out of 2 of ECMO assignment) to ECMO therapy and survived; the next was randomly assigned (with 2 chances out of 3 of ECMO assignment) to conventional treatment and died. With the randomized scheme now more strongly weighted toward the option with the better result (with 3 chances out of 4 of ECMO assignment), the next patient received ECMO and survived, after which the next 10 patients in a row were assigned to ECMO. When all survived, the investigators were able to conclude on statistical grounds that ECMO was significantly superior to conventional treatment.
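The assignment probabilities quoted for the first ECMO patients (1 in 2, then 2 in 3, then 3 in 4) are consistent with a simple urn scheme, sketched below. The update rule is an assumption inferred from those numbers, not a rule stated in the text: the urn starts with one ball per arm, and each outcome adds one ball for the arm that the outcome favors. The function name `ecmo_assignment_probabilities` is hypothetical.

```python
from fractions import Fraction

def ecmo_assignment_probabilities(history, start=(1, 1)):
    """Replay an urn-based play-the-winner scheme (assumed rule, see lead-in).

    history: list of (arm, survived) pairs, arm is "ECMO" or "conventional".
    Returns the probability of ECMO assignment before each patient and
    after the last one.
    """
    ecmo_balls, conventional_balls = start
    probabilities = [Fraction(ecmo_balls, ecmo_balls + conventional_balls)]
    for arm, survived in history:
        # A survival on ECMO or a death on conventional treatment favors ECMO;
        # the other two outcomes favor conventional treatment.
        if (arm == "ECMO") == survived:
            ecmo_balls += 1
        else:
            conventional_balls += 1
        probabilities.append(
            Fraction(ecmo_balls, ecmo_balls + conventional_balls)
        )
    return probabilities

# First three patients as described in the text.
first_patients = [("ECMO", True), ("conventional", False), ("ECMO", True)]
probs = ecmo_assignment_probabilities(first_patients)
print([str(p) for p in probs])
```

Under this assumed rule, the first three entries reproduce the probabilities given in the text; the last entry is what the same rule would imply for the patient after those three.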
Nonetheless, the unusual design led many in the medical community to question the results. At least two additional trials with more conventional designs were conducted in response to such skepticism.² Both supported the findings of the first trial. Because adaptive designs have the potential of assigning more subjects to the better intervention arm, they provide an attractive alternative in some settings.

² Investigators in the next trial believed that the accumulating clinical results with ECMO were so superior to conventional treatment that a traditional randomized trial would be unethical (see account by Truog, 1999). They used a different adaptive randomization design that randomized patients evenly (50–50) to ECMO or conventional treatment until one option accumulated four deaths, after which patients were assigned to the more successful arm until the difference between groups reached statistical significance. After ten patients had been assigned to conventional therapy, four had died; the first nine patients assigned to ECMO survived. The next 20 patients received ECMO with one death. As observed by Truog, “In retrospect, four patients died who might have survived if they had been offered ECMO. Nevertheless, this was a smaller number than would have died if the trial had been designed with traditional 50/50 randomization. To this extent, adaptive randomization was successful in both demonstrating the superiority of ECMO and in reducing the total number of deaths” (Truog, 1999). Subsequently, investigators in the United Kingdom, who believed that a traditional trial was required, conducted such a trial until a monitoring body stopped the trial after 54 of 92 infants in the conventional treatment arm died compared to 30 of 93 in the ECMO arm (UK Collaborative ECMO Group, 1996).

There are many other variants of study design for randomized controlled trials, most requiring modifications in statistical analyses. In some studies, for example, individuals in randomly assigned intervention and placebo groups “cross over” at some point so that the intervention group is switched to the placebo and vice versa. One feature of such a design is that study subjects can, in essence, serve as their own controls (assuming that the effects of the intervention stop when the intervention stops).

Study subjects can also serve as their own controls in other ways. For example, a device might be studied in one eye with the other eye serving as a control (assuming baseline comparability of the eyes and no “spillover” effect of treatment). For some electrical stimulation devices, an individual’s response is studied with the device on and with the device off. Other studies involve comparisons of an individual’s status following treatment to her/his status prior to receiving the intervention, which can create problems if a steady state of the condition, absent the intervention, cannot be assumed. Additional sources of bias can arise from a placebo effect and from incomplete data due to patient drop-outs.

A still weaker design involves comparisons with historical controls. For example, investigators may recruit individuals for an experimental intervention and then compare outcomes for this group with previously published outcome data on individuals who received alternative therapy. The choice of historical or retrospective controls presents various opportunities for bias as described below.
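As a worked illustration, the UK trial counts quoted in footnote 2 (30 deaths among 93 infants in the ECMO arm, 54 among 92 in the conventional treatment arm) can be turned into the relative risk and odds ratio introduced earlier in this appendix, with death playing the role of the outcome event. This is only a sketch of the arithmetic, not an analysis from the trial report; the function name `risk_measures` is hypothetical.

```python
def risk_measures(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Return (risk_exposed, risk_unexposed, relative_risk, odds_ratio)."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    relative_risk = risk_exposed / risk_unexposed
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return risk_exposed, risk_unexposed, relative_risk, odds_exposed / odds_unexposed

# Deaths and arm sizes as quoted in footnote 2 (ECMO arm treated as "exposed").
p_ecmo, p_conventional, rr, odds_ratio = risk_measures(30, 93, 54, 92)
print(f"risk of death, ECMO:         {p_ecmo:.3f}")
print(f"risk of death, conventional: {p_conventional:.3f}")
print(f"relative risk: {rr:.2f}")
print(f"odds ratio:    {odds_ratio:.2f}")
```

Because death was far from rare in either arm, the odds ratio and the relative risk differ noticeably here, which matches the caveat that the two statistics are numerically similar only when event probabilities are small.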
Prospective and Retrospective Cohort Studies

Prospective cohort studies may have as their goal the comparative assessment of two different interventions when the choice of intervention (e.g., method of contraception) belongs to the study subjects rather than the investigators. Subjects from a defined population, as well as the intervention they choose to take, are identified and followed over time to assess safety and efficacy. In this sense, prospective observational studies are similar to randomized trials. They thus share some of the benefits of trials, especially with respect to their prospective nature, and can suffer from the problems caused by nonadherence and missing data/losses to follow-up.

The most important difference between a randomized trial and a prospective, comparative observational study is that interventions are determined at random in the former. When the intervention is determined by the subject or their caregiver, the opportunity for selection bias arises, and there have been many examples where this has led to misleading conclusions. A likely recent example is the Women’s Health Initiative, which demonstrated an increased risk of certain cardiovascular outcomes among women receiving estrogen therapy, when previous observational studies showed no evidence of increased risk, and in some cases a protective effect (Manson et al., 2003).

One potential advantage of a prospective, comparative cohort study is that it may not pose the ethical concerns about investigator assignment of study subjects that arise with some randomized trials. Another advantage is that enrollment of subjects or participants can be easier for prospective observational studies. As a result, they can be much larger and therefore have greater power to detect small differences in efficacy and adverse event rates between the comparison groups.

Retrospective cohort studies have similarities to prospective observational studies, but are constructed using individuals who have received an intervention (or not) in the past. The opportunity for biases in these studies is typically somewhat greater than in prospective observational studies. In particular, it can be more difficult to retrospectively identify and obtain follow-up information on subjects than when an observational study is conducted prospectively. In some cases, new information can be obtained from subjects about their status (outcomes).

Case Control Studies

Case control studies are based on the identification of samples from a population of subjects who have (cases) or have not (controls) experienced the adverse event and for whom the presence or absence of device use can be identified. Unlike randomized clinical trials and prospective or retrospective cohort studies, case-control studies cannot be used to estimate the probabilities of interest given in (a) and (b) above, but do provide estimates of the OR for assessing association between device use and outcome.
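The point that a case-control sample supports estimation of the OR but not of the underlying probabilities can be seen in a small numerical sketch. All counts below are hypothetical and chosen only for easy arithmetic: every case is retained, while only 1 in 20 of the subjects without the event is sampled as a control.

```python
import math

def odds_ratio(a, b, c, d):
    """a: device and event, b: device and no event,
    c: no device and event, d: no device and no event."""
    return (a * d) / (b * c)

# Hypothetical full population: 10,000 device users, 20,000 non-users.
pop_a, pop_b = 40, 9_960      # device users with / without the event
pop_c, pop_d = 20, 19_980     # non-users with / without the event

# Case-control sample: keep all cases, sample 1 in 20 of the non-cases.
sample_b = pop_b // 20        # 498 device users among controls
sample_d = pop_d // 20        # 999 non-users among controls

population_or = odds_ratio(pop_a, pop_b, pop_c, pop_d)
sample_or = odds_ratio(pop_a, sample_b, pop_c, sample_d)

# The event probability among device users is badly distorted in the sample.
true_risk_device = pop_a / (pop_a + pop_b)
apparent_risk_device = pop_a / (pop_a + sample_b)

print(f"odds ratio, population: {population_or:.3f}")
print(f"odds ratio, sample:     {sample_or:.3f}")
print(f"P[AE | device], population: {true_risk_device:.4f}")
print(f"naive estimate from sample: {apparent_risk_device:.4f}")
```

The odds ratio is unchanged because sampling controls scales b and d by the same factor, which cancels in ad/bc. In a real study the controls are a random sample, so the equality would hold only approximately.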
The validity of case-control studies depends on the selection of “cases” and “controls” who are representative of the populations of all subjects that did and did not experience the event, respectively. The selection of appropriate controls, in particular, can be challenging. The validity of case-control studies also depends on the accurate ascertainment of the intervention and other risk factors that might be associated with the outcome event. For many devices, ascertainment of whether or not a subject used the device can be reliably determined. Yet other risk factors for the event that might be used in multivariate analyses of association may depend on recall, and thus introduce bias.

An important advantage of case control studies is that they can more easily assess the association between a device or risk factor and a rare outcome than a prospective study. This is facilitated if registries of certain types of events, such as a diagnosis of cardiomyopathy, are routinely maintained, for then a relatively large number of cases can be readily obtained and matched with controls to identify possible risk factors for the outcome. Also, event reporting systems, such as the MedSun system currently being implemented by the FDA, lend themselves to case-control studies to identify possible safety concerns with devices, provided that there is a means of selecting controls to match the cases and to ascertain the “exposure” status (e.g., device use or not) of both cases and controls.

Registry-Based Studies

A registry is a system for collecting information about a class of individuals or patients who have in common a disease, injury, condition, medical procedure or product, or similar characteristic or exposure. A registry study is an investigation that uses registry data alone or in combination with other data. Some registries are based on diagnoses and include information about people with that diagnosis who receive certain interventions and people with the diagnosis who do not. Other registries include only individuals who have received a device or intervention. The former approach is most useful for comparative studies of those who have and have not been treated with a device. In some cases, the registration consists only of limited information about the subject at the time the device was first used. In more elaborate settings, registrants are prospectively followed for outcome events, forming a prospective observational study.

DISPROPORTIONALITY ANALYSES OF SPONTANEOUS ADVERSE EVENT REPORTING DATABASES

An adverse event reporting system is not a study design as such. Adverse event reports can, however, be combined with information from simple registries or other data sources to conduct case-control studies. Recently, new techniques for analyzing event databases have been gaining attention, mostly in the pharmaceutical arena.
The remainder of this appendix reviews the application of these techniques to drugs and provides an example of an application to device databases.

Overview

Disproportionality analyses are the most common technique for analyzing adverse drug reaction databases, and they can also be used for analyzing adverse device events. They are a means of assessing the association between use of a device and outcome when denominator data are not available for estimating population parameters, as is the case with adverse event databases. The risk of a particular adverse event (AE1) can sometimes be assessed with only numerator data if there are also numerator data available for another adverse event (AE2) that is not associated with use of the device. The rationale is to treat those experiencing the second adverse event as if they were the control group in a case-control study of the first adverse event (and vice versa).

To see this, suppose that n1 and n2 denote the number of device subjects that experience AE1 and AE2, respectively, and that the unknown population size of device users is N. Similarly, let m1 and m2 denote the number of non-device subjects that experience AE1 and AE2; M denotes the unknown population size of non-device users. The RR corresponding to use of the device is

$$\mathrm{RR} = \frac{n_1 / N}{m_1 / M}$$

If use of the device is not associated with AE2, we would expect approximately that n2/N = m2/M, or equivalently M/N = m2/n2. Thus, we see that under these circumstances, the RR for association between device use and AE1 can be estimated by

$$\mathrm{RR} \approx \frac{n_1 / n_2}{m_1 / m_2} = \frac{n_1 m_2}{n_2 m_1}$$

Note that this expression for the RR depends only on numerator data (event counts) for the two types of adverse events and for subjects who do and do not have the device. Thus, the RR can be approximated based only on numerator data. As discussed below, key challenges in disproportionality analyses include accurately ascertaining the numerator counts n1, n2, m1, m2 and the assumption that the device is not associated with the risk of AE2. In practice there are not just two adverse events involved in the analysis, but hundreds or thousands of different adverse events (so that AE2 above represents the amalgamation of all events but AE1), and it is generally assumed that the majority of the “other” events (as of all events) will not be associated with any particular device.

There are several important advantages of disproportionality analyses.
Data from clinical trials are rarely plentiful enough for useful studies of rare adverse events, in which case it may be necessary to fall back on retrospective studies. A carefully planned and executed case-control study, in which ascertainment of cases is well documented and the choice of controls is appropriate, is highly desirable but expensive and time-consuming. It would usually only be used to study a rare adverse event for a particular device if there has been some existing evidence of a potentially serious problem with that device-adverse event combination. In order to uncover potential problems in the first place, we are almost forced to fall back on anecdotal evidence and the analyses of databases of spontaneous reports.

The FDA, device manufacturers, and some other organizations maintain voluminous databases of spontaneously reported adverse events, consisting of coded and uncoded text descriptions of the patient, the device, what happened to the device, and what the effect of the adverse event was on the patient. These reports are typically not research quality in that the data collection or reporting guidelines and forms provided by FDA are not equivalent in precision to those associated with formal studies, and the quality of provider, professional, and patient reports to manufacturers and FDA is highly variable. Underreporting of adverse events is significant, and case-by-case follow-up may be necessary to correct reporting errors and to confirm the existence of a problem. These are numerator data only, with no obvious way to match the report counts for each device-adverse event combination to an appropriate measure of exposure. In spite of the deficiencies of such data, analyses of frequency counts of spontaneous reports have proven useful as a problem screening and signaling tool in the adverse drug reporting domain.

Analytic Issues and Strategies

The idea behind disproportionality analysis is similar to the proportional mortality analysis of epidemiology, which might, for example, study a sample of death certificates and compare the distribution of deaths due to various causes for decedents who had different occupations. This is also a numerator-only analysis and is viewed as an outdated technique by most modern epidemiologists. Nonetheless, although such a study cannot measure the probabilities defined in (a) and (b), the measurement of associations between adverse events and drugs in databases can give (possibly biased) clues to problems during the postmarket phase.
During this phase, the number of patients exposed to a new drug may grow by several orders of magnitude compared to the premarket phase, allowing much greater opportunity for rare events to be manifest.

                          Number of Reports
                          Mentioning AE    Not Mentioning AE
Mentioning Drug           n = a            b
Not Mentioning Drug       c                d

Suppose that the reports in a database are classified into the above 2 × 2 table, as to whether or not a particular drug and a particular adverse event are mentioned in a report. It is common to denote the counts of the four cells of a 2 × 2 table by the letters a, b, c, d, but we will sometimes use the letter n to denote and emphasize the count in the first cell, where both the drug and the adverse event are present. The disproportionality measures are all of the form n/e, where e is a baseline or comparator value that is expected to be near n if the drug and the adverse event are not associated. Three variations in the definition of e are commonly used in disproportionality analyses:

ROR = ad/bc                                    [e = bc/d]
PRR = [a/(a + b)]/[c/(c + d)]                  [e = c(a + b)/(c + d)]
RR  = [a/(a + b)]/[(a + c)/(a + b + c + d)]    [e = (a + b)(a + c)/(a + b + c + d)]

ROR is the reporting odds ratio (Rothman et al., 2004), PRR is the proportional reporting ratio (Evans et al., 2001), and RR is the relative reporting ratio (DuMouchel, 1999). Note that the epidemiology literature commonly uses the abbreviation RR for “risk ratio,” whereas in the literature analyzing spontaneous reports the preferred phrase is “reporting ratio,” since the presence of noncausal associations is common in spontaneous reports.

If the particular adverse event is relatively rare in the database, and the particular drug is also mentioned in a small proportion of the reports, then a ≪ b, c ≪ d and all three measures ROR, PRR, and RR will be numerically similar. (It is very common for a to be less than 1 percent of b and c, and for these to be less than 1 percent of d, since the database may include thousands of different drugs and adverse events.) Of the three, RR has the computationally convenient property that whenever n = a is greater than 0, e is also positive, so that the ratio n/e is well defined in all cases. For ROR, if b = 0 or c = 0, and for PRR if c = 0, then e = 0, leading to an undefined value of n/e.
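
As a rough illustration of the three definitions above, the following Python sketch computes the baseline e and the ratio n/e for each measure from the four cell counts. The function name `disproportionality` and the example counts are hypothetical and are not taken from any actual database. A measure is returned as `None` when its baseline e is zero, which is the undefined case just described.

```python
from typing import Dict, Optional


def disproportionality(a: int, b: int, c: int, d: int) -> Dict[str, Optional[float]]:
    """Return ROR, PRR and RR (each as n/e) for one 2 x 2 table of report counts.

    a: reports mentioning both the drug and the adverse event (n)
    b: reports mentioning the drug but not the event
    c: reports mentioning the event but not the drug
    d: reports mentioning neither
    """
    n = a
    total = a + b + c + d
    # Baseline e for each measure, as defined in the text.
    baselines = {
        "ROR": (b * c / d) if d > 0 else None,
        "PRR": (c * (a + b) / (c + d)) if (c + d) > 0 else None,
        "RR": ((a + b) * (a + c) / total) if total > 0 else None,
    }
    # n/e is undefined (None) whenever e is missing or equal to zero.
    return {name: (n / e if e else None) for name, e in baselines.items()}


# Hypothetical rare-item table: a is about 1 percent of b and less than
# 1 percent of c, and b and c are each under 1 percent of d.
rare = disproportionality(20, 1980, 4980, 993020)

# Hypothetical table with c = 0: only RR remains defined.
no_other_reports = disproportionality(3, 97, 0, 900)

print(rare)
print(no_other_reports)
```

With the first set of hypothetical counts the three ratios are all close to 2 (roughly 2.01, 2.00, and 2.00), which illustrates their near-equality in the rare-item situation. With c = 0, only RR is returned as a number.
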
On the other hand, Rothman and colleagues (2004) point out that in those cases in which the condition a ≪ b, c ≪ d (where ≪ means “much less than”) is not true, the odds ratio measure ROR has the advantage of avoiding certain biases that may be due to different drugs (or different adverse events) having different reporting rates. (Indeed, this is just why the odds ratio is usually the preferred measure of association for case-control studies.) Returning to the rare-item situation where the values of ROR, PRR, and RR are all about equal, the value n/e has a natural interpretation as a multiplicative measure: the factor by which the number of observed cases exceeds the number expected under the null hypothesis of no association between the drug and the adverse event.

Use of these measures can be severely hampered by several problems, including confounding with demographic or other variables and high sampling variance. To give an example of the former problem, consider the discussion by DuMouchel (1999) of the association between SIDS (crib death) and many vaccines. This association arises because SIDS, by definition, occurs only in infants and because infants also receive a high proportion of vaccines. If the database is restricted to reports involving infants, the associations disappear. Another example is the association between Viagra use and various cardiac events, which is due to the association of both of these items with age and gender, each being most common among elderly men. Similarly, in a database spanning many years of reports, any drug very new to the market is more likely to show an association with an adverse event that is newly defined in MedDRA (Medical Dictionary for Regulatory Activities), the coding dictionary for adverse events used in most such databases.

To combat these common confounding biases, it is recommended to stratify the database reports by gender, age of patient, and year of report (and possibly other variables, if available) and to compute a version of the disproportionality measure that adjusts for stratum effects. This correction for demographic and secular trend confounding seems to be most often used with the measure RR. The correction in this case, originally due to Mantel and Haenszel (1959), consists of computing e separately for the reports within each stratum and then summing the stratum-specific values to get a total e to use in the ratio n/e. In the notation of the 2 × 2 table, with s indexing the strata, the adjusted baseline is e = Σ_s (a_s + b_s)(a_s + c_s)/(a_s + b_s + c_s + d_s), and n is the total of the a_s over the strata. There are also analogous ways to adjust ROR and PRR for stratum effects.

A trickier type of confounding is with the presence of an indication for taking the drug. For example, a drug used to combat cancer might have a reported AE that is a symptom of the cancer for which the drug is prescribed. It would be very difficult to automatically eliminate all such confounding from a computer database analysis. So far, the only feasible approach is to rely upon the medical knowledge of the analyst to recognize and discount such computed associations.

Another problem with disproportionality measures is their often extremely large variance.
For example, a database analysis may find tens of thousands of drug–event combinations in which n = 1 and e < 0.001 so that n/e > 1000. In contrast, a value of n = 20, e = 2, n/e = 10 is usually going to be a much more “interesting” drug–event combination on which to follow up. There are two statistical approaches to controlling false positives due to the high variance of n/e. The first is to restrict computation of the disproportionality ratio to combinations in which some measure of statistical significance meets a threshold. For example, Evans and colleagues (2001) recommend restricting PRR to combinations in which n > 2 and the chi-squared statistic for association in the 2 × 2 table is at least 4. The second strategy is to use a Bayesian or empirical Bayesian analysis to produce “shrinkage estimates” that stabilize the ratios by applying a prior distribution that reduces (“shrinks”) the ratios n/e when n and/or e are small. Bate and colleagues (2002 and references therein) describe a “Bayesian confidence propagation neural network” (BCPNN) method that has been used to analyze World Health Organization databases. The development and use of an empirical Bayes model, the “multi-item gamma-Poisson shrinker” (MGPS), is described in DuMouchel (1999), DuMouchel and Pregibon (2001), O’Neill and Szarfman (2001), Szarfman and colleagues (2002), and Fram and colleagues (2003). The latter system is in use at the FDA and at several pharmaceutical manufacturers. These strategies have proven to be effective ways to reduce the noisiness of the disproportionality measures, but the role of such analyses in drug safety signaling is still evolving. Regulators and manufacturers are beginning to adopt them as an additional tool to help prioritize their pharmacovigilance efforts, but no one is proposing that they can provide definitive inferences without extensive follow-up.

Differences Between Drug and Device Adverse Event Data

There are several differences between drugs and devices that tend to change the potential use of spontaneous report databases. On the side of making reports regarding devices more useful, the more transparent action of many devices compared to a drug can often make even a single report very informative. For example, if a faulty device delivers an electric shock to the patient, physical inspection of the device might lead immediately to a suggested corrective design of the device; no reliance on statistical analysis is needed.

But there are other features of the device world that make it harder to use spontaneous report databases. There are more devices than drugs, and manufacturers modify the design of their devices much more frequently than drugs get reformulated. This means that a particular version of a typical device has a smaller user base than a typical drug, and thus a typically smaller number of reports in a database. In addition, adverse event coding is more complicated for devices than for drugs, and not yet as standardized.
A typical device adverse event report includes information both about what happened to the device and about what happened to the patient, whereas the drug reports contain only the second sort of information. In the drug world, the MedDRA coding system has been adopted by the majority of collectors of adverse event reports around the world; no such standardization has been accomplished in the device world.

A device adverse event database might have several hundred thousand reports, each with just a paragraph of narrative text describing what went wrong. A data mining approach would need to group the adverse event descriptions into at most a few thousand adverse event types, and preferably do this automatically, without the need for human review of each report individually. A computer analysis can try to form clusters of reports based on the common occurrence of words or phrases in the narrative. The automated discovery of which words or phrases are indicative of important associations in the database is called feature extraction. But the feature extraction step introduces another layer of uncertainty into the analysis, and is usually less informative than being able to work with adverse event codes that fit a well-structured theoretical framework.

The majority of drug adverse events have at least some natural background rate, making it seem natural to tabulate all combinations of drugs and adverse events in a data mining run. A user of practically any drug might conceivably show up with practically any adverse event, as diverse as liver damage and cardiac arrest. The nonsystemic nature of many devices makes the universe of observed adverse events much more device specific. Does it make sense to tabulate the frequency of electric shocks for patients using blood vessel shunts, for example? If not, how do you automatically decide upon the universe of device–event combinations to tabulate and study statistically? Database analyses of device reports have not yet been attempted on anything like the broad scale of drug data mining methodology.

Example: Disproportionality Analysis for Spontaneous Reports of Shunt Complications

This section provides a short example of a disproportionality analysis for complications involving cerebrospinal fluid shunts that was used to support the discussion in Appendix E. The analysis used adverse event reports from the MAUDE database managed by ECRI (a private, nonprofit health services research organization). A preliminary group of 2,472 adverse event reports that seemed to involve cerebrospinal fluid shunts was selected. (The authors appreciate the assistance of Mark Bruley of ECRI in this process.) An attempt was made to classify the type of complication by computer scan of the descriptive narratives for certain key words (suggested by Dr. Stephen Haines, co-author of Appendix E).
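
A key-word scan of this kind might be sketched in Python as follows, using three of the event types and search strings that appear in Table D.1. The sketch assumes case-insensitive substring matching and allows one narrative to match more than one event type; neither detail is stated in the text. The names `RULES` and `classify` and the sample narratives are hypothetical.

```python
from typing import Dict, List, Tuple

# Each event type maps to a list of alternatives. An alternative is a tuple of
# substrings that must ALL occur in the narrative, so ("cyst", "abdom")
# encodes the paired entry ("cyst" & "abdom") of Table D.1.
RULES: Dict[str, List[Tuple[str, ...]]] = {
    "Abdominal Cyst": [("peritonitis",), ("pseudocyst",), ("pseudo-cyst",),
                       ("cyst", "abdom"), ("cyst", "periton")],
    "Infection": [("infect",), ("ventriculitis",), ("meningitis",)],
    "Migration": [("migrat",)],
}


def classify(narrative: str, rules: Dict[str, List[Tuple[str, ...]]] = RULES) -> List[str]:
    """Return every event type for which at least one alternative matches."""
    text = narrative.lower()  # assumption: matching ignores letter case
    return [
        event
        for event, alternatives in rules.items()
        if any(all(s in text for s in alt) for alt in alternatives)
    ]


# Invented narratives, for illustration only.
print(classify("Distal catheter migrated; patient later developed an infected abdominal cyst."))
print(classify("Valve replaced without incident."))
```

A narrative that matches none of the strings is left unclassified (an empty list), which corresponds to the reports that the search described here could not assign to any event type.
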
The 12 types of complications and the search words used for each type are listed in Table D.1. The search for any of the text strings in Table D.1 was successful for just 784 of the reports. Table D.2 lists the counts of the manufacturer–event type combinations for these 784 reports. (The reports in which the shunt manufacturer was unknown are coded as “Man. I” in Table D.2.)

The disproportionality analysis screens for unusually large counts (N) in Table D.2, in comparison to the count (E) that would be expected if there were no association between manufacturer and event type in Table D.2. In this example, the value of E for any event type and manufacturer combination is computed as the event type total times the manufacturer total divided by 784. The relative reporting ratio is defined as RR = N/E. Table D.3 lists the 15 largest values of RR among the 108 potential event type and manufacturer combinations listed in Table D.2.

TABLE D.1 List of 12 Shunt Complication Event Types and the Text Strings That Were Used to Classify the Reports in Searching the Descriptive Narrative in MAUDE Reports

Abdominal Cyst        “peritonitis,” “pseudocyst,” “pseudo-cyst,” (“cyst” & “abdom”), (“cyst” & “periton”)
Abdominal Metastas    “metastas,” (“tumor” & “abdom”), (“tumor” & “periton”)
Cardiac               “heart,” “cardiac,” “atri,” “superior vena cava,” “endocarditis,” “vegetation”
Disconnection         “disconnect,” “fracture”
Hemorrhage            “hemorrhage,” “haemorrhage,” “hematoma,” “haematoma,” “bleed”
Infection             “infect,” “ventriculitis,” “meningitis”
Malfunction           “malfunction,” “occlu,” “block,” “obstruct,” “plug”
Migration             “migrat”
Mortality             “death,” “dead,” “mortality”
Organ Perforation     “perforat,” “penetrate,” “extru”
Pneumocephalus        “air,” “pneumocephalus”
Slit Ventricles       “slit,” “intracranial hypotension,” “overdrainage”

TABLE D.2 Counts (N) for a Classification of 784 Reports of Shunt Complication by Type of Event and Manufacturer

Event type                A    B    C    D    E    F    G    H    I  Total
Abdominal Cyst            2    0    0    1    0    0    0    0    0      3
Abdominal Metastas        1    0    2    1    0    0    0    0    0      4
Cardiac                   2    0    1    1    0    0    0    0    2      6
Disconnection            15    0    4   91    3    0    0    3    9    125
Hemorrhage               24    2    4    7   10    1    1    4    1     54
Infection                29    0    8   20    3    1    0    2    9     72
Malfunction             123    5   23  137   35    6    1   11   39    380
Migration                 3    0    1    0    6    0    0    0    1     11
Mortality                 2    0    0    0    1    0    0    0    1      4
Organ Perforation         7    0    2    1    0    0    0    0    0     10
Pneumocephalus            3    0    3    0    0    0    0    0    0      6
Slit Ventricles          43    0    3    0    0    0    0    0    0     46
Total                   254    7   51  259   58    8    2   20   62    721

SOURCE: ECRI MAUDE database.

TABLE D.3 The 15 Largest Reporting Ratios (RR = N/E) for Manufacturer–Event Type Combinations Listed in Table D.2

     Manufacturer Code  Event                 N        E     RR   EBGM
 1   Man. G             Hemorrhage            1    0.138   7.26   1.04
 2   Man. C             Pneumocephalus        3    0.413   7.26   1.25
 3   Man. C             Abdominal Mets        2    0.276   7.26   1.14
 4   Man. E             Migration             6    0.982   6.11   1.70
 5   Man. I             Cardiac               2    0.505   3.96   1.10
 6   Man. B             Hemorrhage            2    0.551   3.63   1.10
 7   Man. I             Mortality             1    0.337   2.97   1.01
 8   Man. C             Organ Perforation     2    0.689   2.90   1.08
 9   Man. E             Mortality             1    0.357   2.80   1.01
10   Man. C             Cardiac               1    0.413   2.42   1.00
11   Man. H             Hemorrhage            4    1.722   2.32   1.15
12   Man. A             Organ Perforation     7    3.240   2.16   1.24
13   Man. E             Hemorrhage           10    4.821   2.07   1.32
14   Man. A             Abdominal Cyst        2    0.972   2.06   1.04
15   Man. D             Disconnection        91   47.194   1.93   1.85

NOTE: The expected count E is computed under the assumption of no association between manufacturer and event type. The empirical Bayes geometric mean (EBGM) is a “shrinkage estimate” of RR that adjusts for the statistical uncertainty due to small counts. Note that the smallest value of RR in Table D.3 (Man. D, Disconnection, line 15) has the largest value of EBGM because it is based on much larger N and E.

Note that the largest reporting ratios RR in Table D.3 have small values of N, but much smaller values of E. The ratio RR has a lot of statistical “noise” when such small frequencies are involved. The values of RR computed from such small counts cannot be expected to be very stable as more reports are received. These ratios can, however, be adjusted to discount or “shrink” large values of RR when they are based on such small counts. A statistical model, the empirical Bayes gamma-Poisson hierarchical model (DuMouchel, 1999), can produce improved estimates (called the empirical Bayes geometric mean, EBGM) of the “true” reporting ratio.³ These values appear in the final column of Table D.3.
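
The quantities in Table D.3 can be sketched in a few lines of Python. The prior (True RR = 1 with probability 1 − P, otherwise gamma-distributed) and the estimates P = 0.50, α = 3.79, β = 3.85 are those given in footnote 3. Beyond what the text states, the sketch assumes that N is Poisson with mean E × True RR, that the gamma distribution is parameterized by shape α and rate β, and that EBGM is the exponential of the posterior mean of log(True RR), as in DuMouchel (1999). The function names and the hand-written `digamma` helper are illustrative only.

```python
import math


def digamma(x: float) -> float:
    """Digamma function for x > 0, via upward recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def expected_count(event_total: float, manufacturer_total: float, grand_total: float) -> float:
    """E under no association: event total times manufacturer total over grand total."""
    return event_total * manufacturer_total / grand_total


def ebgm(n: int, e: float, p: float = 0.50, alpha: float = 3.79, beta: float = 3.85) -> float:
    """Shrinkage estimate of the reporting ratio for observed n and expected e > 0."""
    # Log marginal probability of n under the gamma component (negative binomial).
    log_nb = (math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
              + alpha * math.log(beta / (beta + e)) + n * math.log(e / (beta + e)))
    # Log probability of n when True RR = 1 (Poisson with mean e).
    log_pois = -e + n * math.log(e) - math.lgamma(n + 1)
    # Posterior probability that the gamma component generated the count.
    q = 1.0 / (1.0 + (1.0 - p) / p * math.exp(log_pois - log_nb))
    # Under the gamma component the posterior is Gamma(alpha + n, rate beta + e);
    # under the point mass at 1, log(True RR) = 0.
    mean_log_rr = q * (digamma(alpha + n) - math.log(beta + e))
    return math.exp(mean_log_rr)


# Line 1 of Table D.3: Manufacturer G, Hemorrhage (totals taken from Table D.2).
e = expected_count(event_total=54, manufacturer_total=2, grand_total=784)
print(round(e, 3), round(1 / e, 2), round(ebgm(1, e), 2))
```

For Manufacturer G and Hemorrhage, the sketch gives values that round to E = 0.138, RR = 7.26, and EBGM = 1.04, in agreement with line 1 of Table D.3. Several other lines of Table D.3 appear not to be reproducible from the manufacturer totals as printed in Table D.2, so for those lines the comparison is best made by passing the tabulated N and E directly to `ebgm`.
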
This model predicts that the combination (Manufacturer D, Disconnection) in line 15 of Table D.3, which has the lowest value of RR but is based on much larger values of N and E than any other combination in Table D.3, actually has the largest “true” relative reporting ratio, EBGM = 1.85. A reporting ratio of 1.85 is interpreted as an estimated 85 percent excess frequency in disconnection reports for manufacturer D compared to the number expected if there were no association between manufacturer and shunt complication event type in the database. The reduction in estimated reporting ratios from RR to EBGM is an attempt to adjust for the multiple comparisons fallacy. Since we examined nine manufacturers by 12 event types, we expect the largest of the 9 × 12 = 108 values of RR to be large purely by chance. However, the statistical theory behind the shrinkage estimator EBGM makes it safer (more valid statistically) to pick out the largest value of EBGM without biasing the estimation of the reporting ratio. And, in fact, there have been literature reports of excess disconnections in shunts from manufacturer D.

³ Because relatively few reports were used in this analysis compared to adverse drug reaction data mining analyses, the empirical Bayes model used here was a simplification of that used in DuMouchel, 1999. The model used here assumes that the true relative reporting ratio (True RR) has a prior distribution such that True RR = 1 with probability 1 − P, while, with probability P, True RR has a gamma distribution with parameters α and β. The analysis of the counts in Table D.2 resulted in the estimates P = 0.50, α = 3.79, and β = 3.85, and these values led to the values of EBGM listed in Table D.3.

REFERENCES

Bate A, Lindquist M, et al. 2002. A data mining approach for signal detection and analysis. Drug Safety 25(6):393–397.
DuMouchel W. 1999. Bayesian data mining in large frequency tables, with an application to the FDA Spontaneous Reporting System. The American Statistician 53:177–202.
DuMouchel W, Pregibon D. 2001 (August 26–29). Empirical Bayes screening for multi-item associations. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA. Pp. 67–76.
Evans SJW, Waller PC, Davis S. 2001. Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiology and Drug Safety 10(6):483–486.
FDA (Food and Drug Administration). 2003. FDA Talk Paper: FDA Advises Physicians of Adverse Events Associated with Cordis Cypher Coronary Stents. Available at: http://boeree.hq.inverse.dk/bbs/topics/ANSWERS/2003/ANS01257.html. Accessed September 28, 2005.
FDA. 2004 (October 18). Public Health Web Notification: Final Update of Information for Physicians on Sub-acute Thromboses (SAT) and Hypersensitivity Reactions with Use of the Cordis CYPHER™ Sirolimus-eluting Coronary Stent. Available at: http://www.fda.gov/cdrh/safety/cypher3.html. Accessed September 28, 2005.
Fram D, Almenoff JS, DuMouchel W. 2003 (August 24–27). Empirical Bayesian data mining for discovering patterns in post-marketing drug safety. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC. Pp. 359–368.
Ionescu A, Payne N, Fraser AG, Giddings J, Grunkemeier GL, Butchart EG. 2003. Incidence of embolism and paravalvar leak after St Jude Silzone valve implantation: experience from the Cardiff Embolic Risk Factor Study. Heart 89(9):1055–1061.
Manson JE, Hsia J, Johnson KC, Rossouw JE, Assaf AR, Lasser NL, Trevisan M, Black HR, Heckbert SR, Detrano R, Strickland OL, Wong ND, Crouse JR, Stein E, Cushman M, the Women’s Health Initiative Investigators. 2003. Estrogen plus progestin and the risk of coronary heart disease. New England Journal of Medicine 349:523–534.
Mantel N, Haenszel W. 1959. Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 22:719–748.
O’Neill RT, Szarfman A. 2001. Some FDA perspectives on data mining for pediatric safety assessment. Current Therapeutic Research, Clinical and Experimental 62(9):650–663.
Rothman KJ, Lanes S, Sacks ST. 2004. The reporting odds ratio and its advantages over the proportional reporting ratio. Pharmacoepidemiology and Drug Safety 13:519–523.
Szarfman A, Machado SG, O’Neill RT. 2002. Use of screening algorithms and computer systems to efficiently signal higher-than-expected combinations of drugs and events in the US FDA’s spontaneous reports database. Drug Safety 25(6):381–392.
Truog RD. 1999. Commentary: Informed consent and research design in critical care medicine. Critical Care 3(3):R29–R33.
UK Collaborative ECMO Trial Group. 1996. UK collaborative randomised trial of neonatal extracorporeal membrane oxygenation. Lancet 348(9020):75–78.