Other Select Topics
The committee used this workshop, as well as the June 14–15, 2010, workshop, as opportunities to hear about a variety of issues related to the medical-device regulatory lifecycle. This chapter summarizes three separate presentations. The first presentation is about software in medical devices. The second is on the Food and Drug Administration’s (FDA) use of evidence in the premarket approval (PMA) process. The last presentation is an example of industry concerns about transparency of, and delays in, FDA decision-making within the 510(k) process.
TRUSTWORTHY MEDICAL-DEVICE SOFTWARE
Without software, many medical treatments could not exist, said Kevin Fu, assistant professor in the Department of Computer Science of the University of Massachusetts Amherst. The question is not whether devices should use software but rather how the complexities of software and its risks can be better understood. Fu presented an overview of a report that he was commissioned to prepare for the committee to summarize the role of trustworthy software in the safety and effectiveness of medical devices.1
“Software trustworthiness” is a system property that measures how well a software system meets its operating requirements, allowing stakeholders (such as patients, health-care professionals, and service providers) to trust the operation of the system. Software trustworthiness is closely tied to safety and effectiveness, and diminished trustworthiness can lead to lack of safety, effectiveness, usability, reliability, dependability, security, privacy, availability, and maintainability, Fu said.
The complete commissioned paper is available as Appendix D.
Safety and Effectiveness
There can be overconfidence in the function of software, Fu said. Complacency can be based on the belief that if the software appears to function, nothing can go wrong; this is not always the case. Fu cited one example from the late 1980s involving the Therac-25, one of the first linear accelerators to use software aggressively in the control of radiation treatments. After reports from health-care professionals of injuries and deaths from machine malfunctions (which resulted in radiation overdoses), the manufacturer investigated and reported that the machine could not possibly overtreat a patient (Leveson and Turner, 1993).
Since then, the number of devices using software and the number of devices recalled for software-related issues have been increasing. Fu reported that 6% of all device recalls issued by the Food and Drug Administration (FDA) from 1983 to 1997 cited software as the reason. The proportion nearly doubled from 1999 to 2005: 11.3% of device recalls were attributed to software. In 1983–1997, 24% of recalled devices relied on software in some way, and this increased to 49% during 1999–2005. In 2006, it was reported that over half the medical devices on the US market involved software in their function. In 2002–2010, there were more than 537 recalls of devices that used software, which affected over 1.5 million devices being used in the United States.
Software in a device differs from hardware in two ways, Fu asserted. First, software is discrete rather than continuous. There would be little concern if a manufacturer of 1-inch nails produced a product ranging from 0.9999 inch to 1.0001 inch; that small error is usually tolerable. However, a single-character error in a computer system, changing a 20-mL entry for an infusion pump to a 200-mL entry, can have catastrophic consequences. There is generally no analogous notion of a safety margin for software. Second, it is extremely difficult to test software against every possible condition.
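The distinction can be made concrete with a small sketch. The function name and the 50-mL limit below are hypothetical, not drawn from any real pump; the point is that a one-keystroke slip turns 20 mL into 200 mL, a 10-fold error that only an explicit range check can catch:

```python
# Hypothetical illustration: a single mistyped digit is not a "small" error.
# The dose limit here is invented for the example, not from any real device.

MAX_DOSE_ML = 50.0  # assumed hard safety limit

def parse_dose_ml(entry: str) -> float:
    """Parse an operator's volume entry and enforce a hard range check."""
    volume = float(entry)
    if not 0.0 < volume <= MAX_DOSE_ML:
        raise ValueError(f"dose {volume} mL outside safe range (0, {MAX_DOSE_ML}]")
    return volume

print(parse_dose_ml("20"))        # intended entry: accepted as 20.0 mL
try:
    parse_dose_ml("200")          # one extra keystroke: a 10-fold overdose
except ValueError as err:
    print("rejected:", err)       # the range check, not luck, prevents delivery
```

There is no "margin" between 20 and 200 for the software to absorb; either the check exists or the error propagates at full magnitude.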
Fu noted that software can itself constitute a device, for example, an electronic health record. Electronic health records, if designed correctly, could reduce errors substantially, especially errors of patient misidentification. But electronic health records will need very strong integrity guarantees and strong security and privacy, and there is an issue of interoperability among hospitals and systems. System complexities involving the collation of vast amounts of information could introduce risks. When asked whether a paper medical record would be a reasonable predicate, especially in considering security and privacy, Fu stated that software behaves differently from paper. Paper is static, but an electronic health record is dynamic, more like a “living record.” Fu asserted that there need to be different standards for electronic health records: not necessarily higher, but appropriate to the technology and the situation.
Mitigating Software Risks
Many of the risks associated with software design are preventable, Fu said. A workshop report from the Networking and Information Technology Research and Development Program found that the risks associated with software use are not peculiar to medical devices. Many of the standards and practices used in the development of software for other critical systems (such as avionics and nuclear systems) could be used to ensure confidence in medical devices, but they appear to be ignored by developers (NITRD, 2009). According to the report, “perhaps the most striking [difference] is the almost complete lack of regard, in the medical-device software domain, for the specification of requirements.”
Fu suggested that the committee consider recommending that device-software developers follow good systems-engineering practices. Systems engineering, he said, is a much more encompassing technique than simply testing software in isolation. A systems approach could address, for example, whether a software-controlled oxygen ventilator still works correctly when it is integrated into an ambulance.
Although implementation errors are often the subject of news stories, they are not actually the primary source of problems with medical devices, Fu said. He used an example of an implementation error involving a Baxter infusion pump. Underdosing of three drugs in one patient led to increased intracranial pressure and then to brain death. A message appeared on the device screen indicating a “buffer overflow,” and the pump shut down. In simple terms, a buffer overflow occurs when a buffer has too little memory space to hold the information that the program is attempting to place in it. A problem with buffer-overflow errors is that they are difficult to reproduce, especially during servicing. In this case, the manufacturer was eventually able to reproduce the problem outside the clinical setting and found that a pump-software upgrade had introduced a slight coding error that caused the device to fail (and ultimately led to the patient’s death).
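As an illustration only (the class below is invented, not the pump’s actual code, and Python cannot corrupt memory the way C can), a buffer overflow amounts to writing more data into a fixed-capacity buffer than it can hold:

```python
# Simplified model of a buffer overflow. In a C program, writing past the
# end of a fixed-size buffer silently corrupts adjacent memory; this toy
# buffer detects the condition and halts instead, much as the pump did.

class FixedBuffer:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = bytearray()

    def append(self, chunk: bytes) -> None:
        if len(self.data) + len(chunk) > self.capacity:
            raise OverflowError("buffer overflow: capacity exceeded")
        self.data.extend(chunk)

buf = FixedBuffer(capacity=8)
buf.append(b"ABCD")            # fits: 4 of 8 bytes used
try:
    buf.append(b"EFGHI")       # 5 more bytes would exceed the capacity
except OverflowError as err:
    print(err)                 # a real device must fail safely at this point
```

Whether the overflow is even detected depends on timing and on exactly what data arrive, which is why such errors are so hard to reproduce in the field.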
How do human factors come into play in errors associated with user interfaces? Infusion-pump user interfaces and software are used effectively and safely every day in medical practice. However, infusion pumps in general have been linked to over 500 deaths and over 56,000 adverse-event reports. Fu discussed an April 2010 New York Times report on infusion-pump problems, which stated that 710 patient deaths were linked to a health-care provider’s entering an incorrect dosage or to a malfunction in the software (Meier, 2010).
Implantable pumps are used to treat some diseases. Health-care professionals set dosages through the pumps’ computer control systems. In the user interface, there are fields to enter the dosage, the bolus size, and the duration (hours, minutes, and seconds) over which to administer the bolus.
Fu cited one example: an adverse event resulting in a patient’s death reported in the Manufacturer and User Facility Device Experience (MAUDE) database. A bolus was given in 20 min rather than the intended 20 h (that is, at 60 times the intended rate). The patient who had the implanted drug pump lost consciousness while driving because of the overdose, was involved in a collision, and died. The FDA recall notice stated that the software did not provide labels for the hours, minutes, and seconds data-entry fields and that the new software has such labeling.
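The arithmetic of that error is worth making explicit. In the sketch below the bolus volume is hypothetical; the point is that the same entered number, interpreted in the wrong unit because the field was unlabeled, changes the delivery rate by a factor of 60:

```python
# Worked arithmetic for the unlabeled-field error: "20" read as minutes
# instead of hours. The bolus volume is invented for illustration.
BOLUS_ML = 2.0

intended_minutes = 20 * 60                   # operator meant 20 h = 1,200 min
entered_minutes = 20                         # unlabeled field read as 20 min

intended_rate = BOLUS_ML / intended_minutes  # mL/min the clinician intended
actual_rate = BOLUS_ML / entered_minutes     # mL/min the pump delivered

rate_factor = intended_minutes / entered_minutes
print(rate_factor)                           # 60.0: delivered 60 times too fast
```

A labeled field costs nothing at design time; recovering from a 60-fold rate error at the bedside is often impossible.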
Fu also cited the more recent radiation-therapy accidents involving linear accelerators. Nothing on the machine, Fu said, warns technicians that they may have entered an inappropriate dose, and there is no way to know that they have entered the right data. It was reported that failures in computer software led a technologist to think that they had set the correct radiation exposure when the machine was actually administering a much larger amount of radiation, which led to several injuries and deaths.
Better analysis of human factors and how they interact with software could help to prevent injury and death, Fu said.
Software users are familiar with the dialogue boxes that appear on a computer screen and advise that a software update is available and should be downloaded and installed. Consumers of commercial, off-the-shelf software are in effect treated as beta testers. Traditionally, for noncritical systems, developers seem to believe that if there is a “bug” in the system they can just send out a patch later.
Problems with computer maintenance can have far-reaching consequences that affect the availability of care in hospitals and other infrastructure. In one example from April 2010, health-information-technology devices around the world were rendered unavailable by a single point of failure: a software update gone awry. Many diagnostic machines use a particular antivirus product that automatically updates itself to protect them from the latest computer viruses. In one of the updates, however, a critical component of the Windows operating system was misclassified as a virus and quarantined by the antivirus software, and affected computers entered an endless loop of reboot cycles. Numerous computers and hospitals were affected. One-third of the hospitals in Rhode Island, for example, were forced to postpone elective surgery and to stop treating nontrauma patients in their emergency rooms. At Upstate University Hospital in New York, 2,500 of 6,000 computers were affected.
Fu pointed out that although technical difficulties with software are not unexpected (and might even be amusing) when someone is giving a presentation at a meeting, the same difficulties can occur during mammography or radiation treatment. In one case, a magnetic resonance imaging machine entered an endless reboot cycle while a patient was in the device. The patient experienced cardiac arrest and died, but the health-care professionals did not notice, because they were focused on determining why the computer was rebooting.
Usability factors that can be ignored in noncritical systems can contribute to injury and death in health-care systems. Discussions on technology support-group Web sites suggest that end users are often helpless when it comes to dealing with these systems. Fu cited an online forum discussion in which a user was seeking information on how to downgrade the Windows operating system from version SP3 to SP2. The user was setting up a picture archiving and communication system (PACS) for recording medical images, such as x-ray pictures. The PACS was compatible only with the earlier version of Windows (SP2), but the new computers that he had ordered to use with it came with SP3 preinstalled. He had already invested substantially in products to be used with the PACS, many of which also could not be used with SP3. Later in the discussion, he received advice from several users on how to downgrade. That may solve the immediate problem of computer compatibility with the PACS, but it creates serious new problems: Microsoft has ended support for SP2 and no longer provides security updates for it. Thus, for the PACS to work, the system is being run on new computers with an obsolete, unsupported operating system that is subject to security vulnerabilities.
Users share responsibility for keeping their software up to date, Fu said. A manufacturer might produce applications that run on commodity, off-the-shelf software, but users (hospital health-care professionals) then have to maintain them, and sometimes the two groups have conflicting requirements. Fu said that shared responsibility results in no responsibility, especially in the case of software. A single platform may have components from many manufacturers, and there can be many kinds of failures. As a result, it can be hard to assign responsibility for any failure that might happen in a computer system.
Problems on the Horizon
The FDA Center for Devices and Radiological Health (CDRH) director, Jeffrey Shuren, has stated that health information technology (HIT) software is considered a medical device. FDA has largely refrained from enforcing regulatory requirements for HIT devices, but the agency has received 260 reports of HIT-related malfunctions, and the reports noted 44 injuries and six deaths.
Computer viruses are a continuing concern. Fu noted that in May 2010, Roger Baker, chief information officer for the US Department of Veterans Affairs (VA), testified before a House of Representatives subcommittee that over 122 medical devices in the VA network had been compromised by malware during the preceding 14 months.
A computer virus does not discriminate between a home computer and a piece of radiology equipment, Fu said. All are at risk. But what about intentional malfunctions in software beyond viruses? Fu asked. He reminded participants of how the security of drug packaging in the United States was improved substantially after deaths from cyanide-laced Tylenol in 1982. As a result of that malicious act, there is now regulatory guidance on secure packaging of medicines. Under 21 CFR 211.132, FDA has the authority to establish a uniform national requirement for tamper-evident packaging to improve the security of over-the-counter drug packaging. Perhaps security needs to be considered more carefully in looking at the safety and effectiveness of medical-device software, Fu said.
In an effort to improve patient safety and device security, Fu and colleagues analyzed an implantable cardiac defibrillator and, through reverse engineering, were able to develop a software radio with which they could wirelessly induce the device to cause ventricular fibrillation (Halperin et al., 2008). Fu said that that was possible because of a problem with requirement specification: an unauthorized person should not have been able to manipulate the device.
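A requirement of that kind can be stated concretely. The sketch below is purely illustrative (real implantable devices face power, key-distribution, and emergency-access constraints that it ignores): before acting on a wireless command, the device verifies a keyed message-authentication code that an unauthorized radio could not produce.

```python
import hashlib
import hmac
import os

# Illustrative only: a shared-key integrity check of the kind a requirement
# specification might demand before a device accepts a wireless command.
DEVICE_KEY = os.urandom(32)   # secret provisioned at manufacture (hypothetical)

def sign_command(key: bytes, nonce: bytes, command: bytes) -> bytes:
    """Programmer side: authenticate a command against a fresh nonce."""
    return hmac.new(key, nonce + command, hashlib.sha256).digest()

def device_accepts(nonce: bytes, command: bytes, tag: bytes) -> bool:
    """Device side: act only if the tag verifies (constant-time compare)."""
    expected = sign_command(DEVICE_KEY, nonce, command)
    return hmac.compare_digest(expected, tag)

nonce = os.urandom(16)        # fresh per session, so old commands cannot replay
command = b"SET_MODE:monitor"
good_tag = sign_command(DEVICE_KEY, nonce, command)

print(device_accepts(nonce, command, good_tag))      # authorized programmer
print(device_accepts(nonce, command, b"\x00" * 32))  # forged command rejected
```

The attack Fu described succeeded precisely because no check of this kind was required: the requirement specification, not the radio hardware, was the gap.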
Together, emerging technologies, HIT software, wireless capabilities, interoperability of devices, patient mobility (which implies the use of devices outside the health-care setting), and the Internet lead to substantial security and privacy risks, Fu said.
510(k) Substantial Equivalence: What Is a Predicate Device for Medical-Device Software?
Fu reiterated a comment made at the June 14–15, 2010, workshop by former CDRH Director David Feigal, who wondered what the first predicate for software had been. In considering substantial equivalence, Fu asked, is a hardware implementation the predicate for the software? Hardware and software are very different entities, he stressed, with very different risks. He suggested that any software device that cites hardware as its predicate deserves careful assessment from a risk perspective. If there were more meaningful requirement specifications, he noted, it would be easier to determine substantial equivalence.
Software in medical devices, Fu summarized,
Breeds overconfidence (“the computer can’t be wrong”).
Is not thoroughly testable (devices do not operate in isolation, and not all possible interactions can be tested).
Is flooding into medical devices at an increasing rate.
Is not equivalent to hardware from a risk perspective.
Many of the risks associated with medical-device software could be mitigated with known technology, Fu said. The software-engineering, systems-engineering, and safety-engineering communities have many techniques that could address problems that have led to software-associated device adverse events.
Fu highlighted several subjects for the committee to consider. They included the idea that device manufacturers need to be given incentives to adopt modern software-engineering and systems-engineering technologies, from static analysis to programming languages that integrate more readily with requirement specifications. Better analysis of the human factors that come into play in the use of device software could help to prevent injury and death. Attention should be paid to developing a safety net for security and privacy.
Outcome measures are needed, Fu said, and manufacturers should be able to state the outcome measures for the safe and effective use of medical devices more openly.
He also noted that better statistics are needed: most of the available data concern failures of device software, and far less is known about successes. The software-engineering community has techniques that are known to work well in other critical systems, but the medical-device community has not, in general, used them.
There is a call for more open research and open test beds, Fu said. Researchers note that it is difficult to contribute their software technology to the medical-device community because of the proprietary nature of device design. If there were more open test beds, there would be more innovation, he suggested.
Fu reiterated that shared responsibility for a product has a tendency to mean that no party takes responsibility. There needs to be a single authority that is responsible for the safety and effectiveness of the software in a device.
Finally, he noted that research in other fields may be useful in considering an approach for devices. In avionics, for example, the National Aeronautics and Space Administration maintains a safety culture that includes technical and managerial approaches for mitigating the complex risks that arise when components developed in isolation are connected into a larger system. The avionics community is also using databases to try to understand successes, failures, and near-misses.
STRENGTH OF STUDY EVIDENCE EXAMINED BY THE FOOD AND DRUG ADMINISTRATION IN PREMARKET APPROVAL OF CARDIOVASCULAR DEVICES
Rita Redberg, professor of medicine at the University of California, San Francisco Medical Center and editor of Archives of Internal Medicine, presented an overview of a study of the strength of study evidence examined by FDA in premarket approval (PMA) of cardiovascular devices (Dhruva et al., 2009) and proposed some opportunities for improvement in the use of clinical evidence.
Cardiovascular devices are increasing in complexity and are an important part of the medical economy, with over 1 million stents, 350,000 pacemakers, and 140,000 implantable cardioverter–defibrillators (ICDs) implanted in 2008. Redberg noted that patients are increasingly exposed to direct-to-consumer advertising of specific devices with claims about how the devices can improve quality of life. The claims are not always consistent with the medical literature.
Background and Objectives
Class III devices are defined by FDA as devices that support or sustain human life, that are of substantial importance in preventing impairment of human health, or that present an unreasonable risk of illness or injury. The PMA process is the most stringent type of device-marketing application required by FDA and is required for most class III devices. Of the approximately 8,000 devices that are marketed yearly, 50–80 go through
the PMA process. Examples of class III cardiovascular devices are stents, heart valves, and ICDs.
Redberg noted that many high-risk devices are not going through the PMA process. A 2009 Government Accountability Office (GAO) report, FDA Should Take Steps to Ensure That High-Risk Device Types Are Approved Through the Most Stringent Premarket Review Process, examined all the high-risk device approvals from 2003 through 2007 (GAO, 2009). GAO found that 78% of the 217 original and 85% of the 784 supplemental PMA submissions for class III devices were approved through the PMA process. However, GAO also found that more class III, or high-risk, devices were cleared through the 510(k) process during that period than went through the PMA process. That means, Redberg said, that clinical-trial data on many high-risk devices are not being collected.
For the study, Redberg and colleagues reviewed the summaries of safety and effectiveness data (SSEDs) on 78 high-risk cardiovascular devices that received PMA from January 2000 to December 2007. An SSED, written by the device sponsor and reviewed by FDA, is “intended to present a reasoned, objective, and balanced critique of the scientific evidence which served as the basis of the decision to approve or deny the PMA” (FDA, 2010). After device approval, FDA makes the SSED publicly available with the device’s approval order and labeling. As would be the case in analyzing the quality of a clinical trial, the study data in each SSED were assessed for randomization, blinding, primary end points, active controls, analysis, and followup time.
The 78 devices that received PMA in 2000–2007 had undergone a total of 123 clinical studies (mean, 1.6 studies per device; range, 1–5 studies per device). Most of the approvals (51 of 78 PMAs, or 65%) were based on a single study. Of the 123 studies, 33 (27%) were randomized, and 17 (14%) were blinded (either single-blind or double-blind).
The 123 studies encompassed 213 primary end points. Seventeen (14%) of the studies did not list a primary end point. For the rest, the number of primary end points ranged from 1 to 10. (Redberg acknowledged that normally one would have a single primary end point, but they recorded whatever was noted in the SSED, and many of the device studies listed more than one primary end point.)
The number of patients enrolled per study ranged from 23 to 1,548 and averaged 308. As is typical in cardiology studies, Redberg said, the mean age was about 62 years (only 87 of the 123 studies reported age). A little more than two-thirds of the patients were male, which is also fairly typical for cardiology studies (80 of the 123 studies reported the sex of participants).
Among participants in the studies that reported race and ethnicity, 87% were white, 6% black, 5% Hispanic, and 3% other minorities (only 11 of the 123 studies reported race).
Of the 83 studies that listed location, 43 had all US sites, 22 had no US sites (that is, the PMA was achieved without any studies conducted in the United States), and the rest had a mixture of US and other sites.
Followup time varied by device. On average, stents were approved on the basis of a 6-month followup, and implantable electrophysiology devices on the basis of a 3-month followup. The longest followup periods were 1 year, for endovascular grafts and intracardiac devices.
For about half the primary end points, patients who received the intervention were compared with controls who did not. However, in about one-third of those comparisons, the controls were not enrolled as part of the study; instead, those who received the intervention were compared with “retrospective controls” from a previous study. The other half of the studies were single-arm studies with no randomization or control group; for these, objective performance criteria were established by FDA in conjunction with the sponsor. Redberg pointed out that 187 of the 213 primary end points (88%) were surrogate end points, and she noted that there is concern about nonvalidated surrogate end points: it is not always clear how well a surrogate end point represents the clinical end point. There is a difference, for example, between an angiographic finding that a stent is open or closed and a clinical end point of presence or absence of chest pain; patients who have open stents may still experience chest pain.
Another measure that Redberg analyzed was the number of patients enrolled vs the number included in the data analysis. For 122 of the 213 primary end points, there was a difference between the number of patients enrolled and the number analyzed. Across all 213 primary end points, over 10,000 patients (27%) were listed as enrolled in the studies but were not included in the analyses presented in the SSEDs.
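The proportions reported above are internally consistent, which can be checked directly from the counts quoted from Dhruva et al. (2009):

```python
# Cross-check of the proportions reported from Dhruva et al. (2009),
# using only the counts quoted in the text above.
studies = 123
endpoints = 213

assert round(33 / studies * 100) == 27     # randomized studies
assert round(17 / studies * 100) == 14     # blinded studies
assert round(51 / 78 * 100) == 65          # PMAs resting on a single study
assert round(187 / endpoints * 100) == 88  # surrogate primary end points
print("all reported percentages check out")
```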
Redberg and colleagues could not interpret 15% of the primary end points because no target goals for device performance were stated in the SSEDs. In several cases, a stated target end point was not met, but the device received PMA anyway. One example cited by Redberg was the NaviStar Thermocool ablation catheter, which had a target of 50% chronic success stated in its SSED but was approved on the basis of achieving 47%.
Redberg noted several limitations of the study. Data were abstracted from the SSEDs that were available on the FDA Web site. Although an SSED is considered to be a summary of all the data on which FDA based its decision, the agency often has additional company-confidential documents that it does not post on its Web site. And it is possible that FDA required followup studies as a condition of approval, but such studies were not available to be included in the analysis.
Food and Drug Administration Response to the Study
After the publication of the study in JAMA, FDA issued a response stating, Redberg said, that a single pivotal study is adequate for approval of a device; that a randomized, double-blind, placebo-controlled trial is not always the best way to look at device data; and that the length of a study is not related to the quality of the data that it produces. Device trials, the agency said, must incorporate the practical realities of devices, which are different from drugs. According to Redberg, FDA also disagreed with the study conclusions regarding clinical vs surrogate end points, and she said that FDA noted that an SSED is not a “surrogate” for the full confidential data review (which is not publicly posted on the Web site).
Authors’ Response to Food and Drug Administration Criticisms
In response to FDA, Redberg said that the study authors feel that a single clinical study is often not adequate and that two studies are preferred for high-risk cardiovascular devices. She added that randomized, controlled, blinded trials with complete followup provide the highest-quality data. Those devices are often implanted permanently, she stressed, and removal entails substantial risk. She expressed concern about the use of retrospective controls, noting the opportunity for bias when one can pick and choose to form a control group. With regard to blinded device trials, Redberg noted that sham controls are used in surgical trials. It is important to discern whether the effects are from the device’s working as expected, she said, or from the invasive procedure of implanting the device. She also spoke of the need for clinical end points that directly measure how a patient feels, functions, or survives.
With regard to the additional confidential data that may be part of a PMA application, Redberg said that she was able to request access to the confidential files for a number of the PMAs that were part of the study, and her review of the additional data did not change the overall findings of the review of the SSED data described in the paper.
Opportunities for Improvement
On the basis of her study results, Redberg offered several recommendations for improving the FDA PMA process for high-risk devices:
Require at least one randomized and blinded study for each device. Ideally, the clinical-trial population should be representative of the intended patients with respect to age, sex, race, and comorbidity; and most of the clinical sites should be in the United States.
Require longer followup time and the use of clinical (not surrogate) end points.
Require an intent-to-treat analysis of all enrolled patients. All patients enrolled should be reported in the SSED, and controls should be active, not retrospective. Adverse outcomes and poor efficacy may be missed if not all data on all enrolled patients are analyzed.
Provide more public access to the raw data examined by FDA in an easy-to-navigate fashion.
Make postmarketing studies available on the clinicaltrials.gov Web site.
In closing, Redberg expressed support for the recent FDA Transparency Initiative and the agency’s plans to improve the quality of clinical trials and for the IOM committee’s current assessment of the 510(k) process.
CONCERNS REGARDING CONSISTENCY OF DECISION MAKING IN THE 510(k) CLEARANCE PROCESS
Robert E. Fischell, founder and chief technology officer of Neuralieve Inc., described his company’s recent experience in working to bring a product to market through the 510(k) clearance process as an example of industry concerns with a perceived lack of transparency and consistency in decision making. His company’s 510(k) submission for a device that uses transcranial magnetic stimulation to relieve migraine headaches was rejected by FDA. He noted that the agency had previously cleared, via the de novo 510(k) process, another company’s device that uses similar technology to treat depression.
Fischell noted that FDA did not respond to his company’s de novo 510(k) submission within the 60-day target and said that there is no recourse for device sponsors when such response deadlines are not met. At one point, company officials met with an FDA branch chief, who advised, Fischell said, that there should be a panel meeting. According to Fischell, a panel meeting for a 510(k) submission is unprecedented. Company officials were later told that there would be no panel meeting and that the company should apply for a PMA with additional data to address concerns about efficacy and safety.
Fischell expressed concern that there is no workable appeal process within FDA to overrule the lowest level of review. He expressed further concern about how FDA reviewers were assigned to particular devices and whether their experience and training were appropriate for the device in question.
Dhruva, S. S., L. A. Bero, and R. F. Redberg. 2009. Strength of study evidence examined by the FDA in premarket approval of cardiovascular devices. JAMA 302(24):2679-2685.
Dodick, D. W., C. T. Schembri, M. Helmuth, and S. K. Aurora. 2010. Transcranial magnetic stimulation for migraine: a safety review. Headache 50(7):1153-1163.
FDA (Food and Drug Administration). 2010. PMA Application Contents: Summary of Safety and Effectiveness Data (§814.44). http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/HowtoMarketYourDevice/PremarketSubmissions/PremarketApprovalPMA/ucm050289.htm (accessed October 20, 2010).
GAO (Government Accountability Office). 2009. FDA Should Take Steps to Ensure That High-Risk Device Types Are Approved Through the Most Stringent Premarket Review Process. http://www.gao.gov/new.items/d09190.pdf (accessed August 16, 2010).
Halperin, D., T. S. Heydt-Benjamin, B. Ransford, S. S. Clark, B. Defend, W. Morgan, K. Fu, T. Kohno, and W. H. Maisel. 2008. Pacemakers and Implantable Cardiac Defibrillators: Software Radio Attacks and Zero-Power Defenses. In Proceedings of the 2008 IEEE Symposium on Security and Privacy.
Leveson, N. G., and C. S. Turner. 1993. An Investigation of the Therac-25 Accidents. Computer 26:18-41.
Meier, B. 2010. FDA Steps Up Oversight of Infusion Pumps. The New York Times, April 23, B1. http://www.nytimes.com/2010/04/24/business/24pump.html (accessed August 11, 2010).
NITRD (Networking and Information Technology Research and Development). 2009. High-Confidence Medical Devices: Cyber-Physical Systems for 21st Century Health Care. http://www.nitrd.gov/About/MedDevice-FINAL1-web.pdf (accessed August 11, 2010).