Political Barriers to Change
JP Morgan Partners, LLC and The Wharton School
The United States spends more than $200 billion a year to provide guaranteed health care to some 39 million Medicare beneficiaries. The federal government, through Medicare, supplies one-third to one-half the revenues of every hospital in the country and a substantial proportion of the revenues of each and every physician, nursing home, and other health care provider and vendor of every stripe. Medicare is one of—if not the—most popular government programs ever invented, at least among its beneficiaries, if not among hospitals and physicians. So it might be reasonable to think that if the federal government can run such an effective, popular program, it could also lead the charge to reengineer health care delivery systems to promote better quality of care. Reasonable, but naive. I hate to be the skunk at today’s party, and no one wants to improve health care quality more than I do. But I guess I am here to provide the realpolitik. Based on my experience as a health-policy official in the federal government, I believe that not only do we have a very long way to go to achieve acceptable levels of clinical quality (as others today have persuasively argued), but also that it will not be easy for the federal government to be involved appropriately in moving the system in the right direction.
We are gathered here under the auspices of the National Academy of Engineering, and I imagine there are some rocket scientists in this room, so I hesitate to say this. But I think it is important to understand that changing health care is not rocket science. It is harder. This morning I will talk about my experiences as a health-policy official in the Clinton administration that have led me, regrettably, to a fairly pessimistic assessment of what we can reasonably expect the government to do in reengineering health care delivery systems to improve the quality of care.
Last October, the Health Care Financing Administration (HCFA) (now the Centers for Medicare and Medicaid Services, or CMS), the agency that administers Medicare, finally published the first state-by-state assessment of the quality of care Medicare provides to beneficiaries (Jencks et al., 2000). I say finally both because it took us a long time to get the sign-offs from the U.S. Department of Health and Human Services and from “down the street” (the way we referred to the White House and the Office of Management and Budget) to release the study and because it had taken Medicare more than 35 years to get around to making this assessment.
To do this first-ever quality assessment, we began by assembling a group of experts to decide which areas should be included. The group decided to focus on process measures rather than outcomes, which would have been more controversial. According to the clinicians who worked on the study, the measures were very basic; one told me they were practically at the level of washing one’s hands before surgery. In other words, all of the measures were supported by clinical consensus, and, in an ideal world, they should all have been at or near 100 percent. The beauty of the study is that it clearly indicates where we are and what needs to be done.
We chose to assess processes in the six most significant areas for Medicare beneficiaries: acute myocardial infarction, heart failure, stroke, pneumonia, breast cancer, and diabetes. For some measures, we did a systematic, random sample of up to 750 patient records in each state; for others we looked at all Medicare claims.
For pneumonia, for example, the study showed that the 39 million Medicare beneficiaries are not getting some of the basic things they need to treat them when they are sick or to prevent them from getting sick. The study focused on errors of omission rather than errors of commission (in this sense, its focus was slightly different from that of To Err Is Human [IOM, 2000], the ground-breaking study documenting 44,000 to 98,000 deaths annually from preventable medical errors). Overall, it wasn’t a very pretty picture. The individual indicators ranged from a low of 11 percent to a high of around 95 percent. Some of the data suggest major problems in medical education and care processes. Practice patterns showed physicians were not providing some very basic treatments in roughly 50 percent of the cases.
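To get a rough sense of what a state sample of up to 750 records can and cannot tell you, a normal-approximation confidence interval shows the sampling margin of error for an observed indicator rate. This is a minimal sketch; the 50 percent rate below is illustrative, not a figure from the study:

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """95% normal-approximation confidence interval for an observed rate
    from a simple random sample of n records."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# e.g., an indicator observed at 50% in a 750-record state sample
lo, hi = proportion_ci(0.50, 750)
print(f"{lo:.1%} .. {hi:.1%}")  # margin of error of roughly +/- 3.6 points
```

With 750 records, an observed rate is pinned down to within a few percentage points, which is tight enough to distinguish a 50 percent compliance rate from the near-100 percent the clinical consensus would demand.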
The report was published the week I left HCFA, and this timing was not accidental. It required a great deal of effort to get it published, and in fact, I think some of the career clinicians and policy analysts who worked on the assessment and the article we published began to doubt whether it would ever see the light of day. The agency did not publicize it widely, in part because of a concern that the information might make Medicare, or some physicians or hospitals, look bad, and in part because it might be embarrassing to some states. We did not have a political agenda when we undertook the study. We wanted to know where we stood and how effective Medicare was in translating insurance coverage into high quality care so we could assess how well we were doing in purchasing good care on behalf of Medicare beneficiaries and taxpayers.
The silence that greeted our efforts was deafening. I remember that New Jersey ranked near the bottom of the 50 states on an aggregate basis. The day after the data were released, the New Jersey Medical Society held a press conference. Surprisingly, the physicians at the press conference did not question the data or criticize HCFA (a popular pastime). Instead, they said they were concerned and would work to improve the quality of care provided to Medicare beneficiaries in New Jersey, which seems like an entirely proper and laudable reaction. But except for that press conference, I heard nothing about the report. I have since quizzed clinicians in other states—and found, totally unscientifically but, I fear, very reliably, that almost no one had heard of it. In fact, let me see a show of hands in this room of those who have heard about this report. So that is the bad news.
The good news is that just because no one has heard of the report doesn’t mean that it won’t have an impact. HCFA’s plan is for the peer review organizations that conduct quality improvement projects (called “QIOs” or Quality Improvement Organizations) to work with the providers in each state to develop a plan to improve their scores on each of the quality-of-care indicators. Over time, the plan for New Jersey might raise some indicators by 10 or 20 percent, which would make a real difference in the lives of Medicare beneficiaries. That, I think, is the best we can hope for.
Don’t get me wrong. I think that result would be terrific. But I think this example illustrates why changing health care is so hard—it is personal, and it challenges people’s assumptions, and it forces them to think about things they would prefer not to think about, like what happens when loved ones get sick. It is really difficult to get a consensus about what constitutes quality health care. And I guarantee you, most clinicians believe they are providing high quality care.
Consider an anecdote that illustrates why changing health care is so hard. In the late 1980s and early 1990s, Medicare tried a demonstration program in which we designated “centers of excellence,” which were hospitals that had demonstrated consistently high quality care and good outcomes for certain procedures and were willing to accept a capitated payment and meet other standards. Beneficiaries could still go wherever they wanted to have their cataracts removed or to undergo coronary artery bypass grafting surgery, but they were offered lower copayments if they chose a center of excellence. We wanted to see if giving beneficiaries incentives to choose hospitals that had demonstrated a better quality of care would improve outcomes and save beneficiaries (and Medicare) money.
The results were even better than we expected. The outcomes for patients improved, and Medicare and the beneficiaries saved money. Therefore, we proposed extending the centers of excellence program to include other procedures and make it a permanent feature of Medicare. Great idea, right? Not so fast.
Our proposal was included in several versions of the Balanced Budget Act but was rejected by congressional conferees in the end because of heavy lobbying against it. You might assume, as I did, that the lobbying was on behalf of a group of mediocre hospitals that were threatened by the notion of providing information about outcomes and quality and offering beneficiaries incentives to choose higher quality care. You would be wrong, however. The intense opposition came from one of the premier academic health centers in this country. As it was explained to me, this facility felt it was already considered a center of excellence and did not want Medicare to put its “Good Housekeeping Seal of Approval” on other facilities. Amazing, but not that unusual! It seems that everyone wants a market-based health care system where competition is allowed to drive prices down and quality up—unless it affects them. When there’s this much money involved, and when providers view it as a zero-sum game—“if hospital X gets the patients and the money, hospital Y loses”—markets do not (or are not allowed to) function as they are intended to function.
Another reason changing health care is harder than rocket science is that there is no real consensus about its goals or the laws that govern it. When you’re faced with a vexing scientific or technical problem, you can go back to first principles, theorems, settled laws, and formulas. There is nothing like that in health care. Other than the vague principle that promoting the public health is a good thing and laws that say certain Americans are entitled to Medicare and Medicaid coverage, there are no rules and no entity with the authority to enforce them.
This may strike you as a strange insight coming from the person who used to run Medicare, the 900-lb. gorilla. And yes, it is true that the government, because Medicare controls about one-third of health care spending, can have a great deal of influence over what happens in health care. But in Medicare’s nearly 40-year history, I think our record of using Medicare and its huge dollar impact to affect the quality of the health care delivery system is mixed at best. Medicare has shown that it can set prices. Medicare has shown that it can set minimal conditions of participation (which in some
cases are proxies for quality, or at least, in the aggregate, a basis for concern if they are not met). And in recent years, the federal government has shown that it can control waste, fraud, and abuse in Medicare by aggressively (some would say too aggressively) prosecuting and punishing wrongdoers.
What is less clear is whether Congress or the public would tolerate Medicare using its market leverage and its authority to purchase health care for 40 million elderly and disabled people to effect broad changes in the delivery system. I am skeptical that this would be allowed—or for that matter, seriously attempted by any administration. The Clinton administration did make some efforts in this direction, but it was never anything we went to the mat for. When all is said and done, I am not convinced the country is ready to invest in the kind of assessments and distinctions based on quality that would force us to implement the changes we need. Certainly, Congress is not ready to support it. The government is a powerful purchaser and could be a powerful force for change. However, after eight years, I have learned that there is a good deal of opposition in Congress to giving HCFA, or any agency, the kind of authority it needs to make meaningful changes in the way health care is delivered in this country. There are too many entrenched interests, and there is too much money at stake.
This conference is asking the right questions and represents a step in the right direction. But apart from the adoption of some basic technologies that are already widespread in other industries yet not widely used in health care, there is no consensus about what the government, or anyone else, should do to improve the quality of health care. For these reasons, I think we need to be realistic about how difficult reengineering health care delivery systems will be and how difficult it will be for the government to play a leadership role.
IOM (Institute of Medicine). 2000. To Err Is Human: Building a Safer Health System, edited by L.T. Kohn, J.M. Corrigan, and M.S. Donaldson. Washington, D.C.: National Academy Press.
Jencks, S.F., T. Cuerdon, D.R. Burwen, B. Fleming, P.M. Houck, A.E. Kussmaul, D.S. Nilasena, D.L. Ordin, and D.R. Arday. 2000. Quality of medical care delivered to Medicare beneficiaries: a profile at state and national levels. Journal of the American Medical Association 284(13): 1670–1676.
Lessons from Financial Services
Financial services and medical services have many similarities. They are both data intensive. They are both service industries in which the stakes are high. Both have a mix of large and small providers. And both are opaque, meaning that consumers generally have a limited understanding of how they work. For both, as procedures and processes become more complex, the number and extent of system vulnerabilities increase. In other words, the more complex a system is, the more likely it is that something will go wrong. Innovations are simultaneously a source of risk and a means of avoiding risk. A new test, for example, may help identify a condition and treat it, but the test itself may involve risks.
Financial services and medical services also have many dissimilarities. Risks in financial services tend to be symmetrical; they have an up side and a down side. Many medical risks, however, have only a down side.
How much can be learned from financial services depends on how much you can learn from failure. In the last five years, there have been multibillion-dollar failures at very sophisticated organizations; losses on the order of $2 billion to $3 billion can wipe out a major institution overnight.
The causes of those dramatic failures fall into two major categories. The first is errors in risk mitigation; a risk was identified but the attempt to reduce or manage the risk failed. Errors in risk mitigation come from three areas: agency risk, risk migration, and risk degradation. The second cause of failure is errors in risk measurement. One of the most common errors is the assumption that all risks are normally distributed. We tend to know a lot about the middle of a bell-shaped curve, but we have a very poor idea, based on historical data, of what is in the tails.
In financial services, returns are the outcome, and returns are not normally distributed. There are fewer outcomes in the middle of the curve and more at the extremes. In risk management, that means that the 100-year storm is going to occur every 50 years. The analogy in medical services is that your mitigation efforts will depend on how frequently you believe extreme adverse circumstances in the tails will occur. If you base your strategy on normal distributions, you will underinvest in risk management.
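The cost of assuming normality can be made concrete with a small simulation. The sketch below, with illustrative parameters (a Student-t distribution with 3 degrees of freedom standing in for fat-tailed returns), compares the probability of a "4-sigma" move under the two assumptions:

```python
import math
import random

random.seed(0)

def student_t(df):
    """Draw a Student-t variate as Z / sqrt(V/df), V ~ chi-square(df)."""
    z = random.gauss(0, 1)
    v = sum(random.gauss(0, 1) ** 2 for _ in range(df))
    return z / math.sqrt(v / df)

N = 200_000
df = 3
# Standardize to unit variance: Var(t_df) = df / (df - 2)
scale = math.sqrt(df / (df - 2))
moves = [student_t(df) / scale for _ in range(N)]

k = 4  # a "4-sigma" event
normal_tail = math.erfc(k / math.sqrt(2))    # exact P(|Z| > k) for a normal
t_tail = sum(abs(x) > k for x in moves) / N  # empirical fat-tailed frequency

print(f"P(|move| > {k} sigma)  normal: {normal_tail:.2e}  fat-tailed: {t_tail:.2e}")
```

Under these assumptions the fat-tailed world produces 4-sigma moves roughly two orders of magnitude more often than the normal model predicts, which is exactly the sense in which the 100-year storm arrives every 50 years.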
Another error in risk measurement is a failure to take covariances into account. We have found that risk is driven much more by covariances than by standard deviations. The poster child for this error is Long-Term Capital Management. Despite sophisticated models, Nobel laureates on the board, and bright employees, the company concluded that many adverse circumstances would not happen at once, that diversification would provide protection. It did not. In financial services, in periods of real stress or meltdown, the correlations tend to change. If you don’t include covariances in the model, you will have a very difficult time modeling the risk. You cannot use normal periods as a basis for modeling crises; in addition, covariances and correlations may not be stable.
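The arithmetic behind a diversification failure of the Long-Term Capital Management sort can be sketched directly. Using illustrative numbers (20 identical assets, 30 percent individual volatility), the volatility of an equally weighted portfolio barely falls below that of a single asset once pairwise correlations spike toward one:

```python
import math

def portfolio_vol(n_assets, asset_vol, rho):
    """Volatility of an equally weighted portfolio of identical assets
    with common pairwise correlation rho: sqrt(w' * Sigma * w)."""
    w = 1.0 / n_assets
    # Sigma has asset_vol^2 on the diagonal and rho * asset_vol^2 off it
    var = (n_assets * w**2 * asset_vol**2
           + n_assets * (n_assets - 1) * w**2 * rho * asset_vol**2)
    return math.sqrt(var)

for rho in (0.1, 0.9):
    print(f"rho={rho}: portfolio vol = {portfolio_vol(20, 0.30, rho):.1%}")
```

With low correlation, diversification cuts volatility to roughly a third of the single-asset level; with the high correlations typical of a meltdown, nearly all of that protection disappears. A model calibrated on normal-period correlations therefore systematically understates crisis risk.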
Another cause for error is risk ignorance, failure to recognize that a risk exists. Risk ignorance tends to be associated with innovations, a lack of familiarity with the characteristics of new products, new drugs, new surgical processes. If you are unfamiliar with what is likely to happen, it is very difficult to know how to mitigate the risk.
A risk mitigation plan itself can have risks because it provides a sense that risk has been addressed. If you haven’t addressed the risk correctly, the risk can be higher than if you had no mitigation strategy. Perhaps the most common form of risk is what we in financial services call agency risk. In aviation, it is called pilot error. In medical terms, it is called medical malpractice. The employee or the staff member fails to follow established procedures.
Risk migration is another problem. Most risk mitigation efforts do not eliminate risk; they simply transform it into another form of risk or transfer it to another area. A new medication, for example, may reduce risk but may raise the risk of administering an incorrect dosage. An example in financial services occurred during the 1998 meltdown, when many U.S. banks did foreign-exchange swaps with Russian banks to protect themselves against a decline in the Russian ruble. That worked fine until the Russian banks failed.
That is a very clear example of risk migration. If you fail to recognize risk migration, you end up with risk ignorance. You assume you have protected yourself against a risk, when all you have done is transfer it to another site. You can’t simply take the first step. You have to take the second step and know what to do if the backup system fails. What happens if this happens? What happens if that happens?
Now consider risk degradation. Case studies of major industrial disasters and major financial disasters have shown that over time there is a gradual degradation of the risk management process because systems are not maintained and audits are not done. These systems fail incrementally, and for a while as they fail, nothing seems to change. When the first light bulb goes out in your house, you may not change it because other lights are still on.
When one system fails and there are no obvious adverse circumstances, people may conclude that redundant systems are not necessary. The organization becomes desensitized to risk so that, over time, the probability that the degradation in the risk system will be addressed actually declines. Finally, a minor incident creates an interaction among these various failing systems that results in a major disaster, such as the disaster in Bhopal, which could have been prevented if the risk management systems had been maintained.
Risk management is an ongoing process that must be cared for and tended to as you go forward. Most financial services firms now are very humble about their ability to measure risk. We know that risk is “fat-tailed,” but we haven’t come up with distributions that reflect how it actually behaves. Instead, we try to allow for very large margins of error. We do a lot of stress testing; we run the worst possible conditions through the model and see if anybody is left standing at the fault line. For example, a large life insurance company can stress its portfolio by assuming simultaneous magnitude-8.5 earthquakes in Los Angeles and Tokyo. Risk migration and risk ignorance can be addressed through risk mapping, a reengineering process that asks what could possibly go wrong at every point in the process.
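A stress test of this kind can be sketched as a simple scenario loop: apply a vector of shocks to each position and ask whether capital survives. The positions, shock sizes, and capital figure below are hypothetical:

```python
# Hypothetical balance sheet (in millions) and capital cushion
positions = {"equities": 400, "credit": 300, "real_estate": 200, "cash": 100}
capital = 120

# Hypothetical scenarios: fractional price shocks per asset class
scenarios = {
    "base":         {"equities": 0.0,   "credit": 0.0,   "real_estate": 0.0},
    "severe_quake": {"equities": -0.35, "credit": -0.20, "real_estate": -0.50},
}

for name, shocks in scenarios.items():
    # Loss is the negative of the mark-to-market change; unshocked
    # assets (e.g., cash) default to a zero shock.
    loss = -sum(positions[a] * shocks.get(a, 0.0) for a in positions)
    print(f"{name}: loss={loss:.0f}, survives={loss < capital}")
```

The point of the exercise is the last column: a portfolio that looks comfortably capitalized in the base case can be insolvent under a single severe scenario, which is why the worst case, not the average case, drives the capital decision.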
The best way to manage risk is through real-time audits. That is the only way you can control agency risk. Real-time audits often reveal degradation in the risk management processes. Auditors have never been liked because they seem to be second guessing or interfering with procedures. However, financial services organizations with an “audit culture” are among the best trading houses on Wall Street. Some star traders have very aggressive auditors who walk the floors and call traders off the floor at any time to question their actions. Traders who make millions of dollars a year for their organizations and for themselves are not afraid to be second-guessed.
Another approach is what the aviation industry calls a “cockpit culture,” in which there are frequent communications and discussions in the cockpit. Cockpit cultures are based on the idea that any member of the team can challenge what is going on at any time. I think that type of team culture can be substituted for an audit, basically relying on internal challenges rather than external challenges.
Can Purchasers Leverage Engineering Principles to Improve Health Care?
Pacific Business Group on Health and The Leapfrog Group
Most purchasers wish we didn’t have to think about the question in the title of my presentation. Most purchasers would like the health care industry to adopt quality engineering methods as a natural expression of professional responsibility; and we would like our health insurance beneficiaries to select only quality-engineered providers as an expression of informed consumerism. However, the three Institute of Medicine (IOM) reports on quality and rapidly increasing health care costs have persuaded large purchasers to consider how they might use their unique role to accelerate American providers’ journey to engineered care delivery.
Waiting for other stakeholders to solve the problem is not a promising option. When I ask a consumer like my mother why she isn’t a prudent buyer, she replies, “When I am well, I don’t want to think about health care. When I am sick, I want to be able to trust that my treatment will be error-free. When I go to doctors’ offices and hospitals, big white certifications with gold seals are hanging on the wall. I’d prefer to rely on them rather than be skeptical.”
When I remind regulators that “Our moms are relying on you,” they reply, “It’s the tax cuts. We don’t have the budget to ensure quality, so we rely on accreditors.”
When I ask accreditors about the IOM reports and the hospitals they certify, they reply, “You force us to rely on providers to pay us for our accreditation activities. If we become too demanding, they will find a more tolerant accreditor.” When I ask hospitals and doctors about high average national rates of quality failure and the IOM reports, they reply, “We don’t believe that our personal error rates are as bad as the national average. To achieve perfect care, we’d probably have to hire quality engineers and buy complex clinical information systems. Where is the money for that? Insurers don’t pay us any more for these things.”
When we then turn to each other in the purchaser community, we agree that we have to do something about this. But many of us are understandably cautious, reasoning, “If we begin to get aggressive and limit our insurance plan networks to providers that are engineering high quality into their care, we will surely receive many complaints from our insurance beneficiaries that we are restricting their access to the doctors and hospitals they know and love. Then our careers will be at risk. We can only go as far as our beneficiaries/consumers will let us go.”
So we are back to our starting point in the “circle of nonaccountability” with consumers. Apparently, everyone is responsible for improving quality via better engineered care delivery methods, but no one feels accountable for its occurrence. Until every stakeholder has more responsibility for solutions, we aren’t likely to make much progress. How can purchasers leverage engineering principles to advance the interests of all stakeholders?
Several options are available. First, purchasers can use various purchaser-mediated rewards to encourage health plans and providers to adopt engineering methods. Differential rewards could be offered to plans and providers who widely apply general engineering methods, such as the 80/20 principle, design for safety, mass customization, continuous flow production, and other methods that have worked well in other complex, high-risk industries. The most practical method of implementation may be to develop a meaningful ISO-type certification in health care and to make comprehensive, publicly released performance measurements available. We are very far from having anything like that today, at least not at a level that inspires confidence.
Another approach would be to use systems analysis to identify narrow, high-yield single “ingredients” (e.g., uptake of electronic clinical information systems or implementation of robust disease registries to provide continuous, stratified population risk scores). We could select a menu of tangible, multifaceted “best-operating practices,” based on nationally distinguished care redesign efforts, such as the idealized design of clinical office practice or RWJ’s Pursuing Perfection winners, and reward other providers that adopt them or health plans that encourage their adoption. The Leapfrog Group
implemented a variant of the “single ingredient” approach by initially adopting three tangible operating practices, including computer physician order entry (CPOE), which improvement experts predicted would lead to big leaps in the safety of American hospital care.
However, rewarding single or multiple structural ingredients carries the risk that they will not fit all providers equally well, and they are subject to implementation flaws. Accordingly, they may not lead to better performance. If our prioritization of the structural ingredients we encourage is evidence-based and strategic, we may best use them as a stopgap until robust provider performance measurements are routinely available. One of the attractive features of tangible improvements like CPOE is that a purchaser or insurer can easily determine if a provider has implemented it. It is much harder to assess implementation of broad engineering principles, such as continuous flow production. For this reason, purchasers understandably favor narrow, less flexible, tangible engineering advances over the implementation of broad engineering principles.
Besides purchaser-mediated rewards, purchasers can apply engineering principles to their own purchasing processes. In the world of health care purchasing, there is no clear consensus on intermediate outcomes or the best way to pursue them. We operate in what systems engineers call a “zone of complexity,” so we must focus on simple rules, good-enough vision, and room for innovation. The Leapfrog Group’s approach of focusing on tangible operating practices aligns well with this heuristic from complex, adaptive systems thinking. The Leapfrog Group advocates a few simple, good-enough purchasing rules:
Hold purchasers responsible for rating their highest volume providers directly or via their plans.
Offer purchasers multiple methods for rewarding higher provider performance and creating a “business case” for quality and quality improvement.
Test each purchaser member’s aggregate improvement incentives by applying Leapfrog’s criterion that every year the percentage of the patient population receiving care from a provider that adopts the three Leapfrog safe practices must increase at a statistically significant rate. If not, the Leapfrog purchaser must notch up its provider rewards until this rule is met or drop out of the group.
Encourage consumers to take an interest in differences in quality of care ratings for providers.
Make the “backbencher purchasers” visible. We want Leapfrog purchasers to be clearly distinguished from other purchasers. It has been easy for purchasers to talk about quality but to do very little about it.
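Leapfrog's "statistically significant rate" criterion above could be operationalized with a standard one-sided two-proportion z-test: did the share of covered patients receiving care from a compliant provider rise significantly year over year? The patient counts below are hypothetical, not Leapfrog data:

```python
import math

def significant_increase(x1, n1, x2, n2, z_crit=1.645):
    """One-sided two-proportion z-test at the 5% level: did the share of
    patients at providers adopting the safe practices rise from year 1
    (x1 of n1) to year 2 (x2 of n2)?"""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion under the null
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return z > z_crit

# Hypothetical: 42% of 10,000 covered lives last year, 45% this year
print(significant_increase(4200, 10000, 4500, 10000))
```

Under these assumptions, a 3-point gain on 10,000 covered lives clears the bar, while a fraction-of-a-point gain would not, triggering the rule that the purchaser must notch up its provider rewards.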
Obviously, the application of complex, adaptive systems thinking to the purchase of health care is still in an embryonic stage. Leapfrog purchasing principles illustrate an intuitive, initial application. The concept of engineered purchasing warrants further development.
Let me close by briefly addressing a pivotal engineering challenge for all institutional stakeholders—the need for consumers and physicians to recognize, in their own care and work, the magnitude of current quality failure in health care. Research in social science by Kahneman, Tversky, and others offers a foundation for new approaches, but applications so far have been few. As long as we continue to permit poor quality to remain invisible, purchasers and consumers will have trouble becoming robust advocates for quality care, and providers will only slowly incorporate engineering knowledge into their work. Today, quality defects are largely invisible to most stakeholders. Until we find a better way of addressing the invisibility problem, it will be hard to motivate any of the key stakeholders to apply the rich resources of engineering knowledge to improving health care.
Shibboleths in Modeling Public Policy
Richard P. O’Neill
Federal Energy Regulatory Commission
Over the last 25 years, the principal direction of the government’s modeling of public policy in the energy area has been to analyze the effects of more market-driven and incentive-driven outcomes. Similar efforts have been made in health care. Many of the regulations hastily put in place in the 1970s after the oil embargoes and price run-ups are still being unraveled today. As a result, paradigms that have been accepted for more than a century are changing.
People consume some services and commodities without knowing the price, then pay the bill without fully understanding how the price was determined. One of those commodities has been electricity. Attempts to create market forces in this area have been made since the 1970s, when legislation was passed to begin to open up natural gas and electricity markets. But paradigm shifts are not easy.
One of the most interesting paradigm shifts in history took place during the tenure of Pope Urban VIII. In 1543, Copernicus published a book stating that the earth revolved around the sun. At the time, Church theology held that the earth was the fixed, immovable center of the universe. But Galileo read Copernicus, looked at the skies through the telescope he had recently built, and agreed with him. Galileo later published the Dialogue, his most controversial work, which presented the arguments for and against heliocentrism. The Inquisition banned the book, and Galileo was found guilty of heresy and condemned to spend the rest of his life under house arrest (in a palace). Writings by Copernicus and Galileo were placed on the Church’s index of forbidden works, where they remained for more than 200 years. All of this happened despite the fact that the Church had been debating the truth of Copernican discoveries for decades and despite Pope Urban VIII’s admiration for Galileo.
Western science doesn’t always get things right the first time. Priestley, who is usually credited with discovering oxygen, went to his grave believing in the phlogiston theory of combustion. History shows that paradigm shifts are difficult.
In our day, the move from a centralized, regulated system of energy to a more decentralized system based on competitive incentives has been very difficult. In some ways, the electricity system is like a hospital, a centrally run institution with many agents (e.g., doctors, nurses, and administrators) operating with different incentives—some of them at odds with the overall mission of the organization. In the energy system, a key goal has been getting the incentives right. Very quickly, one realizes that entrenched cultural beliefs present major barriers to change. Some social scientists believe that cultural paradigm shifts can take several generations.
Modelers are often compared to carpenters with hammers looking for nails; if they find a screw instead of a nail, they pound it anyway. Many in the energy field assumed that the market was in a Nash equilibrium (a state in which no player can gain by unilaterally changing its strategy; entities need not collude explicitly, but they can converge on implicitly collusive outcomes), in part because of a popular book, a successful movie, and a Nobel Prize based on Nash’s life and work. Much modeling was done based on Nash’s theory, but it turned out that there was explicit collusion in Western markets.
Different jargons and market dialects often present barriers to paradigm shifts. Enormous efforts have been made to introduce competition and competitive market paradigms over the last quarter century, but many people in the field have been trained to think and speak in a cost-of-service, or cost-based, dialect. In fact, a huge segment of the industry still talks and thinks in this dialect. Like Eskimos, who have many words for snow but few words for heat stroke, market participants trying to talk about auctions and market processes do not have the appropriate grammar or vocabulary to discuss the topic.
Small, unwritten rules matter. In New Zealand, unwritten rules for government-owned electricity corporations were an important factor in market outcomes. In other countries, when government-owned electric assets were sold to private interests, often one of the first things the CEOs did was increase their own salaries and buy private jets—hardly a confidence builder for competition.
Some popular analogies in policy discussions about electricity market reforms are to the natural gas market and the air traffic control system. Natural gas is a poor analogy for the electricity market, because natural gas can be economically stored; there is no simple equivalent of a valve in the electricity system. Some have argued that the electricity system controller should be like an air traffic controller, meaning there should be central direction of behavior but no market process at all. These same people seem to want rules that direct behavior with no regard for cost. Most of these analogies are misleading because, even though they are invoked on behalf of market forces, they lead either to greater socialization of the market or to easier manipulation of it.
The California electricity market is an interesting case study. In the early 1990s, there was a great deal of discussion about liberalizing the California market. Technical people spent two years designing the new market, but when politicians got involved, they threw out all of the market designs and cut a deal that emerged as legislation (AB1890), which was passed unanimously. Environmentalists agreed to support the plan if they could be given money for their programs. Marketers pushed for a bad market design to ensure more profits for themselves later.
Employees of utilities who had been involved in the discussions before the compromise were told not to discuss the previously proposed market designs. A number of staff at the Federal Energy Regulatory Commission noted that the model did not provide good incentives and that the process could get out of control. For two years, the legislatively mandated market design underwent constant changes. Confusion and gaming masked what was to happen—no new generators, demand growth, and a drought. In the end, the state risked everything on a model designed by a legislative committee.
Prices in the wholesale electricity market are typically in the range of $30 to $50 per megawatt-hour. In California and most of the West, spot prices remained above $100 per megawatt-hour for months. Then, suddenly, prices dropped, for a number of reasons: the utilities lost their credit ratings; the governor bought power under high-priced, long-term contracts; new generators came on line; and the weather changed.
Since then, hundreds of thousands of dollars have been spent on litigation to determine what went wrong and who should be punished. Interestingly, the smartest people turned out to be the consumer representatives who had been skeptical about the program and had called for a retail rate freeze and guaranteed rate reductions. Initially, they came out looking good, but in the end, the rate freeze contributed to the market disequilibrium. Consumers will be paying for these mistakes for years to come.
In retrospect, it can be seen that another player in this market caused a lot of the mischief. That player constantly proposed and funded campaigns for market designs that would not work, but that it could take advantage of. That player supported an array of approaches that all eventually failed. That player was Enron, the darling of Wall Street at the time. After Enron went bankrupt, it released a memo outlining the strategies it had used to manipulate the market in California. Wall Street lost its exuberance and abandoned its desire for Enron clones.
Some interesting questions have been asked about the California experience. For instance, was this a six-sigma event (i.e., was the outcome a low-probability event)? Or was this a one-sigma or two-sigma event (i.e., the wrong paradigm)? If we look at the debacle as an enormous experiment, some good may come of it. Theory can predict performance, and theory predicted that the market design in California would fail under stress. The California experiment cost billions of dollars, but it did prove the strength of the theory.
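To put rough numbers on those sigma levels, here is a back-of-the-envelope gloss (my own illustration, on the assumption that a “k-sigma event” means a deviation of more than k standard deviations under a normal distribution):

```python
import math

def two_sided_tail(k: float) -> float:
    """Probability that a standard normal variable deviates more than
    k standard deviations from its mean, in either direction."""
    return math.erfc(k / math.sqrt(2))

for k in (1, 2, 6):
    # 1-sigma: ~0.32, 2-sigma: ~0.046, 6-sigma: ~2e-9
    print(f"{k}-sigma event probability: {two_sided_tail(k):.2g}")
```

A one- or two-sigma event is something a sound model should routinely anticipate; a genuine six-sigma event is so improbable that blaming one usually means the model, not the world, was wrong.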
Another lesson we learned from California is that incentives matter. Financial incentives are very important, in energy markets and in the health care market. Fee-for-service versus salary systems can change the incentives significantly. You always have to ask what incentive a doctor or dentist has to cure a problem; non-monetary incentives are far more important in health care than in the energy industry. Dedication is a very important factor in health care and hard to model. Another issue is the principal-agent problem: who acts on the patient’s behalf? Programming this behavior into a model is difficult.
Bad incentives yield bad practice. Enron is a case in point in the energy market. Mark-to-market accounting is theoretically sound, but in practice, in thin markets, the market price is prone to manipulation; in fact, it can often simply be made up. Enron took advantage of this gap and produced a grossly distorted set of accounts.
In energy and health care markets, good market designs can yield benefits. Here are some don’ts in market design: don’t oversimplify; don’t create gaming opportunities; don’t favor large players; don’t use market jargon or model jargon to explain things; don’t ignore the extreme model outcomes, because outliers often provide interesting information and stories. The modeler should be immersed in the problem. Outsiders may be brought in to help, but somebody must intimately understand the model, as well as the process being modeled.
Clients should understand that models provide insights. Forecasts are often wrong, but everyone must forecast. Forecasting often involves insight into possible outcomes, rather than numbers. The model must be tuned based on experience. A model should not be a black box. It should be available to people; it should be testable; and it should be auditable.
A lesson for health care modeling is that modeling can be a useful tool, but a healthy skepticism is necessary for success.
Matching and Allocation in Medicine and Health Care
Alvin E. Roth
Many of the previous speakers have considered hospitals analogous to factories. But, unlike factories, hospitals are highly decentralized, and many of the important decision makers, including doctors (not to mention patients), aren’t employees of the hospital; they come to the hospital on their own patient-care missions, and they have their own objectives.
To efficiently allocate resources to serve these different objectives, it becomes necessary to elicit information from the people who have it. But eliciting information isn’t always simple, because the information we can elicit depends in part on what we plan to do with it. When you ask me about something I know, I want to ask you why you are asking. What you intend to do with my answers will influence the way I answer you.
That is, what information we can reliably obtain depends in part on how we use it and what incentives this gives the people from whom we must get the information. My own most relevant experience of these issues in a medical context comes from redesigning the resident match, so I’ll start my discussion there. Then I’ll suggest how similar “strategic” issues might arise in organ transplantation, scheduling operating rooms, etc.
Hospitals only began offering internships about a hundred years ago. Typically, a student graduated from medical school, then looked for a job at a hospital. By the 1920s, interns had become a significant part of the labor force in hospitals; and internships had become an important part of the career path of doctors. Hospitals began to try to get good interns by hiring them a little bit earlier than their competitors. Gradually, hiring began earlier and earlier, and by 1945, hospitals were hiring medical students as early as the end of their sophomore year of medical school for internships that would begin only after graduation. As a result, residents were being hired so early in their education that it was very hard for residency programs to distinguish the best candidates, or even for candidates to be sure what kind of residency program they would be interested in. In 1945, medical schools intervened by refusing to release any student information before a certain date—no transcripts, no letters of recommendation, no confirmation that a student was in good standing in medical school. It may have been risky to hire someone just on the basis of sophomore-year grades, but it was even riskier to hire someone just because he said he was a medical student. So, this intervention was successful at controlling the dates of appointment, and as this became apparent, the date of appointment was successfully moved later, into the senior year of medical school, when more information about students’ abilities and preferences was available for finding appropriate matches of students and hospitals.
But, between 1945 and 1950, a new problem appeared. In 1945, hospitals were all supposed to wait until a given day to make offers and give students 10 days to accept or reject those offers. What happened? Consider a student who got an offer from his third-choice hospital and had 10 days to decide. Suppose that student also heard from his first- or second-choice hospital, saying they liked the student but were not making an offer yet; the student had been placed on a waiting list in case some of the offers they had made were rejected. So, the student waited, which was easy to do because he had 10 days to decide about the offer from his third-choice hospital. If all students waited those 10 days, the waiting lists didn’t move, and on the tenth day bad things happened. The student might have accepted his third-choice offer and then, later in the day, received a more preferred offer. The student might have accepted that too. If, after even only a modest delay to gather his courage, he informed his third-choice hospital of his change of heart, students whom that hospital would have liked to hire may have already committed to other hospitals. (Obviously the hospital’s problem could be even worse if a long time passed before they realized they had an unfilled position.) On the other hand, even if the student felt honor bound to decline a late, more preferred offer, he might have spent the next year very unhappily at his third-choice hospital, explaining to all his colleagues why a talented doc like him shouldn’t have been working in a place like this. Either way, there was a lot of unhappiness.
Given that all these troubles had occurred on the tenth day, in 1946 hospitals agreed to allow only eight days for offers to remain open. As you might imagine, this didn’t solve the problem. By 1949, residency programs were giving exploding offers—students had to accept or reject immediately, without knowing what other offers might be forthcoming. So, once again, decisions were being made without all the information that might be available.
In the early 1950s, a radical innovation was tried—a centralized clearinghouse. Graduating medical students submitted to the clearinghouse a list, in order of preference, of the residency programs at which they had interviewed. Residency programs similarly ranked students they had interviewed. These rank order lists—that is, the information elicited from the participants in the market for residents—were then used to match students to residency programs. And although this system has evolved over the years to take account of changes in the medical marketplace, it has survived to the present day in something close to its original form, as the National Resident Matching Program. (I had the privilege of directing the most recent redesign of the matching algorithm.)
The surprising thing that was observed in the 1950s is that most positions were filled as matched: that is, students and residency programs submitted their rank order lists and then went on to sign the employment contracts suggested by the match. We now understand that this wasn’t inevitable, but it came about because the match algorithm that was chosen in the 1950s produced matches that were stable, in the sense that there were never “blocking pairs” consisting of a student and a residency program that were not matched to one another but that would both have preferred to be matched to one another rather than matched to their actual partners.
It is easy to see in principle why a clearinghouse that produces unstable matches might not succeed. A student who receives a match with her third-choice hospital, for example, only has to make two phone calls to find out if she is part of a blocking pair. She calls her first- and second-choice hospitals and says, “before I accept my match outcome, I just wanted to check if you might have a position for me.” If she is part of a blocking pair, then one of the hospitals will see that they prefer her to someone with whom they are supposed to match. They might say something like, “by chance we have an extra position…” and then call up the candidate they liked less and say they’ve had a budget shortfall and are one position short. But if the match is stable, when the hospital looks at the list of people with whom it is supposed to match, it sees that it would prefer to go ahead with the match. To put it another way, if the match is stable, no candidate can find a hospital that she would prefer to go to that is willing to take her.
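The core logic of a stable clearinghouse can be sketched in a few lines of code. This is my own simplified illustration of student-proposing deferred acceptance (the Gale-Shapley procedure); the names are hypothetical, each hospital has one position, and the actual NRMP algorithm is considerably more elaborate (it handles couples and multi-position programs):

```python
from collections import deque

def deferred_acceptance(student_prefs, hospital_prefs):
    """Student-proposing deferred acceptance.

    student_prefs: dict mapping each student to her ranked list of hospitals.
    hospital_prefs: dict mapping each hospital (one position each) to its
    ranked list of students.
    Returns a stable matching: no student and hospital prefer each other
    to the partners they are assigned.
    """
    # Precompute each hospital's ranking of students for O(1) comparisons.
    rank = {h: {s: i for i, s in enumerate(prefs)}
            for h, prefs in hospital_prefs.items()}
    free = deque(student_prefs)               # students with no tentative match
    next_try = {s: 0 for s in student_prefs}  # next hospital index to propose to
    held = {}                                 # hospital -> tentatively held student

    while free:
        s = free.popleft()
        if next_try[s] >= len(student_prefs[s]):
            continue                          # list exhausted; s stays unmatched
        h = student_prefs[s][next_try[s]]
        next_try[s] += 1
        if s not in rank[h]:
            free.append(s)                    # h did not rank s at all
        elif h not in held:
            held[h] = s                       # h tentatively holds s
        elif rank[h][s] < rank[h][held[h]]:
            free.append(held[h])              # h prefers s; displaced student re-enters
            held[h] = s
        else:
            free.append(s)                    # h keeps its current student

    return {s: h for h, s in held.items()}

# Hypothetical two-student, two-hospital instance.
students = {"ann": ["bos", "nyc"], "bob": ["bos", "nyc"]}
hospitals = {"bos": ["bob", "ann"], "nyc": ["ann", "bob"]}
print(deferred_acceptance(students, hospitals))  # {'bob': 'bos', 'ann': 'nyc'}
```

In this toy instance both students list Boston first, but Boston prefers bob, so ann ends up in New York; no student-hospital pair would block the result, which is exactly the property that keeps participants from circumventing the clearinghouse.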
One way the importance of stability became evident had to do with the growing number of couples graduating from American medical schools who wanted to find two residency positions in the same city. The number of couples increased in the 1970s, as medical schools stopped being overwhelmingly male. An attempt was made to accommodate couples by allowing them (after being certified by their deans as a “genuine” couple) to indicate that they wished to be matched to residencies in the same city. Then each individual submitted a rank order list, as if they were single, except that they were asked to specify one member of the couple as the “leading” member. The leading member went through the match as if single, and the rank order list of the other member was then edited to remove options that were not in the same city as the residency to which the leading member had matched.
Although this procedure did give couples two jobs in the same city, many couples started to find their residencies outside of the match, and it is easy to see why. Suppose that my wife and I have as our first choice two particular, good positions in Boston. Our second choice would be to get two particular positions in New York. If instead we get one good job in Boston and one bad one, we’re not going to be very happy (because of the Iron Law of Marriage, which says you can’t be happier than your spouse). So an instability may exist: when we call the two residency programs in New York, they may be happy to take us, which now leaves the Boston jobs unfilled and some people who were matched to the New York jobs scrambling to find new ones.
So, a failure to elicit the right kind of information (the preferences of couples) contributed to a decline in the effectiveness of the match by giving couples incentives to circumvent the clearinghouse. The present match deals with that by allowing couples to submit rank order lists of pairs of positions. Last year about 550 couples (1,100 people) participated in the match as couples.
Another way the importance of stability became clear was through the experience of British doctors. In the 1960s, the British began to experience the same kind of troubles the American medical market had experienced before 1945. But in Britain, different regions of the National Health Service adopted different kinds of centralized clearinghouses. Some produced stable matches, and some did not. The stable systems are still working; but most of the unstable ones failed, sometimes quite dramatically, even though the National Health Service can mandate that jobs be filled through the centralized clearinghouse.
But participants learned to circumvent unstable clearinghouses. In the Birmingham area, for example, after a few years, the majority of the rank order lists submitted by students contained only a single position, and hospital programs in turn listed only the students who listed them in this way. In other words, by the time the lists were submitted, the matching of students to positions had already been determined privately, in advance, by the parties, and they wrote each other’s names down and that was that. That is, people can often find ways to circumvent even compulsory systems, if they have incentives to do so. In contrast, stable mechanisms that do not give people incentives to get around them can function efficiently for years.
Before I move on to topics more directly related to patient care, let me just mention that no design of an allocation or matching system can be successful unless it is first adopted for use. So part of the design process is the adoption process. The question of how radical changes are adopted is ultimately political. Those who want to see their work implemented need to understand the objections to it, the fears it may arouse, and what constituencies are concerned. Because complex systems in which information is decentralized are subject to being gamed and circumvented, these “political” concerns need to be addressed carefully.
What are the lessons of these kinds of matching processes for allocation issues more directly concerned with patient care? People don’t get sick because of incentives, so you might think that incentives, which are such a big deal in labor markets, won’t play a big role in allocation decisions directly concerned with health care.
But consider organ transplants. There are about 80,000 candidates on various waiting lists for organs. Last year, about 22,000 organs were harvested from 11,000 donors. There is scarcity here and real questions about allocation. Over time, the United Network for Organ Sharing has made many modifications in the system allocating these scarce organs. There are waiting lists, with priorities based on criteria such as time on the list and current health.
While the details of the allocation rule will certainly affect who gets which organ (or who gets an organ and who does not), it might not be clear how the incentives created by different allocation rules can affect the overall efficiency of allocation. To get an idea of this, consider the case of pediatric heart transplants.
Congenital heart defects can now be discovered in utero. When priority started to be given to patients with greater time on the waiting list, pediatric cardiologists began to put their patients on the waiting list while they were still in the womb. If a heart became available before the pregnancy was full term, it was often nevertheless in the patient’s interest to perform a C-section, so that the baby would get the heart. That meant that donor hearts started going into babies who were not full term and were lower birth weight, which isn’t good for the overall survival rate. Now the system has been modified so that fetuses can be on a waiting list, but in a different category than already born pediatric patients. But giving more priority to time on the waiting list changed the incentives of pediatric cardiologists and changed the flow of hearts into babies in an unanticipated and not necessarily positive way.
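The incentive at work here is mechanical: under a time-on-list priority, listing a patient earlier always weakly raises priority. A minimal sketch, where the rule and the dates are my own illustration, not the actual UNOS scoring:

```python
from datetime import date

def time_on_list_priority(listed_on: date, today: date) -> int:
    """Toy priority rule: more days on the waiting list = higher priority."""
    return (today - listed_on).days

today = date(2003, 6, 1)
# Listing a fetus in utero accrues waiting time before birth...
fetal_listing = time_on_list_priority(date(2003, 1, 15), today)
# ...so it outranks an otherwise identical infant listed at birth.
birth_listing = time_on_list_priority(date(2003, 5, 20), today)
print(fetal_listing > birth_listing)  # True
```

Any monotone reward for waiting time gives the cardiologist an incentive to start the clock as early as possible, which is exactly the behavior the modified rules had to correct for.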
This brings me back to the game theory observation with which I began—when agents have different objectives (e.g., when each doctor is concerned with managing his own patients), how information is used to make allocations affects the incentives of those who hold the information, which can alter the allocations in unintended ways. Many aspects of the allocation process for organs involve these issues, from the debate over regional versus national waiting lists to the priorities that should be given to different kinds of candidates (e.g., chronic versus acute illness). And patients, as well as doctors, can act strategically based on their incentives, as when a given patient is able to place himself on multiple regional lists.
Similarly, other medical allocation issues involve information that must be elicited from interested participants. For instance, one of the big issues in scheduling an operating room that is used by many surgeons is how long a given operation will take. How an operation is described can influence its estimated duration, which in turn influences what resources it is allocated. To make appropriate allocation and scheduling decisions, it is first necessary to elicit information, and what information is delivered depends on how that information will be used.
This is of course a common issue in markets. And because doctors often run their own businesses, the business of the hospital interacts with the business of the market. So we need to remain aware that anything done inside a hospital interacts with all of the other things that go on in the medical marketplace outside of the hospital.
In summary, to do allocation well, information is needed. When information is decentralized, it still must be found. One of the things that makes systems in which information is decentralized different from those in which it is centralized is the importance of incentives and the constraints that incentives put on what can be done. In the medical market for residents, there is a lot of evidence to support the contention that the stability constraint is binding. As we start to think about how to elicit information to make allocation decisions in other systems, we will have to pay attention to the incentive constraints.