sure demonstrates statistical significance overall, at an aggregated level, does not mean that it will reliably distinguish performance at a detailed individual level (Andersson et al., 1998; Hixson, 1989; Silber et al., 1995). Consequently, many hospital and physician ranking systems based on outcome measures perform poorly (Blumberg and Binns, 1989; Green et al., 1997; Greenfield et al., 1988; Jollis and Romano, 1998; Krumholz et al., 2002; Marshall et al., 2000).
Comparisons of risk-adjusted quality outcomes, when applied for purposes of accountability, work best when they are narrowly focused (e.g., on a single clinical entity); when the underlying patient factors that affect outcomes are well understood; and when those performing the comparisons can access accurate, complete, standardized patient data at a high level of clinical detail. For example, several measurement systems for risk-adjusted mortality outcomes for open heart surgery can account for more than 60 percent of all observed variation in those outcomes (Hannan et al., 1998; O’Connor et al., 1998).
Most health care settings lack the data necessary to support accurate risk adjustment and ranking of providers, because standardized clinical data are not captured as part of the care delivery process. The HCFA mortality reports, for example, were produced from Medicare claims data, which lack important clinical detail. Claims data also suffer from accuracy problems (Green and Wintfeld, 1993).
Learning systems, by contrast, exhibit a high tolerance for imperfect data and an ability to use such data productively. When used for process improvement, risk adjustment removes variation arising from patient factors that are beyond the care delivery system's control, making the effects of process changes more clearly identifiable (i.e., it improves the signal-to-noise ratio). This aim is quite different from that of accountability systems, which seek to improve predictive value. For example, a risk adjustment model that accounts for only 25 percent of outcome variability (by modeling out the contribution of known cofactors) can substantially improve a team's ability to see structure in the data or to determine more accurately whether a process change has improved outcomes. The same risk adjustment likely would not improve comparative outcome data to the point where they could reliably rank care providers for accountability.
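The signal-to-noise argument above can be illustrated with a small simulation. The sketch below is hypothetical: it assumes a single known patient cofactor ("severity") that accounts for roughly 25 percent of outcome variance, a modest process improvement introduced halfway through the series, and ordinary least-squares regression as the risk adjustment method. None of these specifics come from the text; they are stand-ins chosen to make the arithmetic visible.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Hypothetical data: patient severity contributes ~1 unit of variance,
# unexplained factors contribute ~3 units, so severity explains ~25%
# of total outcome variability.
severity = rng.normal(0.0, 1.0, n)                 # known patient cofactor
noise = rng.normal(0.0, np.sqrt(3.0), n)           # unexplained variation
effect = np.where(np.arange(n) >= n // 2, -0.4, 0.0)  # process change at midpoint

outcome = severity + noise + effect

# Risk adjustment: fit the known cofactor and keep the residual,
# i.e., "model out" the contribution of severity to each outcome.
beta = np.polyfit(severity, outcome, 1)
adjusted = outcome - np.polyval(beta, severity)

def signal_to_noise(y):
    """Shift between before/after periods relative to overall spread."""
    before, after = y[: n // 2], y[n // 2 :]
    return abs(after.mean() - before.mean()) / y.std(ddof=1)

# Removing only 25% of the variance shrinks the noise floor from
# sqrt(4) to roughly sqrt(3), so the same process effect stands out more.
print("unadjusted SNR:", round(signal_to_noise(outcome), 3))
print("adjusted SNR:  ", round(signal_to_noise(adjusted), 3))
```

The point the simulation makes is the one in the text: regressing out a cofactor that explains a quarter of the variance leaves about three quarters of the spread in place, which helps a team see a process change but falls far short of the predictive accuracy needed to rank individual providers.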
All efforts to improve safety and quality through the use of performance data run the risk of instilling fear and provoking defensive behavior on the part of providers. Scherkenbach outlines three sequential factors in that re-