would explain such broad-based and substantial improvements in morbidity and mortality rates.
Among the downsides, outcome measures are generally impractical for assessing the quality of most types of ambulatory medical care. Easily measured endpoints like mortality occur too infrequently or far downstream of the care being assessed. Patient-centered measures (e.g., health status) are much more difficult to collect and generally reflect illness severity in addition to provider quality.
Although outcome measurement is more practical and widely applied in surgery, hospital- or surgeon-specific outcome measures are severely constrained by small sample sizes. For the large majority of surgical procedures, very few hospitals (or surgeons) have sufficient adverse events (numerators) and cases (denominators) for meaningful, procedure-specific measures of morbidity or mortality. For example, Dimick and colleagues used data from the Nationwide Inpatient Sample to study 7 procedures for which mortality rates have been advocated as quality indicators by the AHRQ (Dimick et al., 2004). For 6 of the 7 procedures, a very small proportion of U.S. hospitals had adequate caseloads to rule out a mortality rate twice the national average. Although identifying poor-quality outliers is an important function of outcomes measurement, focusing on this goal alone significantly underestimates problems with small sample sizes. Discriminating among individual hospitals with intermediate levels of performance is even more difficult, because the differences to be detected are smaller than a doubling of the mortality rate.
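The caseload constraint can be illustrated with a rough sample-size calculation. The sketch below is a minimal illustration and is not necessarily the method used by Dimick and colleagues. It assumes a one-sided, one-sample test of a proportion under the normal approximation, with a 5% significance level and 80% power. The function name `min_caseload` and the baseline mortality rates are hypothetical and chosen only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_caseload(baseline_rate, ratio=2.0, alpha=0.05, power=0.80):
    """Approximate number of cases needed to detect a mortality rate that is
    `ratio` times `baseline_rate`, using a one-sided, one-sample test of a
    proportion (normal approximation). All defaults are assumptions."""
    elevated_rate = baseline_rate * ratio
    if not 0 < baseline_rate < elevated_rate < 1:
        raise ValueError("rates must satisfy 0 < baseline < elevated < 1")

    # Standard normal quantiles for the chosen significance level and power.
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)

    # Binomial standard deviations under the null and alternative rates.
    sd_null = sqrt(baseline_rate * (1 - baseline_rate))
    sd_alt = sqrt(elevated_rate * (1 - elevated_rate))

    n = ((z_alpha * sd_null + z_beta * sd_alt) / (elevated_rate - baseline_rate)) ** 2
    return ceil(n)


if __name__ == "__main__":
    # Hypothetical baseline mortality rates, not taken from the study.
    for rate in (0.01, 0.02, 0.05, 0.10):
        print(f"baseline {rate:.0%}: about {min_caseload(rate)} cases")
```

Under these assumptions, the required caseload grows quickly as the baseline mortality rate falls, which is consistent with the observation that few hospitals perform enough of a given low-mortality procedure to support procedure-specific mortality measures.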
Other limitations of direct outcomes assessment depend on whether outcomes are being assessed from administrative data or clinical information abstracted from medical records. For outcomes measurement based on clinical data, the major problem is expense. For example, it costs over $100,000 annually for a private-sector hospital to participate in NSQIP.
With administrative data, the adequacy of risk adjustment remains a major concern. High-quality risk adjustment may be essential for outcome measures to have face validity with providers. It may also be useful for discouraging gaming, e.g., hospitals or providers avoiding high-risk patients to optimize their performance measures. However, it is not clear how much the scientific validity of outcome measures is threatened by imperfect risk adjustment with administrative data. There is no disagreement that administrative data lack clinical detail and systematically under-represent patient comorbidities and other clinical variables related to baseline risk (Finlayson et al., 2002; Fisher et al., 1992; Iezzoni, 1997; Iezzoni et al., 1992).