In developing criteria for selecting the social risk factors that should be accounted for in Medicare value-based payment programs, the committee reviewed existing criteria from the literature for selecting risk factors for risk adjustment models. These include criteria, principles, and other guidance from
- Centers for Medicare & Medicaid Services Hierarchical Condition Categories (CMS-HCC) model for risk adjustment of Medicare capitation payments (Pope et al., 2004);
- Consumer Assessment of Healthcare Providers and Systems (CAHPS) Hospital Survey case-mix adjustment (Elliott et al., 2009; O’Malley et al., 2005);
- Department of Health and Human Services (HHS)-HCC risk adjustment model for individual and small group markets under the Affordable Care Act (Kautter et al., 2014); and
- The National Quality Forum 2014 report Risk Adjustment for Socioeconomic Status or Other Sociodemographic Factors (NQF, 2014).
The criteria reviewed are excerpted below.
The following 10 principles guided the creation of the diagnostic classification system.
Principle 1—Diagnostic categories should be clinically meaningful. Each diagnostic category is a set of ICD-9-CM [International Classification of Diseases, 9th Revision, Clinical Modification] codes (CDC, 2013). These codes should all relate to a reasonably well-specified disease or medical condition that defines the category. Conditions must be sufficiently clinically specific to minimize opportunities for gaming or discretionary coding. Clinical meaningfulness improves the face validity of the classification system to clinicians, its interpretability, and its utility for disease management and quality monitoring.
Principle 2—Diagnostic categories should predict medical expenditures. Diagnoses in the same HCC should be reasonably homogeneous with respect to their effect on both current (this year’s) and future (next year’s) costs. (In this article we present prospective models predicting future costs.)
Principle 3—Diagnostic categories that will affect payments should have adequate sample sizes to permit accurate and stable estimates of expenditures. Diagnostic categories used in establishing payments should have adequate sample sizes in available data sets. Given the extreme skewness of medical expenditure data, the data cannot reliably determine the expected cost of extremely rare diagnostic categories.
Principle 4—In creating an individual’s clinical profile, hierarchies should be used to characterize the person’s illness level within each disease process, while the effects of unrelated disease processes accumulate. Because each new medical problem adds to an individual’s total disease burden, unrelated disease processes should increase predicted costs of care. However, the most severe manifestation of a given disease process principally defines its impact on costs. Therefore, related conditions should be treated hierarchically, with more severe manifestations of a condition dominating (and zeroing out the effect of) less serious ones.
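The hierarchical logic in Principle 4 can be sketched in a few lines of code. The condition categories, hierarchy, and payment weights below are hypothetical illustrations, not the actual CMS-HCC definitions:

```python
# Illustrative sketch of hierarchical condition logic (Principle 4).
# Within each disease process, the most severe coded category suppresses
# ("zeroes out") less severe ones; unrelated disease processes accumulate.
# All category names, hierarchies, and weights here are hypothetical.

HIERARCHIES = {
    # disease process -> categories ordered from most to least severe
    "diabetes": ["diabetes_with_complications", "diabetes_without_complications"],
    "copd": ["severe_copd", "mild_copd"],
}

WEIGHTS = {  # hypothetical payment weights
    "diabetes_with_complications": 0.40,
    "diabetes_without_complications": 0.10,
    "severe_copd": 0.35,
    "mild_copd": 0.15,
}

def apply_hierarchies(coded_categories):
    """Keep only the most severe coded category within each disease process."""
    kept = set(coded_categories)
    for ordered in HIERARCHIES.values():
        coded = [c for c in ordered if c in kept]
        for less_severe in coded[1:]:   # everything below the most severe
            kept.discard(less_severe)
    return kept

profile = {"diabetes_with_complications",
           "diabetes_without_complications",  # suppressed by the line above
           "mild_copd"}                       # unrelated process: accumulates
kept = apply_hierarchies(profile)
risk_score = sum(WEIGHTS[c] for c in kept)
print(sorted(kept), round(risk_score, 2))
# -> ['diabetes_with_complications', 'mild_copd'] 0.55
```

Note that the less severe diabetes category contributes nothing once the more severe one is coded, while the unrelated COPD category still adds to the score.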
Principle 5—The diagnostic classification should encourage specific coding. Vague diagnostic codes should be grouped with less severe and lower-paying diagnostic categories to provide incentives for more specific diagnostic coding.
Principle 6—The diagnostic classification should not reward coding proliferation. The classification should not measure greater disease burden simply because more ICD-9-CM codes are present. Hence, neither the number of times that a particular code appears, nor the presence of additional, closely related codes that indicate the same condition should increase predicted costs.
Principle 7—Providers should not be penalized for recording additional diagnoses (monotonicity). This principle has two consequences for modeling: (1) no condition category should carry a negative payment weight, and (2) a condition that is higher-ranked in a disease hierarchy (causing lower-ranked diagnoses to be ignored) should have at least as large a payment weight as lower-ranked conditions in the same hierarchy.
Principle 8—The classification system should be internally consistent (transitive). If diagnostic category A is higher-ranked than category B in a disease hierarchy, and category B is higher ranked than category C, then category A should be higher ranked than category C. Transitivity improves the internal consistency of the classification system and ensures that the assignment of diagnostic categories is independent of the order in which hierarchical exclusion rules are applied.
Principle 9—The diagnostic classification should assign all ICD-9-CM codes (exhaustive classification). Because each diagnostic code potentially contains relevant clinical information, the classification should categorize all ICD-9-CM codes.
Principle 10—Discretionary diagnostic categories should be excluded from payment models. Diagnoses that are particularly subject to intentional or unintentional discretionary coding variation or inappropriate coding by health plans/providers, or that are not clinically or empirically credible as cost predictors, should not increase cost predictions. Excluding these diagnoses reduces the sensitivity of the model to coding variation, coding proliferation, gaming, and upcoding.
In designing the diagnostic classification, principles 7 (monotonicity), 8 (transitivity), and 9 (exhaustive classification) were followed absolutely. For example, if the expenditure weights for our models did not originally satisfy monotonicity, we imposed constraints to create models that did. Judgment was used to make trade-offs among other principles. For example, clinical meaningfulness (principle 1) is often best served by creating a very large number of detailed clinical groupings. But a large number of groupings conflicts with adequate sample sizes for each category (principle 3). Another trade-off is encouraging specific coding (principle 5) versus predictive power (principle 2). In current coding practice, non-specific codes are common. If these codes are excluded from the classification system, substantial predictive power is sacrificed. Similarly, excluding discretionary codes (principle 10) can also lower predictive power (principle 2). We approached the inherent trade-offs involved in designing a classification system using empirical evidence on frequencies and predictive power, clinical judgment on relatedness, specificity, and severity of diagnoses, and the judgment of the authors on incentives and likely provider responses to the classification system. The DCG [Diagnostic Cost Group]/HCC models balance these competing goals to achieve a feasible health-based payment system (Pope et al., 2004).
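The non-negative-weight half of the monotonicity constraint (Principle 7) can be illustrated by fitting payment weights with non-negative least squares. This is only a sketch on simulated toy data; the actual CMS-HCC estimation procedure is more involved and is not literally NNLS:

```python
# Sketch: enforcing non-negative payment weights (one part of Principle 7)
# by estimating weights with non-negative least squares.
# The design matrix and costs are simulated toy data, not Medicare data.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n, k = 500, 4
X = rng.integers(0, 2, size=(n, k)).astype(float)  # 0/1 condition indicators
true_w = np.array([0.4, 0.1, 0.35, 0.15])          # hypothetical true weights
y = X @ true_w + rng.normal(0, 0.05, n)            # simulated annual cost

w_nnls, _ = nnls(X, y)                             # constrained fit: w >= 0
print(np.round(w_nnls, 2))  # every weight is non-negative by construction
```

An unconstrained regression could produce negative weights for rare or noisy categories; constrained estimation rules that out, at some cost in in-sample fit.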
Our criterion for selection of case-mix adjustors is the “impact factor,” which is the product of two measures: predictive power (the strength of the relationship between the candidate adjustor and the outcome variable at the individual level) and heterogeneity factor (the amount of variation among hospitals in the adjustor variable) (Zaslavsky, 1998). Predictive power quantifies the improvement in model fit (R2) attributable to a variable; unlike tests of statistical significance, it does not depend on sample size. The heterogeneity factor measures the extent to which the characteristic is unevenly distributed across hospitals and therefore potentially a source of bias in comparisons. A variable, such as gender, could be highly predictive of responses but have little impact on case-mix adjustment because its distribution is relatively homogeneous across hospitals. Conversely, a variable could have quite different distributions in different hospitals but be unrelated to the rating. By combining both predictive power and heterogeneity into a single measure, the impact factor is more informative than purely predictive measures such as R2; it approximates the magnitude of the incremental adjustments due to adding a variable to the case-mix model (O’Malley et al., 2005).
Explanatory power (Zaslavsky, 1998) was used to assess the relative importance of individual PMA [patient-mix adjuster] variables to hospital-level adjustment. Explanatory power is the product of two components: (1) the individual predictive power of a PMA variable (as measured by the improvement in R2 attributable to a candidate predictor) and (2) the hospital-level heterogeneity of a PMA variable (Elliott et al., 2009).
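The impact factor described above can be approximated numerically. The sketch below uses simulated patient data, and its heterogeneity measure (the between-hospital variance of hospital means of the adjustor) is one simple operationalization, not necessarily Zaslavsky's exact formula:

```python
# Sketch of the "impact factor": predictive power (improvement in R^2 from
# adding an adjustor) times hospital-level heterogeneity of the adjustor.
# Data are simulated; the heterogeneity measure is a simplification.
import numpy as np

rng = np.random.default_rng(1)
n_hosp, per = 50, 100
hosp = np.repeat(np.arange(n_hosp), per)
# Adjustor (e.g., self-rated health) whose mean differs across hospitals:
adj = rng.normal(loc=rng.normal(0, 1, n_hosp)[hosp], scale=1.0)
rating = 0.5 * adj + rng.normal(0, 1, hosp.size)   # patient rating outcome

def r2(y, X):
    """R^2 of an intercept-plus-predictors linear fit."""
    X1 = np.column_stack([np.ones_like(y), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

# R^2 of the intercept-only model is 0, so the improvement is r2 itself:
predictive_power = r2(rating, adj)
hosp_means = np.array([adj[hosp == h].mean() for h in range(n_hosp)])
heterogeneity = hosp_means.var()      # between-hospital variance of adjustor
impact = predictive_power * heterogeneity
print(round(predictive_power, 3), round(heterogeneity, 3), round(impact, 3))
```

A variable with high predictive power but identical distributions across hospitals gets a near-zero heterogeneity factor, and hence a near-zero impact factor, matching the gender example in the text.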
There are 264 HHS-HCCs in the full diagnostic classification, of which a subset is included in the HHS risk adjustment model. The criteria for including HCCs in the model are described below. These criteria were sometimes in conflict, and trade-offs had to be made among them in assessing whether to include specific HCCs in the HHS risk adjustment model.
Criterion 1—Represent clinically significant, well-defined, and costly medical conditions that are likely to be diagnosed, coded, and treated if they are present.
Criterion 2—Are not especially subject to discretionary diagnostic coding or “diagnostic discovery” (enhanced rates of diagnosis through population screening not motivated by improved quality of care).
Criterion 3—Do not primarily represent poor quality or avoidable complications of medical care.
Criterion 4—Identify chronic, predictable, or other conditions that are subject to insurer risk selection, risk segmentation, or provider network selection, rather than random acute events that represent insurance risk.
Following an extensive review process, we selected 127 HHS-HCCs to be included in the HHS risk adjustment model. . . . Finally, to balance the competing goals of improving predictive power and limiting the influence of discretionary coding, a subset of the HHS-HCCs in the risk adjustment model were grouped into larger aggregates; that is, clusters of HCCs were combined into a single condition with a single coefficient that can be counted only once. After grouping, the number of HCC factors included in the model was effectively reduced from 127 to 100 (Kautter et al., 2014).
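The grouping step can be sketched as a simple mapping from HCCs to aggregate factors that enter the model only once. The group name and memberships below are hypothetical, not the actual HHS-HCC groupings:

```python
# Sketch of "grouping" (Kautter et al., 2014): clusters of HCCs map to a
# single aggregate factor with one coefficient, counted at most once, so
# coding several HCCs in the same group does not raise the score repeatedly.
# Group names and memberships are hypothetical.
GROUPS = {
    "severe_illness_group": {"hcc_a", "hcc_b", "hcc_c"},
}

def model_factors(coded_hccs):
    """Replace grouped HCCs with their aggregate factor, counted once."""
    factors = set()
    for hcc in coded_hccs:
        for group, members in GROUPS.items():
            if hcc in members:
                factors.add(group)   # single coefficient for the whole group
                break
        else:
            factors.add(hcc)         # ungrouped HCCs enter individually
    return factors

print(sorted(model_factors({"hcc_a", "hcc_b", "hcc_d"})))
# -> ['hcc_d', 'severe_illness_group']
```

Here two grouped HCCs collapse into one model factor, while the ungrouped HCC enters on its own, mirroring the reduction from 127 HCCs to 100 model factors.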
TABLE CA-1 Guidelines for Selecting Risk Factors for Adjustment
| Guideline | Rationale | Clinical/Health Status Factors^a | SDS Factors^b |
| --- | --- | --- | --- |
| Clinical/conceptual relationship with the outcome of interest | Begin with a conceptual model informed by research and experience | ✓ | ✓ |
| Empirical association with the outcome of interest | To confirm the conceptual relationship | ✓ | ✓ |
| Variation in prevalence of the factor across the measured entities | If a factor does not vary in prevalence across the health care units being measured, it will not bias performance results | ✓ | ✓ |
| Not confounded with quality of care; risk factors should: | Trying to isolate the effects of quality of care | ✓ | ✓ |
| | Ensures the factor is not a result of the care provided | ✓ | ✓ |
| | Although these could explain variation in the outcome, in performance measurement the goal is to isolate differences in performance due to differences in the care provided | ✓ | ✓ |
| Resistant to manipulation or gaming—generally, diagnosis or assessment data (e.g., a functional status score) are considered less susceptible to manipulation than a clinical procedure or treatment (e.g., physical therapy) | Ensures the validity of the performance score as representing quality of care (versus, for example, upcoding) | ✓ | ✓ |
| Accurate data that can be reliably and feasibly captured | Data limitations often represent a practical constraint on what factors are included in risk models | ✓ | ✓ |
| Contribution of unique variation in the outcome (i.e., not redundant or highly correlated with another risk factor) | Prevents overfitting, unstable estimates, or coefficients that appear to be in the wrong direction; reduces data collection burden | ✓ | ✓ |
| Potentially, improvement of the risk model (e.g., risk model metrics of discrimination—i.e., sensitivity/specificity, calibration) sustained with cross-validation | The change in R-squared or C-statistic may not be significant, but calibration at different deciles of risk might improve. A seemingly small change could represent meaningful differences in the outcome (e.g., lives, dollars). Order of entry into the model may influence this result | ✓ | ✓ |
| Potentially, face validity and acceptability | Some factors may not be indicated empirically but could improve acceptability; this must be weighed against negative impact on the model and the feasibility and burden of data collection | ✓ | ✓ |
NOTE: SDS = sociodemographic status.
a Examples of clinical and health status factors include comorbidity, severity of illness, and patient-reported health status.
b Examples of sociodemographic factors include income, education, and English language proficiency.
SOURCE: NQF, 2014.
Centers for Disease Control and Prevention. 2013. International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). http://www.cdc.gov/nchs/icd/icd9cm.htm (accessed June 24, 2016).
Elliott, M. N., A. M. Zaslavsky, E. Goldstein, W. Lehrman, K. Hambarsoomians, M. K. Beckett, and L. Giordano. 2009. Effects of survey mode, patient mix, and nonresponse on CAHPS Hospital Survey scores. Health Services Research 44(2 Pt 1):501-518.
Kautter, J., G. C. Pope, M. Ingber, S. Freeman, L. Patterson, M. Cohen, and P. Keenan. 2014. The HHS-HCC risk adjustment model for individual and small group markets under the Affordable Care Act. Medicare & Medicaid Research Review 4(3):1-4.
NQF (National Quality Forum). 2014. Risk adjustment for socioeconomic status or other sociodemographic factors. Washington, DC: National Quality Forum.
O’Malley, A. J., A. M. Zaslavsky, M. N. Elliott, L. Zaborski, and P. D. Cleary. 2005. Case-mix adjustment of the CAHPS Hospital Survey. Health Services Research 40(6 Pt 2):2162-2181.
Pope, G. C., J. Kautter, R. P. Ellis, A. S. Ash, J. Z. Ayanian, M. J. Ingber, J. M. Levy, and J. Robst. 2004. Risk adjustment of Medicare capitation payments using the CMS-HCC model. Health Care Financing Review 25(4):120-123.
Zaslavsky, A. M. 1998. Issues in case-mix adjustment of measures of the quality of health plans. In Proceedings of the Government and Social Statistics Sections. Alexandria, VA: American Statistical Association.