Suggested Citation:"5 Proxies for Determining Listing-Level Severity." National Academies of Sciences, Engineering, and Medicine. 2018. Health-Care Utilization as a Proxy in Disability Determination. Washington, DC: The National Academies Press. doi: 10.17226/24969.

5

Proxies for Determining Listing-Level Severity

To assess whether health-care utilization is a good proxy for listing-level severity, the committee examines in this chapter what would make a good proxy. The starting point is that a good proxy, whether a single health-care utilization or a combination of health-care utilizations, would correctly classify people as having “listing-level severity” or “no listing-level severity” in a sufficiently large proportion of cases. Listing-level severity is defined in step 3 of the disability-determination process (see Chapter 1) as referring to an impairment that would qualify a person for Social Security Disability Insurance (SSDI). This chapter examines the characteristics of “classifiers” of listing-level severity and discusses the elements of a study that might identify utilizations that would be proxies for listing-level severity.

HEALTH-CARE UTILIZATIONS FOR CONSTRUCTING CLASSIFIERS OF LISTING-LEVEL SEVERITY

A set of health-care utilizations that can be used for classification of people as having listing-level severity or no listing-level severity might be defined as a classifier. An example of a classifier that is used in a number of the Social Security Administration (SSA) Listings to determine listing-level severity is “3 hospitalizations in the previous year occurring at least 30 days apart.” Thus, a person who has three or more such hospitalizations is classified as positive for having listing-level severity, and a person who has fewer than three hospitalizations is classified as negative for having listing-level severity. That classifier is admittedly simplistic in that it relies on a single type of health-care utilization (hospitalizations). Its use in isolation might not be optimal for accurate classification of people as having or not having listing-level severity. However, its simplicity facilitates our description of the main concepts below. The concepts can then be readily applied to more realistic classifiers that combine multiple health-care utilization measures.

CHARACTERISTICS OF CLASSIFIERS

As represented in Table 5-1, each person who is subject to the application of a classifier will end up in one of four categories: true positive, false positive, true negative, or false negative. In this context, a false positive results when a person who has no listing-level severity (because the impairment is not severe enough to result in an inability to participate in any gainful activity) is incorrectly classified as having listing-level severity. Similarly, a false negative results when a person who has listing-level severity is incorrectly classified as not having listing-level severity.

TABLE 5-1 The Four Categories of People in a Binary Classification Problem

                    True Status
Classifier Result   Listing-level severity   No listing-level severity
Positive            True positives           False positives
Negative            False negatives          True negatives

Any study conducted to evaluate a classifier would require enough data to classify all people into one of the four cells of Table 5-1. That requirement implies that a standard of listing-level severity needs to be developed so that the true status of every person in a study is known. Ascertaining true status might be expensive or labor-intensive if it requires, for example, independent medical consultations with each disability applicant. However, an assumption will be made that the true status of each person in the study is known.

Ideally, the proportion of people who truly have listing-level severity among those who are classified as positive would be 100 percent [(true positives)/(true positives + false positives) = 1.00]; that proportion is known as the positive predictive value. Similarly, the proportion of people who truly do not have listing-level severity among those who are classified as negative would ideally be 100 percent [(true negatives)/(false negatives + true negatives) = 1.00]; that proportion is known as the negative predictive value.

The positive predictive value depends on both the prevalence of listing-level disability in the pool of applicants and the discriminatory power of the classifier. The standard approach to evaluating the discriminatory power of a classifier is to calculate its sensitivity, the proportion of people who are classified as positive among those who have listing-level severity [(true positives)/(true positives + false negatives)], and its specificity, the proportion of people who are classified as negative among those who do not have listing-level severity [(true negatives)/(false positives + true negatives)]. In practice, no classifier is likely to have both 100 percent sensitivity (zero false negatives) and 100 percent specificity (zero false positives). All classifiers produce some false positives and false negatives. Therefore, in choosing among classifiers, there is a need to make tradeoffs regarding the relative importance of those errors. Figure 5-1 illustrates the tradeoffs between false negatives and false positives for a classifier that is based exclusively on number of hospitalizations; the same logic would apply to any classifier.
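The four measures defined above can be computed directly from the cells of Table 5-1. The following is a minimal sketch; the class name `ConfusionCounts` and the counts are hypothetical and were invented for illustration, not taken from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for the four cells of Table 5-1 (hypothetical container)."""
    true_pos: int
    false_pos: int
    false_neg: int
    true_neg: int

    def sensitivity(self) -> float:
        # Share classified positive among those with listing-level severity.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Share classified negative among those without listing-level severity.
        return self.true_neg / (self.false_pos + self.true_neg)

    def positive_predictive_value(self) -> float:
        # Share with listing-level severity among those classified positive.
        return self.true_pos / (self.true_pos + self.false_pos)

    def negative_predictive_value(self) -> float:
        # Share without listing-level severity among those classified negative.
        return self.true_neg / (self.false_neg + self.true_neg)


# Invented counts for 1,000 applicants; illustration only.
counts = ConfusionCounts(true_pos=150, false_pos=50, false_neg=50, true_neg=750)
print(f"sensitivity = {counts.sensitivity():.4f}")
print(f"specificity = {counts.specificity():.4f}")
print(f"PPV         = {counts.positive_predictive_value():.4f}")
print(f"NPV         = {counts.negative_predictive_value():.4f}")
```

Sensitivity and specificity are computed within the columns of Table 5-1 (true status), whereas the predictive values are computed within its rows (classifier result).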

TRADEOFFS IN THE CLASSIFICATION OF LISTING-LEVEL SEVERITY

FIGURE 5-1 An ideal classifier.

Figure 5-1 shows a hypothetical distribution of the number of hospitalizations conditional on having listing-level severity (first curve) or not having listing-level severity (second curve), respectively. In that population, all nondisabled people have a number of health-care utilizations that is lower than a particular threshold (e.g., fewer than three hospitalizations), and all disabled people have three or more hospitalizations. As a result, a classifier that uses a threshold of three hospitalizations perfectly classifies all members of the population. That is, all disabled people are “true positives,” all nondisabled people are “true negatives,” and there are no false positives or false negatives. However, Figure 5-1 represents an unrealistic situation in that no classifier is likely to separate disabled and nondisabled people perfectly.

Figure 5-2 represents the more common situation in which no threshold of health-care utilizations can perfectly discriminate the two groups. Therefore, a classifier based on a threshold such as three hospitalizations will result in some nondisabled people incorrectly classified as disabled (false positives) and some disabled people incorrectly classified as nondisabled (false negatives). The choice of a different threshold will result in different proportions of false negatives and false positives. For example, increasing the threshold to four hospitalizations will decrease the proportion of nondisabled people who are false positives (and thereby increase specificity) and will decrease the proportion of disabled people who are true positives (and thereby decrease sensitivity).

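As Figure 5-2 shows, in a world with overlapping distributions of hospitalizations, one cannot reduce false positives without also reducing true positives (that is, increasing false negatives), and one cannot reduce false negatives without also increasing false positives.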
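The threshold tradeoff can be illustrated with a small sketch. The hospitalization counts below are hypothetical and were chosen only so that the two groups overlap; the function name `rates_at_threshold` is likewise an assumption made for this example.

```python
def rates_at_threshold(disabled, nondisabled, threshold):
    """Return (sensitivity, specificity) when people with at least
    `threshold` hospitalizations are classified as positive."""
    true_pos = sum(1 for n in disabled if n >= threshold)
    false_pos = sum(1 for n in nondisabled if n >= threshold)
    sensitivity = true_pos / len(disabled)
    specificity = (len(nondisabled) - false_pos) / len(nondisabled)
    return sensitivity, specificity


# Hypothetical numbers of hospitalizations per person (overlapping groups).
disabled = [1, 2, 3, 3, 4, 4, 5, 5, 6, 7]
nondisabled = [0, 0, 0, 1, 1, 1, 2, 2, 3, 4]

for threshold in range(1, 7):
    se, sp = rates_at_threshold(disabled, nondisabled, threshold)
    print(f"threshold {threshold}: sensitivity = {se:.1f}, specificity = {sp:.1f}")
```

With these invented data, raising the threshold from three to four hospitalizations raises specificity and lowers sensitivity, which is the pattern described for overlapping distributions.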

The committee, for simplicity, has considered a unidimensional classifier, number of hospitalizations. In practice, one could consider multidimensional classifiers that are based on several measures of health-care utilization (as opposed to only number of hospitalizations). In fact, some proposed classifiers for medical conditions in the 21st century are based on hundreds or thousands of variables (e.g., variables obtained from insurance claims databases or electronic medical records) and take advantage of modern advances in machine learning algorithms. Regardless of the complexity of the classifier, it will rarely be able to provide a clean separation between disabled and nondisabled populations.

CHOICE OF CLASSIFIER AS A VALUE JUDGMENT

Inasmuch as any classifier has to find a balance between false positives and false negatives, the question is whether false positives and false negatives will be given the same weight or whether one of the two is considered more serious than the other. One would need to decide how to weigh false positives and false negatives in an effort to select a classifier.

In our setting, the classifier operates in the early steps of the screening process, and people who are classified as having listing-level severity (the positives) are automatically considered eligible and not considered further at later stages. In contrast, people who are classified as not having listing-level severity (the negatives) can be determined eligible at later stages. Therefore, from a societal point of view, a false positive might be viewed as a more expensive mistake than a false negative.

Accordingly, a possible strategy might be to choose classifiers that err on the side of increasing false negatives rather than false positives, that is, classifiers that have higher specificity at the expense of lower sensitivity. In the simplified example of Figure 5-2, the threshold would be moved to the right (e.g., to five hospitalizations) to minimize the proportion of false positives, as shown in Figure 5-3.
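One way to make such a value judgment explicit is to attach a relative cost to each type of error and pick the threshold with the lowest total cost. The sketch below is illustrative only: the hospitalization counts and the cost weights are hypothetical, and the weighted-cost rule is one possible formalization, not a method prescribed by the committee or by SSA.

```python
def total_cost(disabled, nondisabled, threshold, cost_fp, cost_fn):
    """Weighted count of errors when `threshold` or more hospitalizations
    is classified as positive."""
    false_pos = sum(1 for n in nondisabled if n >= threshold)
    false_neg = sum(1 for n in disabled if n < threshold)
    return cost_fp * false_pos + cost_fn * false_neg


def best_threshold(disabled, nondisabled, cost_fp, cost_fn, thresholds):
    """Return the candidate threshold with the lowest weighted cost
    (the lowest such threshold if several tie)."""
    return min(
        thresholds,
        key=lambda t: total_cost(disabled, nondisabled, t, cost_fp, cost_fn),
    )


# Hypothetical numbers of hospitalizations per person (overlapping groups).
disabled = [1, 2, 3, 3, 4, 4, 5, 5, 6, 7]
nondisabled = [0, 0, 0, 1, 1, 1, 2, 2, 3, 4]
candidates = range(1, 7)

# Equal weights versus a false positive counted as five times as costly.
equal_weight_choice = best_threshold(disabled, nondisabled, 1, 1, candidates)
fp_heavy_choice = best_threshold(disabled, nondisabled, 5, 1, candidates)
print("equal weights:", equal_weight_choice)
print("false positives weighted 5x:", fp_heavy_choice)
```

With these invented data, weighting false positives more heavily moves the chosen threshold to the right, trading sensitivity for specificity.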

A key challenge is that the positive predictive value depends critically on the prevalence of listing-level severity in the population of applicants who reach step 3. That is, even a classifier that has high discriminatory power, as measured by sensitivity and specificity, might have a low positive predictive value. For example, suppose that an excellent classifier that has 100 percent sensitivity and 90 percent specificity is developed. And suppose that the prevalence of listing-level severity among applicants is 20 percent, that is, only one-fifth of applicants are truly unable to do any gainful activity. Then it is easy to show that the positive predictive value will be only 71 percent. That is, using the almost implausibly good classifier, only 71 percent of people who are classified as having listing-level severity actually have listing-level severity. Thus, in that example, if 1 million applicants were evaluated according to the classifier, all 200,000 disabled applicants would be awarded benefits, but so would 80,000 nondisabled applicants.
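The arithmetic behind that example can be reproduced in a few lines. The sketch below uses the figures stated in the text (20 percent prevalence, 100 percent sensitivity, 90 percent specificity, 1 million applicants); the function name `predictive_values` is an assumption made for illustration.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) from prevalence, sensitivity, and specificity,
    working with population shares rather than counts."""
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1 - sensitivity)
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Figures from the example in the text.
prevalence, sensitivity, specificity = 0.20, 1.00, 0.90
applicants = 1_000_000

ppv, npv = predictive_values(prevalence, sensitivity, specificity)
true_awards = round(applicants * prevalence * sensitivity)
false_awards = round(applicants * (1 - prevalence) * (1 - specificity))

print(f"PPV = {ppv:.0%}")
print(f"disabled applicants awarded benefits:    {true_awards:,}")
print(f"nondisabled applicants awarded benefits: {false_awards:,}")
```

The positive predictive value is 200,000/(200,000 + 80,000), or about 71 percent, which matches the figure given above.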

All the above can be applied similarly to the negative predictive value, that is, the proportion of people who do not have listing-level severity among those who are classified as not having listing-level severity by the classifier. The relative emphasis on the positive predictive value versus the negative predictive value depends on the goals of the classification. Here, because of our context, we have focused on the positive predictive value.

The decision to privilege false negatives over false positives, or vice versa, is a judgment for SSA to make. The dependence of the positive (and negative) predictive value on both the classifier’s discriminatory power and the true prevalence of listing-level severity raises fundamental questions, of which these are prominent:

1. What is the lowest positive (or negative) predictive value of a classifier that SSA is willing to tolerate for step 3?
2. Given the expected prevalence of true disability, what levels of sensitivity and specificity are required to achieve at least the lowest value tolerated by SSA?
3. Given the available data on health-care utilizations and state-of-the-art predictive techniques, is the development of a classifier with those sensitivity and specificity levels achievable?
4. If the answer to question 3 is no, can a more modest role of the use of health-care utilizations in the classification of listing-level severity be defined?

Answering those questions is instructive in assessing the likelihood that SSA will be able to use health-care utilizations for classification of listing-level disability. Answering question 1 involves a policy judgment. Question 2 will have an immediate answer as soon as an answer to question 1 is provided. Question 3 is an empirical question; answering it requires a research project to evaluate the performance of the classifier.
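Question 2 follows mechanically from question 1 because the positive predictive value is a function of prevalence, sensitivity, and specificity. Rearranging PPV = (p × Se) / [p × Se + (1 − p) × (1 − Sp)] for a target value v gives Sp ≥ 1 − p × Se × (1 − v) / [v × (1 − p)]. The sketch below applies that rearrangement; the 90 percent target is a hypothetical policy choice used only for illustration, and the function name `min_specificity` is an assumption.

```python
def min_specificity(prevalence, sensitivity, target_ppv):
    """Smallest specificity for which the positive predictive value
    reaches `target_ppv`, given prevalence and sensitivity."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be strictly between 0 and 1")
    if not 0 < target_ppv <= 1:
        raise ValueError("target_ppv must be in (0, 1]")
    # Largest tolerable false-positive rate, from rearranging the PPV formula.
    max_false_pos_rate = (
        prevalence * sensitivity * (1 - target_ppv)
        / (target_ppv * (1 - prevalence))
    )
    return max(0.0, 1 - max_false_pos_rate)


# Prevalence of 20 percent as in the earlier example; the 90 percent
# target PPV is hypothetical.
for sensitivity in (1.0, 0.8, 0.6):
    required = min_specificity(0.20, sensitivity, 0.90)
    print(f"sensitivity {sensitivity:.0%}: specificity of at least {required:.4f}")
```

The lower the sensitivity or the prevalence, the higher the specificity needed to reach the same target, which is why question 3 then becomes an empirical matter.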

DESIGNING A STUDY

The performance of a health-care utilization or a combination of health-care utilizations as a classifier of SSDI applications is measured by its positive and negative predictive values for different cutoffs. Calculation of those values requires knowledge of the prevalence of listing-level severity among SSDI applicants who have a particular medical condition and of the distributions of health-care utilizations that are conditional on listing-level severity. That in turn requires knowledge of the true status (listing-level or non-listing-level severity) of each individual.

However, determining listing-level severity, defined by SSA as the inability to perform any gainful activity regardless of age, education, or work experience, is challenging. Observing that a person is not working does not necessarily mean that the person cannot work. Indeed, it has been noted, in some studies, that the disability-determination process itself provides applicants with a strong disincentive to work, inasmuch as any (substantial) gainful activity performed during or immediately before the application period can be used as evidence that the applicant can work (Maestas et al., 2013; Autor et al., 2015). At the same time, expert assessments of disability severity can be inaccurate and can lead to identification of some people who can work as unable to work and to identification of some who are unable to work as able to work. For some diagnoses, determination of listing-level disability might be straightforward because of the readily observed effects of the illness or disorder. For other diagnoses, determination of ability to work requires more subjective judgment, particularly for less visible sources of disability, such as impairments in cognition or emotional regulation or chronic pain and fatigue.

Thus, a research design that compares health-care utilization distributions for applicants who equaled the Listings (and later manifested very low levels of postaward work) with those for applicants who were denied at step 3 (and possibly allowed at step 5) could be used to approximate measures of sensitivity and specificity for various cutoff levels of health-care utilizations. Such a study could also provide measures of the prevalence of listing-level severity among the population of interest. Of course, predictions that arise from any particular study will become outdated as the health system changes, and the study would need to be repeated periodically.

Together, those estimates could be used to calculate positive (and negative) predictive values, and thus the likelihood of mistaken classification of listing-level severity, for various candidate measures. If the accuracy of a prediction is sufficiently high, in the judgment of SSA, the health-care utilization could be considered an acceptable classifier for listing-level severity and could be incorporated into the Listing of Impairments.

One option is to confine interest to diagnoses for which expert assessment of listing-level severity is straightforward and reliant on objective measures. In those situations, expert assessment can be used as the gold standard with which health-care utilizations can be compared. However, when a disability is less readily observable, expert assessment alone might not constitute a gold standard and might need to be supplemented with additional information, such as work outcomes. Another option could be to use data on health-care utilizations and listing-level severity in the context of a disability insurance program without strong work disincentives.

Yet another possibility could be to merge administrative records of past SSDI applications with data on health-care utilization history available at the time of determination and data on postdetermination work outcomes. Thus, disability examiners (and later administrative law judges) who adjudicated the cases would be the experts providing assessments. SSA administrative data contain the basis of all medical determinations, including separate codes for whether successful applicants at step 3 have impairments that “met” or “equaled” the Listing of Impairments. An applicant whose impairments are determined to “equal” a listing is determined to have listing-level severity but not specifically to meet the criteria laid out in the actual Listings (20 CFR §§ 404.1526 and 416.926). The ability to identify such cases as meeting a listing at step 3 of an initial determination could greatly decrease the time to determination, especially if it led to fewer appeals (and engagement of disability attorneys to make the case that an impairment equals a listing).

For example, at least 14 states have already developed all-payer claims databases (APCDs) that pool the information required for such analysis from both public and private insurers. Four more states are in the process of implementing an APCD, and 16 more states have shown strong interest in developing their own APCD. It is therefore possible that, in a few years, most states will have APCDs to conduct the proposed analysis.¹

SUMMARY AND CONCLUSION

Inasmuch as there is scant literature on finding evidence of health-care utilizations that would be good proxies for listing-level disability, the committee considered the question, “What would make a good proxy?” That is, what proxy would result in the smallest number of false positives? Listing-level severity—defined by SSA as the inability to perform any gainful activity regardless of age, education, or work experience—is not easy to measure in practice. Simply observing that a person is not working does not necessarily mean that a person cannot work. Indeed, it has been noted, in some studies, that the disability-determination process itself provides applicants with a strong disincentive to work, inasmuch as any (substantial) gainful activity performed during or immediately before the application period can be used as evidence that the applicant can work. At the same time, expert assessments of disability severity will suffer from some level of type I and type II error.

Before a classifier is developed to improve assessments of disability, decisions would need to be made about the relative weight of false positives and false negatives. From a societal point of view, a false positive might be viewed as a more expensive mistake than a false negative. A possible strategy might be to choose classifiers that err on the side of increasing false negatives rather than false positives, that is, classifiers that have higher specificity at the expense of lower sensitivity. However, the relative emphasis on the positive predictive value versus the negative predictive value depends on the goals of the classification. In any case, the decision to privilege false negatives over false positives, or vice versa, is a judgment that SSA would have to make.

Multiple issues should be explored in designing a study, such as confining interest to diagnoses for which expert assessment of listing-level severity is straightforward and reliant on objective measures, using data on health-care utilizations and listing-level severity in the context of a disability insurance program that does not have strong work disincentives, and merging administrative records of past SSDI applications with data on health-care utilization history that are available at the time of a determination and data on postdetermination work outcomes.

___________________

¹ See https://www.apcdcouncil.org (accessed February 4, 2018) for details.

Given appropriate data, models for quantifying the value of health-care utilizations in determining impairment severity are available. However, given the rapidly changing health-care landscape, predictive models that are developed now might not have the same performance attributes later. Analyses will have to be repeated as the health-care landscape changes.

REFERENCES

Autor, D. H., N. Maestas, K. Mullen, and A. Strand. 2015. Does delay cause decay? The effect of administrative decision time on the labor force participation and earnings of disability applicants. Santa Monica, CA: RAND Corporation. https://www.rand.org/pubs/working_papers/WR1070.html (accessed February 27, 2018).

Maestas, N., K. J. Mullen, and A. Strand. 2013. Does disability insurance receipt discourage work? Using examiner assignment to estimate causal effects of SSDI receipt. American Economic Review 103(5):1797–1829.