APPENDIX C

Methodological Issues in Measuring Health Status and Health-Related Quality of Life for Population Health Measures: A Brief Overview of the “HALY” Family of Measures

Dennis G. Fryback, Ph.D.

University of Wisconsin-Madison

INTRODUCTION

A “Simple” Concept

There is little dispute that the well-being of individuals involves two main conceptual parts, one dealing with longevity (or threats to longevity), and the other dealing with morbidity or the nonlethal aspects of daily function and pain and suffering that affect peoples’ lives. Any single summary measure of health and well-being of individuals and of populations will need to account for both these aspects.

This overview deals with a seemingly simple and powerful idea about numerical representation of health of individuals and its extrapolation to populations. Suppose we have a numerical measure of degree of morbidity (or lack of it) experienced by an individual at any given time. This numerical measure can be used to weight each passing moment of an individual’s life, and when the weighted moments are cumulated across the person’s life, the numerical aggregate denominated in “health-adjusted life years” (HALYs) can be used as a summary statistic accounting for both longevity and degree of morbidity experienced.
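The weighting-and-cumulating idea can be sketched in a few lines of Python. This is a minimal illustration, not a published algorithm: the function name `halys`, the piecewise-constant health path, and every weight and duration below are hypothetical, chosen only so the arithmetic is easy to follow. A weight of 1.0 stands for full health and 0.0 for death.

```python
def halys(health_path):
    """Sum weight * duration over a piecewise-constant lifetime health path.

    health_path: list of (years_in_state, weight) pairs, where weight is a
    hypothetical morbidity weight between 0.0 (dead) and 1.0 (full health).
    """
    total = 0.0
    for years, weight in health_path:
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must lie between 0 and 1")
        total += years * weight
    return total


# Hypothetical path: 62 years in full health, then 21 years at weight 0.5.
path = [(62, 1.0), (21, 0.5)]

life_years = sum(years for years, _ in path)
print(life_years)   # 83
print(halys(path))  # 72.5
```

Counting life years alone is the special case in which every weight is 1.0.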

In retrospect, the HALYs accrued by an individual can be one numerical summary of that person’s lifetime health experience. This summary is the conceptual generalization of just counting the number of years an individual lived; we can say, for example, that a person lived to the age of 83 years (accumulated 83 life years) or we can say that individual accumulated 72.5 HALYs. Applied prospectively, the health-adjusted life expectancy (HALE) might be computed to index an individual’s lifetime and health prospects. For example, the estimated life expectancy of a 50-year-old male in the United States in 1981 was 25.0 years; in 1993 the estimated life expectancy of a 50-year-old male in the United States was 29.2 years, and the increase of 4.2 years in life expectancy over the 12-year period gives us information about changes in one of the two aspects of health and well-being in the United States during that interval. If we could prospectively weight life years with expected degree of morbidity during those years, we might be able to report the equivalent health-adjusted life expectancies for 50-year-old males. Of course, we
may wish to compute these summary numbers for any subgroup of the population, or for a population as a whole.

This, then, is the simple concept: if we have suitable measures of morbidity experience and of life expectancy, we can compute a health-weighted life-year measure (HALYs, or HALE) to summarize population health. Although this seems a simple idea, it can be complex in application. This paper presents the major methodological issues surrounding construction of such measures.

I use the terms HALYs and HALE to refer generically to summary numerical representations that are an accumulation of health-weighted life years. In the course of this overview, subclasses of HALYs (e.g., quality-adjusted life years, or QALYs) are discussed, as well as some differentiations within subclasses.

Brief Historical Background: Descriptive and Decision Making Roots of HALYs

The groundwork for the population-based HALY representation [a] was laid in the 1960s and early 1970s in publications from the U.S. Department of Health, Education, and Welfare (HEW) and in the operations research literature reporting work funded by HEW. The problem was to describe the overall health of the population. For years this had been indicated by mortality rates, the decline in which began to level off somewhat in the 1950s and 1960s. During the 1960s and 1970s researchers were looking for methods to bring information about morbidity as well as mortality into the descriptive summaries.

The earliest paper I have seen is by Sanders, who introduced the mathematical combination of a measure of functional capacity and a measure of time to make a combined measure of “effective life years.” [1] Sullivan [2] amplified on this concept, as did Moriyama, [3] and later Sullivan used a stationary life table technique to compute age-specific disability-free life expectancy from census data about mortality and the National Health Interview Survey (NHIS) data about disability. [4] Sullivan multiplied the number of people in each age range’s stationary population (from the census’ life table) by the proportion of the population that was disability-free in that age range in the NHIS data. The averaged result was mathematically an early progenitor of health-adjusted life expectancy, where the health weighting was 0 for health states involving disability and 1 otherwise. This same approach has been used more recently, but with different continuous measures of overall health, to compute “Years of Healthy Life” (YHL) for the U.S. population [5] and HALE for Canadians. [6]

As a descriptive statistic, the HALE for a population draws meaning from its broad conceptual foundations and from the experience we gain with it over time. A good descriptive indicator should behave about as one expects it to, given the conceptual grounds it springs from, and with experience over time we will learn its quirks and associate its behavior retrospectively and prospectively with other indicators of interest. This is not unlike the fashion in which some summary economic indicators, such as Gross National Product (now evolved to Gross Domestic Product) or various stock market indexes around the world, have acquired meaning. [b]

As an input for decision making, however, a HALY measure may have more demands placed on it. In particular, one type of HALY measure called the “quality-adjusted life year” (QALY) has a history associated with welfare economics and a principle for decision making under uncertainty termed “maximizing expected utility” or the EU principle. This line of reasoning begins
by assuming that individuals have well-defined preferences; these preferences are termed “utilities,” and individuals are presumed to make decisions so as to maximize their overall expected utility. Welfare economics is concerned with how to devise a social utility function to guide social decision making so that aggregate social outcomes are related in desirable ways to the preferences of the individuals making up the society. This background is discussed at length in a book on cost-effectiveness analysis in health and medicine commissioned by the U.S. Public Health Service. [9]

While the HALE concept was being developed as a descriptive index for health gains in populations, economists, operations researchers, and psychologists were developing the QALY concept for making decisions about which medical treatment or which health system intervention should be undertaken by virtue of its being the most cost-effective. [10, 11] The seminal work in this regard is by Fanshel and Bush, [12] who developed a preference-based measure for measuring health in populations, tied it to the foundations of EU decision making for public policy, and then demonstrated its application. [c] Less than a decade later, the QALY as an outcome measure for cost-effectiveness analysis appeared in the major medical literature. [13]

Technically, as a measure of decision outcomes for expected utility decision making, the QALY measure must satisfy some stringent mathematical and psychological properties, as discussed in following sections. These properties were first analyzed and discussed in the operations research literature, [14] and very recently reanalyzed with a surprisingly simplified result. [15] Although the use of QALYs for EU decision making and cost-effectiveness analysis has been predominant in the medical decision making literature, EU decision making is by no means the only theory for how to make decisions. The study of the mathematical properties of QALY-like measures for other decision theories has been given modest attention, [16] probably because non-EU theories of decision making, while possibly descriptive of how people actually do make decisions, are rarely advocated as theories of how decisions under uncertainty should be made.

COMPONENTS, CONSTRUCTION, AND PROPERTIES OF HALY MEASURES

Viewpoints

Confusion about what is and is not required of a HALY measure often stems from the use of such measures for individual decision making versus use for societal decisions. In the discussion following, it is important to distinguish the use of a HALY measure to assist in decisions about health and medical care by individuals from its use in decisions made in a social policy setting. In constructing measures to assist individuals, we must take into account the psychology of how individuals make decisions in complex situations of uncertainty. Construction and use of HALY measures in individual decision making are not the subject of this paper.

Societal decision making applications may be more deliberative, and we may get a chance to ask how we (as society) wish to make the decisions. Summary measures of population health may be only one of several decision considerations. Although a social measure of population health needs to be representative of how individuals in the society value health, it also needs to deal with the problem of how health may be distributed in a population as a consequence of decisions. And the societal viewpoint must also deal with problems of data availability on a population scale.
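Sullivan’s life table computation, described in the historical background, is one example of a population-scale calculation that can be assembled from routinely available data. A minimal Python sketch follows, assuming an abridged life table; the function name, the age intervals, and all of the numbers are hypothetical and serve only to show the arithmetic.

```python
def sullivan_health_expectancy(person_years, healthy_fraction, survivors_at_start):
    """Health-adjusted life expectancy at the start of the first age interval.

    person_years: life-table person-years lived in each age interval
        (the stationary population), youngest interval first.
    healthy_fraction: for each interval, the proportion disability-free,
        or more generally an average health weight between 0 and 1.
    survivors_at_start: life-table survivors at the start of the first interval.
    """
    if len(person_years) != len(healthy_fraction):
        raise ValueError("inputs must cover the same age intervals")
    healthy_years = sum(L * w for L, w in zip(person_years, healthy_fraction))
    return healthy_years / survivors_at_start


# Hypothetical abridged life table for 1,000 survivors at age 50.
person_years = [9500, 8000, 5000, 2000]   # ages 50-59, 60-69, 70-79, 80 and over
healthy_fraction = [0.9, 0.8, 0.6, 0.5]   # made-up disability-free proportions
survivors = 1000

life_expectancy = sum(person_years) / survivors
hale = sullivan_health_expectancy(person_years, healthy_fraction, survivors)

print(round(life_expectancy, 2))  # 24.5
print(round(hale, 2))             # 18.95
```

With these made-up inputs, life expectancy at age 50 is 24.5 years and the health-adjusted expectancy is 18.95 years. Replacing the disability-free proportions with average health-state weights gives the continuous version of the same calculation.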

In the discussion following, I assume that we intend the HALY measure to help us describe the health of populations and the aggregate health consequences of social decisions that affect the health of populations, for the purpose of policy-level decision making.

Separation of “H” from “LY”

A health state is defined as “the health of an individual at any particular point in time” (p. 399). [17] Gold and colleagues [18] characterized lifetime health paths as lifetime movement through different health states:

In the most general case, each individual is born and lives out a lifetime that consists of moving through different health states over time until death. Each individual has a different path through these health states that terminates at a different time of death. For example, consider a perinatal intervention designed to improve the health of a newborn. Without the intervention the average newborn faces a probability distribution of possible paths through life. With the intervention, the person faces a different, hopefully improved, probability distribution of paths.... Two special paths represent the extremes of possible health outcomes and they bound the range of effectiveness scores for possible health paths. One is the path that consists of immediate death at birth. The other is the path that consists of ‘optimal’ health for a ‘full’ lifetime. [pp. 89–90]

The term “health outcome” is used throughout the literature; unfortunately it is used in two different ways, one referring to what is denoted here as a “health state” and the other to a “health path.” Readers often have to determine from context which of these usages is meant. It is possible, of course, to speak also of conditional lifetime health paths, such as age-specific and condition-specific health paths, which are the health paths starting from a particular age with a particular health condition.

The problem of measurement of health paths is the problem of assigning a number to each possible lifetime health path (or conditional health path) so that their numerical ordering represents better or worse health paths, and so that we can do meaningful arithmetic (such as taking averages) whose results are meaningful in the same way: the numerical order of the averages represents better or worse aggregated or prospective health paths.

The most fundamental assumption in the construction of HALY measures is that the part of the measure dealing with weighting health states can be obtained separately from the life-years, or time duration, part of the measure. This assumption implies that the relative degree of health represented by a particular health state can be rated without knowing the duration of that health state. There are data showing that this assumption is not uniformly true of how individuals think about their health. The relative weight that individuals give to some health states has been shown to vary with how long those states are endured. [19] There are other data showing that, for other health states such as angina, the assumption is approximately true. [20] Some acute states may be relatively more tolerable than if they were chronic states. Alternatively, individuals may accommodate to a
health state over time, thus giving higher weights to certain conditions endured for a long time versus a brief time. At the level of individuals’ ratings of their health this is a complicated issue.

This assumption is made for population-based measures largely for practical reasons. It immensely simplifies data collection and computations of average HALYs. It means we can collect data about relative weighting of health states in surveys that are separated from collecting data about longevity and later put these two sets of data together. And it simplifies understanding and communicating the aggregated HALY measure.

There are ways to relax this assumption to a degree, for example by constructing health state descriptions that include time in the state, for states where duration is known to systematically and seriously affect the weight given to the state. Decision makers guiding construction of HALY measures for population-focused decision making will need to weigh simplicity against fidelity of the measure with respect to the assumed degree of separability of health state from duration. “High-fidelity” health state description systems, able to differentiate thousands or millions of health states that might affect humans, may need to be combined with time measures as if they were fully separable. High-fidelity HALY measures, not relying on the assumption of separability, may well be too complex to allow data collection at a population level.

For the subsequent sections of this paper, it will be assumed that the H part of the measure is deemed to be separable from the LY part, and we will focus on different systems for measuring H.
MEASURING “H”

Systems for Measuring Health Status and Health-Related Quality of Life

The problem of deriving weights for health states is generally divided into three exercises, which are discussed below. First, we must decide what aspects of health will be the foundation of the classification scheme, i.e., what are the important aspects of health we wish to enumerate as health states. Next, we must devise a system by which a real individual’s health can be mapped into our discrete set of health states, i.e., we must operationalize the classification system. Third, we must devise a system for assigning numbers to each of the health states so that these numbers can be used as weights in a HALY computation.

In the discussion following, I use six different systems as examples of health measures:

QWB: The Quality of Well-Being Scale [21] is the direct descendant of the early work by Fanshel and Bush cited earlier. The QWB has been used as a general health measure in many clinical studies and in policy research.

HUI: Torrance and colleagues [22, 23] created the Health Utilities Index from work conducted only slightly later than the Fanshel and Bush scale. This index has gone through several augmentations over the years; the successive indexes are the HUI-Mark I, HUI-Mark II, and HUI-Mark III. The HUI is used for population health indexing in the province of Ontario and preliminarily throughout Canada by Statistics Canada.

YHL: Years of Healthy Life is a measure of population health computed experimentally by the U.S. National Center for Health Statistics from data collected in the National Health Interview Survey. [5] I will use “YHL” to refer elliptically to the health state weighting component of this measure.

DALY: Disability-Adjusted Life Years is a measure created by Murray and colleagues to index global burden of disease for the World Health Organization. [24, 25] For purposes of this section I will use DALY to refer indirectly to the disability state weights that are one component of the DALY measure.

EQ-5D: The EuroQol collaboration is a European multinational collaboration on measuring health-related quality of life in Europe. [26] The EQ-5D is the main health state data collection instrument used in this collaboration.

SF-36: The Medical Outcomes Study Short Form-36 (known as the “SF-36”) [27] is a descendant of the Rand General Health Survey questionnaire. It is a health profile that is widely used in the United States and other countries. It is being tested as an instrument for monitoring population health change in the Medicare population (the “Seniors Project”).

While these six do not exhaust the possibilities appearing in the literature (e.g., see McDowell and Newell [28] for a survey of many different health state description systems; McHorney [29] reviewed this literature; and Gold et al. [18] discussed systems that might be used in cost-effectiveness analyses), a more comprehensive review is beyond the scope of this overview.

Descriptive systems for health states. The universe of health states that humans experience is immense. The problem faced by any classification system is to represent the complex reality of this universe in a manageable fashion.
The simplest of classifications is to classify all health into two states: “alive” and “dead.” Policy decisions based only on mortality rates and on life expectancy ultimately rest on this simple binary classification without further differentiation. Once we decide that there are subclasses in the “alive” state that we wish to differentiate, the trick is how to do this in a meaningful way without being overwhelmed by complexity.

What constitutes “health”? Any specific answer to this question is bound to be controversial. But examination of different classification systems that have been proposed results in a list of concepts and concerns that seems to encompass most. These are shown in Table 1.

Constructs as listed in the left-hand column of Table 1 are referred to by various terms used approximately synonymously: “concepts,” “constructs,” “domains,” “dimensions,” “attributes,” and “factors.” The different aspects of health are not assumed to be independent in a statistical sense; for instance, the occurrence of depression may well be influenced by presence or absence of acute physical function restrictions or pain. Statistical independence of health dimensions is not a necessary condition of health state measurement systems. Most measurement schemes do require another sort of independence: that the weight accorded to different levels of one of these dimensions does not depend on other dimensions or its particular etiology. I return to this in the next section.

TABLE 1 Typical Concepts and Concerns Used in Measures for Health States

Concepts and Domains | Indicators
Health perceptions | Self-rating of health; health concern, health worry
Social Function |
  Social relations | Interaction with others; participation in the community
  Usual social role | Acute or chronic limitations in usual social role (major activities) of child, student, worker
  Intimacy/sexual function | Perceived feelings of closeness; sexual activity and/or problems
  Communication/speech | Acute or chronic limitations in communication/speech
Psychological Function |
  Cognitive function | Alertness; disorientation; problems in reasoning
  Emotional function | Psychological attitudes and behaviors
  Mood/feelings | Anxiety; depression; happiness; worries
Physical Function |
  Mobility | Acute or chronic reduction in mobility
  Physical activity | Acute or chronic reduction in physical activity
  Self-care | Acute or chronic reduction in self-care
Impairment |
  Sensory function/loss | Vision; hearing
  Symptoms/impairments | Reports of physical and psychological symptoms, sensations, pain, health problems or feelings not directly observable; or observable evidence of defect or abnormality

SOURCE: This table appears in Gold et al. (1996) as adapted from Patrick and Erickson (1993).

Figure 1 provides a representation of the concepts in Table 1. I have placed the constructs in a hierarchy. Each branching in the hierarchy can be construed as a parsing of the construct from which the branches emanate into subconstructs.

It is not always easy to tell whether two different measures include the same constructs or dimensions of health. Not all concepts in Table 1 or Figure 1 appear explicitly in every measure of health. Alternatively, the same concept appearing in two different measurement systems may be operationally specified in quite different ways.
Finally, health measures may differ in how they parse a construct at any given level in a conceptual hierarchy of health, and in how far down the tree they will continue parsing constructs into subconstructs. For the most part, these domains cover the descriptive schemes that attempt to generalize across health experiences. Measures that deal only with manifestations of a particular disease or condition may be much more detailed in the aspects of health affected by that disease to the exclusion of aspects not generally affected (for example, measures intended only to describe health states of persons with arthritis may be very detailed about pain and physical function, but generally do not include dimensions dealing with visual ability).

Figure 1 A possible hierarchical view of health dimensions.

For purposes of public policy, where decisions may well be made that involve many different diseases and conditions, it is generally agreed we need a measurement scheme that covers the wide front of human experience and not just one disease or one aspect of health. But common health indexes devised to be “generic” in fact differ in which aspects are included and which are not.

A major point of difference comes at the top level of the hierarchy in Figure 1. Some measurement systems include self-perceived health, and some do not. The Years of Healthy Life (YHL) measure from the U.S. National Center for Health Statistics [5] and the Medical Outcomes
Study Short Form (SF-36) [27] both include self-perceived health. The Quality of Well-Being Scale (QWB), [21] the Health Utilities Index (HUI), [22, 23] the scale of disability used in the Disability-Adjusted Life Years (DALY) measure, [24, 25] and the EQ-5D (the instrument being used by a multinational collaboration on measuring health-related quality of life in Europe) [26] do not use this dimension as part of the classification scheme.

A second major difference is inclusion/exclusion of the dimension I have labeled “existential/experiential symptoms and abilities.” Some developers of health measures have taken the view that the attributes down this branch are important aspects of health, but only insofar as they affect functioning. In this view, these aspects of health are valued for how they do or do not limit ability to conduct oneself in the broad world physically and socially. The QWB gives some of these weight as symptoms or problems in its “symptom/problem complex” attribute. The HUI, the EQ-5D, and the SF-36 detail degree of functioning on a number of these. However, in DALYs and in the YHL these are not measured except indirectly through how they affect functioning in various activities.

Operationalizing the classification system. There are methodological differences associated with the manner in which health classification systems are operationalized. The problem of operationalization is that of “mapping” the health state of a given individual into the set of classes that is created by a measurement system. It is perhaps easiest to describe these by example.

Many classification systems are operationalized according to attribute. The system developers have listed the dimensions to be included in the measure (dimensions such as those in Table 1 and Figure 1), and have worked out a system for locating where along each dimension a given individual resides whose health is to be rated. Each dimension is categorized into a discrete set of levels, ranging from best function or best health on that dimension to worst. Table 2 shows attribute names and number of categories created for each attribute for the four systems here that are operationalized in this fashion.

The number of different health states that potentially can be distinguished by a classification system is the product of the number of levels of each dimension, since a health state is defined by picking one level from each dimension. So the QWB can distinguish 3 × 3 × 5 × 26 = 1,170 different health states (plus the state “dead,” for a total of 1,171 states). The HUI-Mark III can distinguish up to 972,000 health states. The EQ-5D can distinguish 243. The YHL measure classifies 30 health states. And the disability classification for DALYs has 7 classes (including no disability).

In fact, many potential combinations of attributes in the first three systems are probably impossible combinations (e.g., “in a coma” and yet driving around the community!), so these simple computations represent upper bounds. And by no means are the various combinations considered to be equally likely, so much of the population will probably congregate in relatively few cells of the classification schemes having many categories. The sheer number of health states does not make a measure good or bad; it is only an indicator of potential for differentiating states of health.
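The counting rule can be checked directly. The short Python sketch below multiplies the number of levels per attribute as listed in Table 2; the dictionary layout and the function name `max_health_states` are illustrative choices, and `math.prod` requires Python 3.8 or later.

```python
from math import prod

# Number of levels per attribute, as listed for each system in Table 2.
levels = {
    "QWB": [3, 3, 5, 26],
    "HUI-Mark III": [6, 6, 5, 6, 6, 5, 6, 5],
    "EQ-5D": [3, 3, 3, 3, 3],
    "YHL": [6, 5],
    "DALY disability scale": [7],
}


def max_health_states(attribute_levels):
    """Upper bound on distinguishable states: pick one level per attribute."""
    return prod(attribute_levels)


state_counts = {name: max_health_states(v) for name, v in levels.items()}
for name, count in state_counts.items():
    print(f"{name}: {count:,}")
# QWB: 1,170
# HUI-Mark III: 972,000
# EQ-5D: 243
# YHL: 30
# DALY disability scale: 7
```

These are upper bounds only; the count does not exclude impossible combinations and says nothing about how the population is spread across states.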

TABLE 2 The Attributes and Their Number of Levels for Four Health Measurement Systems

System | Attribute Name | Number of Levels
Quality of Well-Being Scale | Mobility Scale | 3
Quality of Well-Being Scale | Physical Activity Scale | 3
Quality of Well-Being Scale | Social Activity Scale | 5
Quality of Well-Being Scale | Symptom/Problem complex | 26
Health Utilities Index-Mark III | Vision | 6
Health Utilities Index-Mark III | Hearing | 6
Health Utilities Index-Mark III | Speech | 5
Health Utilities Index-Mark III | Ambulation | 6
Health Utilities Index-Mark III | Dexterity | 6
Health Utilities Index-Mark III | Emotion | 5
Health Utilities Index-Mark III | Cognition | 6
Health Utilities Index-Mark III | Pain | 5
EQ-5D (EuroQol) | Mobility | 3
EQ-5D (EuroQol) | Self-Care | 3
EQ-5D (EuroQol) | Usual Activities | 3
EQ-5D (EuroQol) | Pain/Discomfort | 3
EQ-5D (EuroQol) | Anxiety/Depression | 3
Years of Healthy Life | Activity Limitation | 6
Years of Healthy Life | Self-Rated Health | 5
Health Scale for DALYs | Disability | 7

The EQ-5D and the HUI are designed so that individuals are asked to place themselves on each of the dimensions in the instrument. The EQ-5D divides each attribute into three levels: no problem, some or moderate problems, and inability or extreme problems with the construct. The individual respondent is required to interpret these adjectival phrases, so “some problems” may have a different meaning for different individuals. In this sense, the EQ-5D incorporates individuals’ perceptions of degree of effect that their condition has on the different dimensions of health.

The HUI also is designed to have individuals pick the category in which they fit on each attribute. The categories, however, are defined more extensively and somewhat less subjectively than in the EQ-5D; for example, the middle category of ambulation is “Able to walk around the neighborhood with walking equipment, but without the help of another person.”

An individual is classified on the QWB dimensions as a result of either direct self-classification on the symptom/problem complex, or slightly less directly on the other dimensions. For the symptom/problem complex, the subject is shown or read a list of possible health conditions (e.g., “general tiredness, weakness, or weight loss,” or “trouble learning, remembering, or thinking clearly”). The subject (or a proxy) identifies all conditions that they have been affected by in the past 6 days. For the other dimensions, questions are asked that help classify the person’s functional ability. For example, for “mobility,” which is used to indicate how able the person is to get around the community, the person is asked if he or she has a driver’s license, and if not, whether this is because of his or her health. If the person does have a license, he or she is asked if he or she drove a car in the past day; if not, was this for health reasons. If the person did not drive, he or she is asked whether he or she used public transit, and if so, was more help than usual needed, and if not, was this because of his or her health. These questions are to determine the degree of
mobility and mobility restrictions caused by health. This type of questioning is an alternative to directly the person to self-classify the degree of limitation. The YHL health classification system was developed to use data from the U.S. National Health Interview Survey (NHIS). One question asks each respondent to self-classify their health as excellent, very good, good, fair, or poor. A pre-defined logic is used to classify the person’s degree of limitations in daily activities based on a number of questions about which activities he or she can engage in. The respondent is not asked to self-classify into limitation categories. The QWB, HUI, and EQ-5D are designed to be assessed directly as primary data. The health classification of the YHL is a secondary analysis of pre-existing NHIS data, although in principle it could be administered as the primary purpose of data collection. The disability classification used for computing DALYs relies on secondary data analysis, rather than primary data collection. A panel of experts has classified diseases and disabling conditions by how likely they are to produce varying levels of disability in recreation, education, procreation, occupational activities, or in activities of daily living. These judgments are used in secondary analyses of existing data sets to infer prevalence and degree of limitations in different populations (and to relate these inferences to specific health conditions). The SF-36 classifies individuals using an entirely different process than the other five measures. This instrument, and many similar instruments, use multiple questions about functions, abilities, feelings, and attitudes. Developers of instruments in this genre of health measures start by thinking of the various dimensions of health they wish to measure. But rather than going through a process of enumerating a categorical scale for each concept in the measure, they create many specific questions relating to the concept. 
For instance, if a scale of physical function is to be created, a pool of questions about specific physical activities will be created. This pool of questions, or “items” in the jargon, will cover items ranging from activities that all but the most impaired may be able to do, such as turning over, or toileting, or dressing or bathing oneself, to items about activities only the most physically adept might be able to do such as running long distances or playing vigorous sports. Ideally, if “physical functioning” is a unidimensional concept, all the items can be arranged so that they form a ladder of physical function—a person of a given ability would be able to perform all tasks from the low end of this ladder up to a certain point, then would not be able to perform any tasks on the ladder above this point. d Unfortunately, it is difficult to make such a perfect list as there are different aspects of physical functioning, such as fine motor control, flexibility, cardiovascular fitness, strength, for example. A person with arthritis affecting the finger joints may not be able to sew, for example, but may walk long distances. Items to form a scale are tested during instrument development and selected to ensure they sample a wide spectrum of activities thought to be associated with the underlying construct. The more items that are used in construction of the scale, the more precision there is to measure an individual’s performance on that dimension. Developers of such scales must balance the desire to have greater precision with the problem of getting subjects to answer long questionnaires. So a minimum number of items is selected for the instrument so that they collectively have good test-retest reliability and can be shown to correlate well with other measures and indicators of the construct. 
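The ladder idea can be made concrete with a small check. The following is a minimal Python sketch, not part of any of the instruments discussed; the function names and the three example items are hypothetical. It assumes a respondent's answers are recorded as True (can do) or False (cannot do), ordered from the easiest item to the hardest.

```python
def is_guttman_pattern(responses):
    """Return True if the answers form a perfect ladder.

    `responses` is a list of booleans ordered from the easiest item to the
    hardest. A perfect ladder is a run of True values followed only by
    False values: once a task cannot be done, no harder task can be done.
    """
    seen_failure = False
    for can_do in responses:
        if not can_do:
            seen_failure = True
        elif seen_failure:
            # A harder task was passed after an easier one was failed.
            return False
    return True


def ladder_position(responses):
    """Return the number of rungs climbed, or None if the pattern is not a ladder."""
    if not is_guttman_pattern(responses):
        return None
    return sum(responses)


# Hypothetical items, easiest to hardest: dress oneself, sew, walk long distances.
typical_respondent = [True, True, False]
arthritis_respondent = [True, False, True]  # cannot sew but can walk far

print(ladder_position(typical_respondent))     # a ladder position is defined
print(ladder_position(arthritis_respondent))   # None: the ladder breaks down
```

The second respondent mirrors the arthritis example in the text: the answers cannot be summarized by a single point on the ladder, which is why a perfectly unidimensional list is difficult to construct.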
The SF-36 uses 36 items spread across 8 constructs: physical function, role functioning as limited by physical problems, bodily pain, social function, mental health, role function as limited by emotional problems, vitality, and general perceived health. The measurement philosophy, using multiple items regarded as each being a sample of abilities and attitudes, constitutes a different class of measurement approach from the attribute categorization of the other health measurement systems discussed above.

Assigning numerical weights to health state categories. Once a classification system is set up, the final step is to assign numbers, the health state weights, to the different health states that the classification system can distinguish. There are both fundamental and real differences in how different systems derive the numbers used to weight the states, and there are apparent differences that are in fact mostly superficial. Let us deal with the superficial aspects first so that they can be put aside.

The first of these is orientation: which end of the scale is “up.” The QWB and the HUI are scaled from 0 to 1, where higher numbers mean better health states. The DALY disability weights range from 0 to 1, but the lower end of this scale indicates less disability, so lower numbers indicate better states. Orientation is a matter of cosmetics and perhaps convenience, not a fundamental difference.

Another matter of cosmetics is the numerical range of the scale. In the same way that temperature can be measured in degrees Kelvin, Celsius, or Fahrenheit, we should not be concerned with scales that are simple transformations of each other. Scales that simply use different numerical endpoints are not necessarily fundamentally different measures.

So what is a fundamental difference in assigning numbers to categories? The most basic difference is whether the numbers reflect preferences (i.e., whether they are derived from a human judgment about the relative desirability of being in one state or another) or are derived in a manner not directly related to preferences.
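The two cosmetic differences described above, orientation and numerical range, are both simple linear transformations. The sketch below illustrates this in Python; the function names and the example values are hypothetical and are not taken from any of the instruments.

```python
def rescale(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map a value from [old_min, old_max] onto [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("old_min and old_max must differ")
    fraction = (value - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)


def flip_orientation(weight, low=0.0, high=1.0):
    """Reverse which end of the scale is 'up' without changing its range."""
    return high + low - weight


# Range: a hypothetical raw score that can run from 10 to 30, mapped onto 0-100.
print(rescale(20, 10, 30, 0, 100))    # the midpoint of one range maps to the midpoint of the other

# Range: the temperature analogy, Celsius to Fahrenheit via the fixed points
# 0 C = 32 F and 100 C = 212 F.
print(rescale(100, 0, 100, 32, 212))

# Orientation: a hypothetical disability-style weight of 0.25 (lower is better)
# expressed on a scale where higher is better.
print(flip_orientation(0.25))
```

Neither function changes the ordering of states or the relative spacing between them, which is why these differences are cosmetic rather than fundamental.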
The eight scales of the SF-36 are each computed from a simple scoring scheme that is not preference-based. For example, the physical function scale is formed by 10 items asking about degree of limitation in performing various physical tasks. The respondent can answer “limited a lot,” “limited a little,” or “not limited at all” for each of the 10 items. These responses are scored numerically as 1, 2, or 3, and the responses across the 10 items are added to yield a total score that can range from 10 (limited a lot on all 10 items) to 30 (not limited at all on all 10 items). This sum is then rescaled by simple transformation to range from 0 to 100 (that is, score = 100 × (sum − 10) / 20), where 0 is minimum physical function on all 10 items and 100 is maximum function on all 10 items. The physical function scale of the SF-36, formed by this score ranging from 0 to 100, is a reliable indicator of physical function ability and has been shown to correlate well with many other indicators of physical function and to discriminate well between different groups of patients known to differ in physical abilities. 27

While we generally can agree that a higher score on this scale is likely to be a better health state than a lower score, the scale is not constructed so that a 10-point difference, for example, is the same amount of increase in desirability of the health state all along the scale. A change from 40 to 50 on the scale may or may not be equivalent in desirability to a change from 80 to 90. So while this scale may be a reliable indicator of changes in physical function, there is no psychometric guarantee that equal numerical changes are equally desirable changes. e

Alternatively, the QWB, HUI, and DALY health measures were directly developed from judgments of desirability of the health states. Apparently the EQ-5D is being scaled in this manner from data that have been collected. The assignment of weights for the YHL was done in a manner allowing it to indirectly approximate the HUI, and to reflect preference judgments even if not developed from primary judgment data.

Although the terminology has not been standardized formally, there is a recent trend in the scientific literature to refer to measures based on preferences as health-related quality of life (HRQL) measures, to distinguish them as a subclass of all health status measures. By this convention, all the measures are health status measures, the QWB, HUI, DALY, and EQ-5D are health-related quality of life measures, and the YHL health measure is a proxy HRQL measure. By extension, the generic class of all adjusted life-year measures is HALYs and the subclass employing HRQL measures for the health component is QALYs. The U.S. Public Health Service Panel on Cost-Effectiveness in Health and Medicine, after due deliberation, concluded that a QALY measure is required for generalizable cost-effectiveness calculations. 18 However, this does not preclude the use of a non-QALY HALY measure for descriptive summary purposes.

There is much dispute about whose preferences should be used, and how to collect preference judgments. We’ll first discuss whose preferences to collect. For QALYs to be used for individual decision making there is no question that it is that individual’s own preferences that we wish to use. But for public policy purposes the answer is not so clear. It seems desirable to weight health states in such a way as to reflect public opinion. But there are many health states with which the general public is unfamiliar. If a survey of public opinion is conducted, these states may receive more or less weight than they would from persons who have suffered them. This may be because of an inability to imagine validly living with the health state, or because of fear of the health state, or other reasons.
For instance, it is commonly held that the state of being blind is weighted substantially lower by people who have not been blind than by many who have been blind for an extended time.

The QWB and the HUI (and thereby the YHL) have been developed from broad community surveys. Members of the community were asked to rate health states and their answers were pooled and analyzed to develop a scoring system that predicts the community-assigned score for each particular health state. There was no special effort made to ensure that persons with a particular condition were assigned to rate that health state. So people with different conditions are theoretically represented in the sample roughly in proportion to the prevalence of the health condition in the community. f The DALY weights were derived as considered judgments by a panel of experts. 24 The EQ-5D uses people’s ratings of their own general health state to derive weights for levels of the five dimensions. In a large population survey, each respondent rates himself or herself on each of the five dimensions, and then makes an overall numerical rating of his or her general health from worst imaginable health state (0) to best imaginable health state (1). Statistical modeling of these data can then be used to develop weights that relate observations on the dimensions to the overall rating of health.

How to collect preferences is hotly debated. In the theory of decision making based on the principle that we should maximize expected utility of decision outcomes, there are many demands placed on the numerical utility scale used to evaluate decision outcomes. Because cost-effectiveness analysis comes from these roots, there is strong theoretical reason to demand that our eventual QALY measure should be a utility measure. And to do this, the health state weights have to be a utility measure.
To explain exactly what this means mathematically is beyond the scope of this paper, and those interested can find technical discussions in Pliskin, Shepard, and Weinstein (1980); Miyamoto and Eraker (1985); Torrance, Boyle, and Horwood (1982); Torrance, Thomas, and Sackett (1972); 30 Torrance (1988), 31 and Gold, Patrick, et al. (1996). In-depth technical references are books by Keeney and Raiffa (1976) 32 and by von Winterfeldt and Edwards (1986). 33

The idea is to tie the numerical judgment scale to decisions by people about health states. The Standard Gamble (SG) method for doing this incorporates uncertainty as a part of the decision environment. Respondents are shown a scenario describing a particular health state and are asked to imagine the certain prospect of living in that state for their remaining lifetimes and then dying. Alternatively, they can choose a gamble in which they have, for example, a 60 percent chance of immediate death and a 40 percent chance of living their remaining lifetimes in excellent health. If they choose the first alternative, the gamble is made more attractive by increasing the chance of excellent health; if they choose the gamble, it is made less attractive by decreasing the probability of excellent health. Then they are offered the two choices again, the sure thing with the specified health state or the gamble. This continues until they cannot choose between the two alternatives. At this point of indifference, it is possible to derive a utility scale number for the health scenario in the sure thing alternative that is a function of the probability of excellent health in the gamble. Repeated use of the SG assessment method will obtain health state utilities for any specified health states. Or, this technique can be used with the separate dimensions to develop an entire scoring algorithm relating the components of the health state classification system to scores for health states.

A second method for assessing weights for health states is the Time Trade-Off (TTO).
Using this method the person is asked to consider living his or her remaining lifetime (usually specified as a fixed time roughly commensurate with age) with the health state described in the scenario (the same as the first alternative in the SG technique). The alternative to this is to live for a fraction of the fixed remaining lifetime, for example, 50 percent, but in excellent health. If the person prefers the first alternative, longer life but in worse health, the amount of time in excellent health is increased to make the second alternative more attractive. If the preference was the other way, then the amount of time in excellent health is decreased. Proceeding in this manner it is possible to find the indifference point, where remaining lifetime in the worse state of health is equivalent to less time in excellent health. This equivalence yields an equation by which a numerical weight can be derived for any health state.

Although there are undoubtedly psychometric problems with these two assessment techniques, the SG and the TTO are considered to be the “gold standard” methods to elicit utility-based health state weights. There are computer programs under development that can be used to conduct the elicitation so that no interviewer bias intrudes. The HUI was explicitly developed using these elicitation techniques.

What alternative is there to these methods? We can always simply ask a person to rate the relative desirability of a health state on a scale relative to 0 (dead) and 1 (excellent health). Visual aids such as a drawing of a scale marked in divisions of tenths can aid this direct rating. These methods, often utilized, are called the direct rating (DR) and the visual analog (VA) methods. The QWB was derived from data collected using the DR method. The EQ-5D is being scaled from VA-elicited ratings.

The weights derived by the four methods do differ. Generally, a given state is weighted highest with the SG method and the TTO method.
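The SG and TTO indifference points can be turned into weights with very little arithmetic. The sketch below assumes the usual anchoring of dead at 0 and excellent health at 1, under which the SG utility equals the indifference probability of excellent health and the TTO weight equals the ratio of the two durations (with no discounting of time). The function names are hypothetical, and the bisection search is one convenient way to automate the back-and-forth adjustment described above, not the procedure of any particular instrument.

```python
def standard_gamble_utility(p_excellent_at_indifference):
    """SG utility of the sure-thing state, assuming dead = 0 and excellent health = 1."""
    if not 0.0 <= p_excellent_at_indifference <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return p_excellent_at_indifference


def time_tradeoff_weight(years_excellent_at_indifference, years_in_state):
    """TTO weight: years in excellent health judged equal to years in the state."""
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    return years_excellent_at_indifference / years_in_state


def find_indifference(prefers_sure_thing, low=0.0, high=1.0, tolerance=0.001):
    """Narrow in on the indifference probability by bisection.

    `prefers_sure_thing(p)` stands in for the respondent: it returns True when
    the respondent would rather live in the described state than take a gamble
    offering probability p of excellent health (and 1 - p of immediate death).
    """
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if prefers_sure_thing(mid):
            low = mid    # gamble rejected: make it more attractive
        else:
            high = mid   # gamble accepted: make it less attractive
    return (low + high) / 2.0


# A simulated respondent who accepts the gamble only when the chance of
# excellent health is at least 0.8 (a hypothetical value).
simulated_respondent = lambda p: p < 0.8
print(standard_gamble_utility(find_indifference(simulated_respondent)))  # close to 0.8

# A respondent indifferent between 10 years in the state and 7 years in excellent health.
print(time_tradeoff_weight(7, 10))
```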
DR and VA weights range more throughout the 0–1 scale. Which are more valid? It is not clear, especially for purposes of public policy decision making. Although the SG and TTO methods are in principle tied to the theory of individual decision making, many consider the questions stilted and too hypothetical to elicit meaningful responses. Others seem to have little problem eliciting apparently meaningful weights with these methods. The Panel on Cost-Effectiveness in Health and Medicine preferred these assessments by a slim margin. Where SG and TTO are strongly supported by theory, people are discomfited by their cognitive awkwardness. Where DR and VA are easily understood and may be consistently used by respondents, it is not clear that their specific numerical results give us the scale properties needed to do precise arithmetic for health care cost-effectiveness analysis.

I do not see the definitive experiment on the horizon to decide this issue. I expect that public policy decision makers will have to choose the measurement technique that prima facie produces the numerical results most meaningful to their decisions, then gain experience with how the weights derived with that technique behave when tracked in the form of QALYs or QALE over time in a population.

My opinion is that most health state weighting schemes will weight states in pretty much the same overall order, at least on a gross scale. For instance, my colleagues and I have shown that about 50 percent of the population variation in QWB scores can be predicted at a gross level using SF-36 data. 34 From a policy standpoint, it is important that the measures on which decisions will be based include the constructs of interest in describing health. The numbers should be meaningful to the policy makers; this will come in part from the logic of the construction and derivation, but also from experience examining data in a variety of contexts and time frames.

PUTTING IT ALL TOGETHER

We have explored a number of issues concerning the “H” part of HALY measures.
These include:

• choice of constructs to be included in the measure of health states
• how the health state conceptualization is operationalized
• how the specific weights are obtained for health states
• preference versus nonpreference scaling
• how to collect preference weights: SG, TTO, DR, VA

We have established a terminology that helps to distinguish methodological choices:

• A Health Status measure is a system for weighting health states. An example is the SF-36.
• Preference-based health status measures are health-related quality of life measures. Examples are the HUI, QWB, DALY, EQ-5D, and maybe the YHL measure.
• A HALY is a health-adjusted life year summary computed with a health status measure.
• A QALY is a health-adjusted life year summary computed using a health-related quality of life measure.

QALYs appearing in the literature have been based on the HUI and on the QWB. These are community-based indexes of health-related quality of life. QALE for a population has been computed using HUI in Canada 6 and approximated in the United States using YHL. 5 The QWB has been used to make a similar computation for a community population based on community data. 35, 36 All these computations are done with QALYs in a positive orientation, that is, more is better. The computations are meant to describe the health-adjusted longevity of a population based on cross-sectional or brief longitudinal observations of quality of life and longevity.

The DALY is also a form of QALY; however, it has a negative orientation, as a loss of health rather than a gain in health. It is a representation of deficit in adjusted life years from full-health life expectancy in a population. QALYs computed with HUI or QWB are a “glass half full” measure and DALYs are simply QALYs as a “glass half empty” measure. In the positive orientation we have a measure of health achieved compared to accruing no life years at all. In the negative orientation we have a measure of burden of illness in the sense of deficit from full achievement. The latter calculation will depend on what standard is used to represent “full health”; this is discussed at length by Murray. 25

As implemented by Murray, the DALY also has one other important property: the life years are also weighted. In every other computation of QALY or HALY that I am familiar with, a life year is counted as a life year regardless of the age of the person living that life year. In the DALY framework, not only are health states weighted, but there is a weighting of life years that accentuates the concept of dependency of the very young and the very old, in a sense giving more weight to years accumulated in the population during productive adulthood.
Although this possibility of weighting life years exists for any QALY measure, the DALY implementation is the only one to date that has incorporated it. I have not seen QALYs computed using EQ-5D. One study computing clinical cost-effectiveness made a comparison HALY computation using the physical function scale from the SF-36, 37 but did not rely on this for the final results. I have recommended that researchers can use the regression equation I have reported 34 to convert full SF-36 profiles into pseudo-QWB scores and from these to compute approximate QALYs. I continue to believe this would be acceptable at a population level but not at an individual decision making level. What are policy makers to do with all of these choices? First, examine the assumptions underlying the HALY computation. Beyond assuming that “a HALY is a HALY,” the assumption of separability of health state weights from duration of the health state is an important one— it greatly simplifies the data required, and this simplification needs to be weighed against the meaningfulness of the results. Policy makers should make sure that the health measure used to make the computation in fact responds to the aspects of health they feel are important. They also should decide whether it is important to have a scale for health states that is weighted in a manner responsive to preferences or not. Although we can compute HALYs or QALYs (or DALYs) by many different means, the answers may not be the same, so a choice among measures is a real choice. It may be important to adopt one or two measures provisionally and gain experience with them over time in a wide variety of settings, to understand where they correspond with intuition and where they do not, where they are responsive and where they are not, and where they are collectable and where they are not.
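The basic HALY arithmetic, and the relation between the “glass half full” and “glass half empty” orientations, can be sketched in a few lines of Python. This is a simplified illustration rather than any published method: it assumes health state weights on a 0 to 1 scale with 1 as full health, treats weights as separable from duration, and applies no age weighting or discounting, so it does not reproduce the DALY as implemented by Murray. The function names, the lifetime profile, and the 85-year standard are all hypothetical.

```python
def health_adjusted_life_years(segments):
    """Sum years lived, each weighted by the health state weight for that period.

    `segments` is a list of (years, weight) pairs, with weight between
    0 (dead) and 1 (full health).
    """
    total = 0.0
    for years, weight in segments:
        if years < 0:
            raise ValueError("years must be non-negative")
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must lie between 0 and 1")
        total += years * weight
    return total


def deficit_from_full_health(segments, full_health_life_expectancy):
    """Negative orientation: shortfall from a chosen full-health standard."""
    return full_health_life_expectancy - health_adjusted_life_years(segments)


# Hypothetical lifetime: 60 years in full health, 15 years at weight 0.7,
# and 8 years at weight 0.25.
lifetime = [(60, 1.0), (15, 0.7), (8, 0.25)]

life_years = sum(years for years, _ in lifetime)      # 83 unadjusted life years
halys = health_adjusted_life_years(lifetime)          # about 72.5 HALYs
deficit = deficit_from_full_health(lifetime, 85.0)    # about 12.5 against an 85-year standard

print(life_years, halys, deficit)
```

Changing the standard passed to `deficit_from_full_health` changes the deficit but not the HALY total, which illustrates why the negative orientation depends on what is taken to represent full health.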

NOTES

a   What I recount here is a North American history of the construct. I am unfamiliar with the early European literature in this regard.

b   In fact, a director of the U.S. National Center for Health Statistics proposed a use of an index representing shortfall in HALE from what could be expected ideally as a “gross national health index” (Linder [1966] 7 cited in Patrick and Erickson [1993] 8).

c   This was an idea whose time had come. It was being invented in several literatures about this time and reviewers of different disciplinary backgrounds will no doubt trace its roots differently.

d   A continuum of items with this property is called a “Guttman Scale,” named after the psychologist who first identified it as a scale type. A physical model for this is the machine for sorting oranges by size. Oranges are rolled down a trough with holes of successively larger diameter cut in the bottom. An orange will roll down the trough crossing the holes until it comes to the hole just big enough for it to fall through. We know that it is bigger than all the holes it rolled across and smaller than holes farther down the trough from where it fell through. In the other instruments, YHL, HUI, QWB, DALY, and EQ-5D, each single dimension is divided into categories that at least at a gross level are defined to form Guttman scales. For example, the five categories of the HUI’s speech dimension are: (1) able to be understood completely when speaking with strangers; (2) able to be understood partially when speaking with strangers, but to be understood completely when talking with someone who knows me well; (3) able to be understood partially when speaking with strangers or people who know me well; (4) unable to be understood when speaking with strangers but able to be understood partially by people who know me well; (5) unable to be understood when speaking to other people or unable to speak at all.

e   An easily understood analogy is that of the fever thermometer. It is a reliable instrument for indicating change in fever status of a patient, and we can probably find that in relevant ranges higher temperatures indicate more uncomfortable health states in most people. But a change from 99°F to 100°F may not at all be the same in subjective discomfort as a change from 102°F to 103°F.

f   In fact, as empirical collection of community survey data usually requires exclusion of the very ill or those mentally unable to respond, these states may be underrepresented.

REFERENCES

1. Sanders, B. Measuring community health levels. American Journal of Public Health 54(7): 1063–1070, 1964.
2. Sullivan, DF. Conceptual Problems in Developing an Index of Health. Washington, D.C.: U.S. Department of Health, Education, and Welfare, 1966.
3. Moriyama, IM. Problems in the measurement of health status. In Sheldon, EB, Moore, WE, eds. Indicators of Social Change. New York: Russell Sage Foundation, 573–600, 1968.
4. Sullivan, DF. A single index of mortality and morbidity. HSMHA Health Reports 86(4): 347–355, 1971.
5. Erickson, P, Wilson, R, Shannon, I. Years of healthy life. Healthy People 2000 Statistical Notes. Hyattsville, MD: National Center for Health Statistics/CDC/DHHS, vol. April, 1995.
6. Wolfson, MC. Health-adjusted life expectancy. Health Reports (Statistics Canada) 8(1): 41–46, 1996.
7. Linder, FE. The health of the American people. Scientific American 214(6): 21–29, 1966.
8. Patrick, DL, and Erickson, P. Health Status and Health Policy. Quality of Life in Health Care Evaluation and Resource Allocation. New York: Oxford University Press, 1993.
9. Garber, AM, Weinstein, MC, Torrance, GW, and Kamlet, MS. Theoretical foundations of cost-effectiveness analysis. In Gold, MR, Siegel, JE, Russell, LB, and Weinstein, MC, eds. Cost-Effectiveness in Health and Medicine. New York: Oxford University Press, 25–53, 1996.
10. Klarman, HE, Francis, JO, and Rosenthal, GD. Cost-effectiveness analysis applied to the treatment of chronic renal disease. Medical Care 6: 48–54, 1968.
11. Packer, AH. Applying cost-effectiveness concepts to the community health system. Operations Research 16: 227–253, 1968.
12. Fanshel, S, and Bush, JW. A health-status index and its application to health-services outcomes. Operations Research 18: 1021–1066, 1970.
13. Weinstein, MC, and Stason, WB. Foundations of cost-effectiveness analysis for health and medical practices. New England Journal of Medicine 296: 716–721, 1977.
14. Pliskin, JS, Shepard, DS, and Weinstein, MC. Utility functions for life years and health status. Operations Research 28: 206–224, 1980.
15. Bleichrodt, H, Wakker, P, and Johannesson, M. Characterizing QALYs by risk neutrality. Journal of Risk and Uncertainty 15: 107–114, 1997.
16. Bleichrodt, H, and Quiggin, J. Characterizing QALYs under a general rank dependent utility model. Journal of Risk and Uncertainty 15: 151–165, 1997.
17. Gold, MR, Siegel, JE, Russell, LB, and Weinstein, MC. Cost-Effectiveness in Health and Medicine. New York: Oxford University Press, 1996.
18. Gold, MR, Patrick, DL, Torrance, GW, et al. Identifying and valuing outcomes. In Gold, MR, Russell, LB, Weinstein, MC, eds. Cost-Effectiveness in Health and Medicine. New York: Oxford University Press, 82–134, 1996.
19. Sackett, DL, Torrance, GW. The utility of different health states as perceived by the general public. Journal of Chronic Diseases 31: 697–704, 1978.
20. Miyamoto, JM, and Eraker, SA. Parameter estimates for a QALY utility model. Medical Decision Making 5(2): 191–213, 1985.
21. Kaplan, RM, Anderson, JP, Wu, AW, Mathews, WC, Kozin, F, and Orenstein, D. The Quality of Well-Being Scale. Applications in AIDS, cystic fibrosis, and arthritis. Medical Care 27(3 Suppl): S27–43, 1989.
22. Torrance, GW, Boyle, MH, and Horwood, SP. Application of multi-attribute utility theory to measure social preferences for health states. Operations Research 30(6): 1043–1069, 1982.
23. Boyle, MH, Furlong, W, Feeny, D, Torrance, GW, and Hatcher, J. Reliability of the Health Utilities Index-Mark III used in the 1991 cycle 6 Canadian general social survey health questionnaire. Quality of Life Research 4(3): 249–257, 1995.
24. Murray, CJL. Quantifying the burden of disease: The technical basis for disability-adjusted life years. Bulletin of the World Health Organization 72(3): 429–445, 1994.
25. Murray, CJL, Lopez, AD, eds. The global burden of disease: A comprehensive assessment of mortality and disability from diseases, injuries, and risk factors in 1990 and projected to 2020. Cambridge, MA: Harvard School of Public Health on behalf of the World Health Organization and the World Bank; distributed by Harvard University Press, 1996.
26. EuroQol Group. EuroQol—a new facility for the measurement of health-related quality of life. Health Policy 16: 199–208, 1990.
27. Ware, J, Jr., and Sherbourne, CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Medical Care 30(6): 473–483, 1992.
28. McDowell, I, and Newell, C. Measuring Health: A guide to rating scales and questionnaires. (2nd ed.) New York: Oxford University Press, 1996.
29. McHorney, CA. Generic health measurement: Past accomplishments and a measurement paradigm for the 21st century. Annals of Internal Medicine 127(8): 743–750, 1997.
30. Torrance, GW, Thomas, WH, and Sackett, DL. A utility maximization model for evaluation of health care programs. Health Services Research 7: 118–133, 1972.
31. Torrance, GW. Measurement of health state utilities for economic appraisal: A review. Journal of Health Economics 5: 1–30, 1988.
32. Keeney, RL, and Raiffa, H. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. New York: John Wiley & Sons, 1976.
33. Von Winterfeldt, D, Edwards, W. Decision Analysis and Behavioral Research. New York: Cambridge University Press, 1986.
34. Fryback, DG, Lawrence, WF, Martin, PA, Klein, R, and Klein, BEK. Predicting Quality of Well-Being scores from the SF-36: Results from the Beaver Dam Health Outcomes Study. Medical Decision Making 17(1): 1–9, 1997.
35. Lawrence, WF, Fryback, DG, Klein, R, and Klein, BEK. Community-based quality-adjusted life expectancy: Results from the Beaver Dam Health Outcomes Study. Medical Decision Making 16: 454, Oct.–Dec. 1996 [Abstract].
36. Rosenberg, MA, Fryback, DG, and Lawrence, WF. Population-based estimates of health-adjusted life expectancy: Comparison of alternative methods for computation. Medical Decision Making 17(4): 527, 1997 [Abstract].