Suggested Citation:"3 Measures and Strategies for Obtaining Health Benefit Values for Regulatory Analysis." Institute of Medicine. 2006. Valuing Health for Regulatory Cost-Effectiveness Analysis. Washington, DC: The National Academies Press. doi: 10.17226/11534.

3
Measures and Strategies for Obtaining Health Benefit Values for Regulatory Analysis

As described in Chapter 2, federal agencies apply a variety of approaches to estimate and value the health-related benefits of regulatory interventions. Agencies are currently developing measures of health impacts for use in cost-effectiveness analysis (CEA) along with monetized estimates for use in benefit–cost analysis (BCA). These effectiveness measures include both single-dimension measures such as deaths or cases of illness averted and integrated measures of morbidity and mortality, that is, the health-adjusted life-year (HALY) measures that are a central focus of this report.

In this chapter the Committee describes different effectiveness metrics for health-related CEA and sources for estimates of health-related quality of life (HRQL) based on these metrics. We first introduce criteria for selecting among effectiveness measures for use in regulatory analysis, and then discuss various approaches in light of these criteria.

We cover much of the same ground as “Identifying and Valuing Outcomes,” Chapter 4 of the report of the U.S. Panel on Cost-Effectiveness in Health and Medicine (PCEHM) (Gold et al., 1996b). The emphasis and detail of this report, however, are tailored for an audience of regulatory analysts and decision makers. We reiterate some of the material in the PCEHM report here so that this volume will be a largely self-contained reference. In many instances the Committee follows and endorses the PCEHM’s interpretations and recommendations; in a few respects, our judgments differ, as summarized at the end of the chapter.

This chapter begins with a discussion of criteria for selecting among different HALY measures and for determining which approach to applying these measures is most appropriate for regulatory analysis. We then describe and evaluate each approach in more detail. The next section briefly reviews the single-dimension measures common in statistical reporting systems and epidemiological studies, including case reporting of illness or injury, preventable deaths, and life years lost. This section also considers the contribution of mortality and longevity changes, relative to changes in HRQL, to overall estimates of effectiveness. Next we examine alternative HALY metrics and discuss their construction, their theoretical roots, and the methods for determining the relative values of specific health states. These metrics, survey instruments, and methods for eliciting preferences or values for particular health states are evaluated in terms of their practicality, reliability, and theoretical and empirical validity. In the following section we consider sources of health state values for regulatory analysis and review four commonly used generic HRQL survey instruments. The fifth section identifies data collection and research priorities as well as promising developments for improving the measurement of health effects for regulatory analysis. Last, we briefly summarize the Committee’s findings and conclusions based on the material presented in the chapter.

CRITERIA FOR SELECTING HALY METRICS FOR REGULATORY CEA

As introduced in the preceding chapters, regulatory analysts face a series of choices in determining how to structure the effectiveness measure in their analyses. First, they may choose between a single-dimension and an integrated measure. Although single-dimension measures, such as lives saved, life years extended, or cases of illness or injury avoided, provide important information of interest to decision makers, analyses of major regulations generally include more than one health effect of concern. Thus our focus is on developing criteria for selecting among the integrated measures that are the main focus of the report.

The first choice that analysts face in selecting an integrated measure is whether to rely on the most commonly used approach—the quality-adjusted life year (QALY)—or one of the other HALY approaches. HALY approaches, which rest on how length of life is combined with a value or preference for a given state of health, are discussed in detail later in this chapter. They vary primarily in the extent to which they are widely accepted, available, and used. Because the requirements for regulatory CEA are already in effect and analysts need tools that are ready for use, the Committee’s criteria for selecting among these HALY measures are largely practical ones. (The development and pursuit of a longer term research agenda are discussed separately at the end of this chapter.)

At this broadest conceptual level, the relevant performance characteristics of a HALY effectiveness measure for regulatory CEA conform to some straightforward criteria.

First, the HALY metric should have a “track record,” that is, it should be in relatively widespread use and methods for estimating index values, as well as estimates themselves, should be available in the literature.
Second, the metric should be easy to understand and interpret. To some extent, the comprehensibility of a metric is a function of the extent to which it has been used, and thus depends on the first criterion.
Third, the metric should be relatively inexpensive to use, both in terms of the availability of methods and values for immediate application and in terms of the development and collection of new values.

Of course, in addition to these practical considerations, measures must also provide valid and reliable estimates of the relative value of different health states. Assessing reliability and validity is, however, largely a function of the extent of the research base; the measures that do not meet the first criterion above are less likely to have been subject to extensive tests of validity and reliability.

As discussed in more detail in the following sections of this chapter, the Committee believes that the QALY best meets these criteria. Once an analyst makes the decision to use the QALY metric, the next set of choices involves determining how to apply this measure in the context of a particular regulatory analysis. As already discussed, analysts face the choice in BCA and CEA alike of conducting new research on benefit values or transferring estimates from existing studies. In CEA analysts have a third option: they can use generic indexes. The use of these indexes can be based on existing studies or new research; i.e., the analyst may transfer estimates from an existing study that used a generic index, or may use the index to generate new valuation estimates. As illustrated by the Committee’s case studies, these indexes have the advantage of allowing the analyst to value new health states without the substantial investment of time and resources required for new primary valuation research. Each of these approaches is discussed in detail in the later sections of this chapter.

Because several generic indexes are well established and easy to use, the Committee expects that they will often be applied in regulatory analysis in the near term. As already discussed, regulatory analysts lack the time or resources to engage in the development of instruments for health status valuation in the context of individual regulatory analysis. Thus we focused our criteria for implementing the QALY measure on the choice among available generic instruments.

Several authorities have offered criteria for assessing the construction and performance of HRQL measures, primarily with respect to their use in CEAs of health care services and in clinical outcomes studies. Box 3-1 presents standard performance criteria for preference-based HRQL survey instruments. While each of these features of an HRQL instrument may be desirable, some are particularly important and take a specific form in the context of informing regulatory decision making.

BOX 3-1
Standard Performance Criteria for HRQL Instruments

The PCEHM proposed that valuation approaches should have a theoretical foundation and be empirically derived. Economists and decision theorists tend to favor choice-based valuation such as standard gamble and time trade-off methods because they are more closely connected to utility theory. Some psychologists have also used techniques such as rating scales and magnitude estimation.

An ideal measurement method would satisfy a long list of criteria. While any given list is probably incomplete, some criteria deserve particular attention. For example, the ultimate standard of validity is construct validity, the extent to which an instrument accurately measures or identifies the thing it is intended to measure. Because HRQL is an unobservable construct with alternative theoretical foundations, there is some ambiguity and tension as to how to demonstrate an instrument’s validity. Three subsidiary or partial aspects of validity that are more readily demonstrated are content validity (adequate or appropriate scope to the measure); criterion validity (the degree of correspondence of the instrument to an agreed-on measure of the construct); and predictive validity (ability to predict future behaviors and outcomes).

An instrument’s valuation survey sample should be adequate in size and response rate, and the population from which the sample was drawn should be representative of the population of interest in the CEA. In the case of regulatory CEA, this would be the population affected by costs and/or benefits of the regulatory intervention.

A measure should be reliable, that is, exhibit consistency in repeated measurements by the same individual over time or across different groups drawn from the same population.

A measure should be widely applicable to a range of health states and conditions. It should be sensitive, that is, responsive to change, and not exhibit floor or ceiling effects in the range of anticipated effects. An HRQL instrument should be flexible and universal, as demonstrated by applications to and adaptations for cultural and language subpopulations and alternative administration formats.

An HRQL measure should be well documented, transparent, and interpretable. An instrument should be feasible to administer, not burdensome for respondents, and acceptable to users and the public. This may be judged by administration format, completion times, and rates of missing responses. Preference elicitation surveys should have satisfactory completion rates; if respondents consistently decline to make choices within an elicitation exercise, the measure or method may not be appropriate or adequately informative.

SOURCES: Gold et al. (1996b); Lohr et al. (1996); IOM (1998); Brazier et al. (1999b).

As discussed later in this chapter, in applying these criteria the Committee found that no one HRQL index is obviously superior to the others in all respects for all applications. Thus, to designate any single instrument as a standard for all regulatory analyses would be arbitrary. Judging the appropriateness of a given instrument for a particular regulatory application depends not only on the features of the HRQL instrument, but also on the characteristics of the affected population, the intervention, and the health research that underlies the risk assessment.

The Committee emphasizes the following criteria for choice of an HRQL instrument in a regulatory analysis.

First, an HRQL instrument must be applicable to the range of health-related effects being evaluated. Generic HRQL instruments are designed for application to a wide range of health states that can result from a variety of health-related risks or interventions. Still, as described below, each generic instrument has distinctive features absent from the others. For example, the Quality of Well-Being Scale (QWB) includes symptoms and problems in its valuation formula, along with functional attributes; the Health Utilities Index (HUI) instruments specify sensory and cognitive functions, which make them relatively sensitive instruments for conditions with these manifestations; and the SF-6D allows the use of widely collected SF-36 and SF-12 data sets.

Second, the instrument should be sensitive enough to distinguish among health endpoints. This criterion addresses the “fit” between the HRQL instrument, the health condition(s) of interest, and the risk assessment data used to estimate and characterize the health impacts. For example, a highly differentiated HRQL instrument may not be readily “mappable” onto epidemiological data about respiratory symptoms related to air quality if the latter dataset is based on very general symptom-based categories. Conversely, if the regulatory health impacts of interest are very specific, such as functional limitations resulting from long-term effects of traumatic injury, and the domains of an HRQL instrument do not reflect those effects, that instrument might not be sufficiently sensitive. In the Committee’s case study of child seat restraint anchoring systems, in which head injuries were a prominent risk, some but not all indexes included a cognitive function domain. In this case study, however, the similarity of estimates of QALY effects (as assigned by experts) across different instruments does not demonstrate that the more specific attributes are critical to the sensitivity of the instrument (see Appendix A, Tables A-11 and A-12).

Third, a generic instrument should reflect the values or preferences for health of the population(s) of interest. In most cases, for major regulations, those who will bear the costs and/or receive the benefits can be represented by the U.S. population as a whole. Hence it is the preferences of this population that will matter most for valuation. Of the generic instruments reviewed, only the QWB and the EuroQol Group’s EQ-5D have preferences derived from the U.S. population. Whereas the U.S. EQ-5D valuation survey is recent and based on a nationally representative sample, the QWB valuation survey is about 30 years old and was conducted in a single community (San Diego, CA). The HUI-3 valuation survey was conducted with a representative sample from Hamilton, Ontario, Canada, and the SF-6D values are derived from a U.K. general population survey.

Fourth, as in the case of the HALY measure, the HRQL instrument also must be acceptable to and understandable by survey respondents, policy makers, and the general public. One indication of a measure’s acceptability is the extent to which valuation survey respondents comprehend, and are willing to engage in, the preference elicitation exercise. In a broader sense, the ethical commitments and implications of the HRQL instrument and the health state values it generates must be viewed as legitimate by the ultimate users of the analytic results. Transparency, in the sense of relying on data that is publicly available (not proprietary), may also contribute to a measure’s acceptability.

Finally, as in the case of the HALY measure, the HRQL instrument should be as inexpensive to use as is compatible with the other objectives. This criterion applies to considerations such as mode of administration (e.g., mail surveys are less costly than personal interviews) and also to the proprietary status of the instrument and related analytic tools.

SINGLE-DIMENSION MEASURES OF HEALTH-RELATED OUTCOMES

Cases of illness or injuries, deaths, hospitalizations, and days of work or school lost are commonly reported outcomes based on routine health information collection activities. These measures are familiar, easily comprehended, generally stable, and can be obtained or calculated from standard statistical sources. Tables 2-2, 2-4, and 2-5 in the previous chapter provide examples of specific single-dimension outcome measures used in regulatory analyses. The drawback of relying on these types of measures alone, without benefit of more comprehensive measures, is that they are not readily aggregated.

Mortality-based indicators have long dominated population-based health status measurement. They are also prominent in risk assessments and economic analyses for health and safety regulations. Life expectancy and age-specific death rates are familiar and straightforward health outcomes measures. Early analyses counted preventable or premature deaths averted.¹ With the advent of CEA in health care settings, analysts turned to counting years of life saved, thus reflecting differences in remaining life expectancies.

Much of the information needed to calculate integrated measures of morbidity and mortality relates to the determination of the relative values attached to different health states, yet changes in survival tend to swamp the impact of changes in HRQL in HALY calculations for health care programs. In a review of 63 studies that included 173 cost-effectiveness-ratio pairs that reported both cost per life year ($/LY) and cost per quality-adjusted life year ($/QALY), Chapman and colleagues (2004) found that quality-adjusting life years resulted in a median difference between LY and QALY ratios for the 173 ratio pairs of just $1,300. (The median ratios were $24,600/LY and $20,400/QALY.) In a separate review of 110 cancer prevention, early detection, and treatment interventions, Tengs (2004) also compared $/LY and $/QALY ratios. Consistent with the findings of Chapman et al., she reported a very high rank-order correlation between LY and QALY ratios. Both studies concluded that the difference in quality-adjusting life years would have affected decisions about cost-effectiveness in just a small proportion of cases (8 and 5 percent in the Chapman and Tengs studies, respectively, at a $50,000 decision threshold in each case).
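To see how quality adjustment can, in a minority of cases, change a cost-effectiveness judgment, the following Python sketch computes both ratios for a single intervention and compares each with a decision threshold. The cost, life-year, and QALY figures are hypothetical assumptions chosen for illustration, not values from Chapman et al. or Tengs; only the $50,000 threshold comes from the text, and the function name is likewise an assumption.

```python
def cost_per_unit(incremental_cost, incremental_effect):
    """Cost-effectiveness ratio: cost per unit of health gained."""
    if incremental_effect <= 0:
        raise ValueError("incremental effect must be positive")
    return incremental_cost / incremental_effect


THRESHOLD = 50_000  # dollars per LY or per QALY, the threshold cited in the text

# Hypothetical intervention; these numbers are illustrative only.
cost = 900_000.0    # incremental cost in dollars
life_years = 20.0   # incremental life years gained
qalys = 15.0        # the same gain after quality adjustment

ratio_ly = cost_per_unit(cost, life_years)
ratio_qaly = cost_per_unit(cost, qalys)

# The judgment changes only if the two ratios fall on opposite sides of the threshold.
decision_changes = (ratio_ly <= THRESHOLD) != (ratio_qaly <= THRESHOLD)
```

In this made-up case the intervention falls below the threshold when measured in life years and above it when measured in QALYs, which is the kind of reversal the two reviews found to be uncommon.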

The results of these two review studies suggest that accounting for mortality impacts may be more important than adjusting for the HRQL impacts associated with diseases for which the intervention saves many lives. In these cases, calculation of life years gained may capture the majority of the impact of the intervention on health. However, this will not be the case for programs or regulations that improve health and functioning but do not significantly change life expectancy, such as one might expect with mitigation of environmental exposures to lead or mercury. In the juice processing case study (summarized in Appendix A), for example, chronic illness impacts accounted for the majority of QALY gains.

HEALTH-ADJUSTED LIFE YEARS

HALY measures were designed to address the limitations of single-dimension measures. HALYs capture information about both length of life and the states of health experienced during those years. The virtue of such an index of health—that it combines information about diverse health-related conditions as well as mortality—also poses challenges. A HALY is a relatively abstract concept, and some users of health statistics may find it harder to understand than more concrete and simpler health indicators, such as a change over time in the incidence of lung cancer or life expectancy in a population. Hence reporting the constituents of HALY measures and presenting cost-effectiveness ratios using specific outcomes such as preventable deaths remain important.

¹	Throughout this report, we use the term “preventable” rather than “premature” deaths. These terms refer to decreases in the risk of death attributable to a regulation, in other words, expected gains in life expectancy.

HALYs not only meld descriptive information about health states and longevity, they also incorporate judgments about the relative value of different states of health, taking into account their impact on functioning and subjective experience. Such judgments about HRQL may be individual, aggregated and averaged for a population, or reached collectively by individuals participating in an interactive or consensus process.

HALY measures are constructed in three steps. First, a description of a health state or disease condition is needed. Second, that state or condition must be given a value or weight, relative to other states and conditions. By convention, HRQL scales are anchored by values of 0 and 1, where 0 corresponds with death and 1 with the state of full, optimal, or “perfect” health. (States of health considered worse than death can be accommodated by negative values.) Third, the values for different health states or conditions must be combined with estimates of the duration in each health state over the predicted remaining life span. Figure 3-1 represents an illustrative health-adjusted life expectancy (for either an individual or a population, on average) as the shaded area on a two-dimensional graph where the vertical axis represents HRQL and the horizontal axis represents duration of life. When interpreted as an individual life, the figure suggests how one moves through different states of health, implying different levels of HRQL, over the course of a lifetime.
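The three construction steps can be sketched in a few lines of Python. This is a minimal illustration rather than the Committee’s procedure: the health-state labels, HRQL weights, and durations are hypothetical values chosen only to show the arithmetic, and no discounting is applied.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    label: str     # step 1: description of the health state
    weight: float  # step 2: HRQL value (0 = death, 1 = full health)
    years: float   # step 3: expected duration in that state


def haly(profile):
    """Sum weight x duration over a health profile (area under the HRQL curve)."""
    return sum(state.weight * state.years for state in profile)


# Hypothetical lifetime profile; all numbers are illustrative assumptions.
profile = [
    HealthState("full health", 1.0, 40.0),
    HealthState("moderate chronic illness", 0.75, 20.0),
    HealthState("severe impairment", 0.5, 10.0),
]

life_years = sum(state.years for state in profile)
total_haly = haly(profile)
```

Here 70 unadjusted life years correspond to 60 health-adjusted life years, the shaded area in a graph of HRQL against duration.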

Several approaches to estimating HALYs are discussed later in this section and many are illustrated in the Committee’s case studies. The most familiar and widely used measure is the QALY, and that is the metric given fullest consideration here. Before discussing the QALY and alternative metrics, we describe some general features of HALY measurement, using the QALY as the case in point.

Describing Health States

HRQL measurement relies on concepts such as “health status,” “functional status,” “well-being,” and “quality of life.” Although these terms, along with “health-related quality of life,” are often applied interchangeably, in fact they encompass narrower or broader arrays of domains, with “health status” denoting a more restrictive concept and “quality of life” a more extensive one. Table 3-1 presents concepts and domains that fall within these broader rubrics.

FIGURE 3-1 Health-Adjusted Life Years

At a minimum, the measurement of HRQL incorporates both the description of health status (which may include observable and unobservable symptoms, functional capabilities, and health perceptions) and the importance or value that people, individually and/or collectively, attach to these aspects of health. Health states may be described (and valued) either as related to or representing specific disease conditions or in generic terms. Valuations of generically described health states, using multiattribute health state classification systems, are reviewed in some detail later in this chapter.

HALY metrics such as QALYs have been constructed with tools from both psychometrics (the theory and techniques of measuring psychological phenomena such as attitudes) and utility theory (defined in Chapter 1). They are developed most often from some combination of psychological survey and decision-theoretical techniques. All generically described health states used in HRQL indexes depend on psychometric scaling and concepts to some degree. Such generic indexes thus share common features with health profiling instruments, such as the SF-36. Like other health status profiling instruments, the SF-36 was not designed to produce a preference-based index value.²

²	The SF-36 is described later in the chapter, when its derivative preference-based index, the SF-6D, is discussed.

TABLE 3-1 Concepts and Domains Used in Defining Self-Reported Health Status, Quality of Life, and Health-Related Quality of Life

Concepts	Domains	Attributes
Symptoms	Reports of physical and psychological symptoms or sensations not directly observable, such as energy and fatigue, nausea, and irritability	Frequency, severity, bothersomeness
Functional status		Frequency, difficulty, severity, ability, with help
  Physical	Functional limitations and activity restrictions, such as self-care, walking, mobility, sleep, sexual
  Psychological	Positive or negative affect and cognitive, such as anger, alertness, self-esteem, sense of well-being, distress
  Social	Limitations in work or school, participation in community
Health perceptions		Frequency, severity/intensity, satisfaction
  Global	General ratings of health and quality of life, such as satisfaction or overall well-being
  Worries and concerns	About health, finances, the future
  Spiritual	Meaning and purpose of life or relationship to the universe
Disadvantage/opportunity	Perceptions of stigma or reports of discrimination because of health condition	Frequency, impact
Resiliency	Reports of ability to cope or withstand stress and illness	Frequency, satisfaction, ability
Environmental	Evaluations of personal safety, adequacy of housing, respect, freedom, and so on	Satisfaction, importance
SOURCE: Reprinted from Patrick and Chiang (2000, Table 1).

Valuing Health States and Preference Elicitation Methods

The scaling of values associated with particular health states reflects the relative strength of preference for one state as compared with another. Health states must be located on an interval scale (and not simply ranked) in order to be incorporated in a HALY measure. This section reviews four methods for eliciting preferences for health states:

Standard gamble (SG),
Time trade-off (TTO),
Category rating (CR) or visual analogue scale (VAS), and
Person trade-off (PTO).

These preference elicitation techniques pose different questions and emphasize different facets of the relative value of various health states. Most analysts who use these valuation techniques recommend that results from two or more approaches not be combined within a single analysis, and that their interpretation and the discussion of results consider the elicitation method (Lenert and Kaplan, 2000).

Each of the four methods has particular strengths. Economists generally prefer metrics or instruments that use SG or TTO. These elicitation techniques produce relative preference weights using methods consistent with neoclassical economic utility theory, which requires choices reflecting an opportunity cost—the sacrifice of one valuable good for another. Preference or value elicitation methods grounded in utility theory correspond more closely than do psychometric approaches to the model of consumer choice.

Rating scale approaches such as CR or VAS are considered the least burdensome for respondents, although some studies have reported that respondents found the task more challenging than TTO or SG. CR or VAS are understood to reflect respondents’ internal representations of health states in a comparative sense, and may be anchored or influenced by the actual health of the respondent (Krabbe et al., 1997).

PTO valuation methods have been designed to introduce other-directed interests and considerations into societal resource allocation and priority-setting contexts. In contrast with other techniques, the PTO approach does not purport to represent primarily self-interested or consumer preferences for health states. PTO has not been as widely applied as the other techniques.

Unless new surveys are conducted to elicit values for specific health states, the elicitation technique is part and parcel of the choice of a generic, multiattribute HRQL index. Thus, although the following discussion addresses elicitation methods in isolation from other features of valuation surveys, in practice these methods are not readily mixed and matched with the descriptive systems of different indexes. Nonetheless, considering the performance of different elicitation methods as such is helpful because the valuation (as compared with the characterization or description) of health states is what distinguishes QALYs from other HALY metrics.

Standard Gamble

Expected utility theory provides a normative model for individual decision making under conditions of risk or uncertainty. The SG is the only preference elicitation method directly linked to the axioms of expected utility theory. In order to establish the relative values of various health states on an interval scale, respondents must determine the conditions of indifference or equivalence between two outcomes. One of the alternatives, representing the health state (less than full health) of interest, is a certain outcome. The other alternative has two possible outcomes, one being full health and the other being immediate death. The respondent is asked to specify the risk of immediate death (with probability p) and the complementary probability of survival in perfect health (1 − p) that would make this uncertain alternative just as attractive as the certain alternative of the impaired health state. On a 0-to-1 scale for health state values arrayed from death to full or perfect health, the value of the health state in question is then (1 − p).
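The indifference condition can be written out directly. The sketch below assumes a respondent who behaves according to expected utility theory and the 0-to-1 anchoring described above; the function name and the example probability are hypothetical.

```python
def standard_gamble_value(p_death):
    """Health state value implied by an SG indifference probability.

    At indifference, u(state) = (1 - p) * u(full health) + p * u(death).
    With u(full health) = 1 and u(death) = 0, this reduces to 1 - p.
    """
    if not 0.0 <= p_death <= 1.0:
        raise ValueError("p_death must be a probability between 0 and 1")
    u_full, u_death = 1.0, 0.0
    return (1.0 - p_death) * u_full + p_death * u_death


# Hypothetical respondent: indifferent at a 25 percent risk of immediate death.
value = standard_gamble_value(0.25)
```

A respondent willing to accept only a small risk of death to escape the impaired state thus assigns that state a value close to 1.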

The relative values of different states of health elicited with the SG technique will reflect, to some degree, individuals’ attitudes about taking risks. If the respondent in an SG is averse to taking risks, the value assigned to the certain, impaired state of health will be closer to 1.0 (optimal health) than if the respondent is not risk averse (Kahneman and Tversky, 1979; Loomes and McKenzie, 1989). The standard gamble is a cognitively demanding technique. Because SG explicitly uses probabilities of events to determine relative values, and probability information often is not well understood, empirical results do not confirm the prediction from expected utility theory that the relative values of different health states maintain a constant proportional relationship to risk. For example, when presented with probabilities that differ by an order of magnitude (a 1-in-100 risk versus a 1-in-1,000 risk), respondents do not treat them as representing a fully tenfold difference in likelihood. A method for adjusting SG responses to account for biases in probability weighting has been proposed by Bleichrodt et al. (2001), but this method has not been widely adopted.

Time Trade-Off

The TTO elicitation method is also considered consistent with utility theory because respondents must sacrifice one valuable good for another.

The TTO method was developed as an alternative to the standard gamble to avoid the cognitive challenges associated with choosing probabilistic outcomes. In a TTO elicitation, the respondent is asked to choose between two certain prospects, for example, to experience remaining life expectancy in a given health state (less than full health) or to live for a fixed number of years in full health, followed by immediate death. The number of years in full health is varied until the respondent is indifferent between the two prospects. The value of the health state is then given by the ratio of the number of years in full health to the remaining life expectancy.
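The ratio that defines a TTO value can be illustrated as follows. This is a sketch under the assumption of no discounting; the function name and the example figures are hypothetical.

```python
def time_trade_off_value(years_full_health: float, remaining_life_expectancy: float) -> float:
    """TTO value: years in full health at indifference divided by remaining life expectancy."""
    if remaining_life_expectancy <= 0:
        raise ValueError("remaining life expectancy must be positive")
    if not 0 <= years_full_health <= remaining_life_expectancy:
        raise ValueError("years in full health must lie between 0 and the remaining life expectancy")
    return years_full_health / remaining_life_expectancy


# Hypothetical respondent: indifferent between 40 years in the impaired state
# and 30 years in full health followed by immediate death.
value = time_trade_off_value(30, 40)
print(value)
```

Giving up more years to reach full health lowers the value assigned to the impaired state.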

The TTO method has proved practical and acceptable to survey respondents (Brazier et al., 1999a). It may be more comprehensible than an SG. Furthermore, the TTO method has intuitive appeal, as it involves the direct exchange of the two components of health, morbidity and longevity. The method has, however, been shown to confound preferences for health states with time preference (the extent to which one discounts the value of states in the future). The TTO method also relies on the fundamental assumption of QALYs that the weight assigned to a health state is independent of its duration, so that one will trade off a constant proportion of remaining years of life for a given improvement in health status, regardless of how many years remain. Yet empirical work has demonstrated that the value of a health state may depend on its duration (Sackett and Torrance, 1978; McNeil et al., 1981). Other experimental results suggest that TTO may be better suited to valuing chronic conditions than temporary conditions (Dolan and Gudex, 1995). The TTO method nonetheless offers a useful and intuitively plausible first approximation of relative values for different states of health.

Direct Rating: Category Rating and Visual Analogue Scales

Direct rating approaches to preference elicitation ask respondents to assign a single number to a health state, usually on a scale of 0 to 100, with these anchors being the worst and best imaginable health states, or death and perfect health. Visual aids, such as the “feeling thermometer” in the EuroQol Group’s generic HRQL survey instrument, the EuroQoL-5D (EQ-5D), are often used in this approach. (See Kind et al., 1998, for a reproduction of the “feeling thermometer.”) If the direct rating scale is divided into discrete points of equal intervals that the respondent must select, the approach is called CR. If there are no constraints on the location of assignments between the anchor points, the approach is referred to as a VAS.

Direct rating approaches apply psychometrically based attitudinal scaling methods to questions related to health. Rating scale methods are familiar to many and have been used extensively in survey research. They are generally thought to impose the least cognitive burden among value elicitation methods. CR and VAS values have been treated as having interval properties as measures of strength of preference by their proponents (Revicki, 1992; Kaplan et al., 1993). Health state values generated by VAS tend to correlate more closely with health status indicators such as pain, functioning, and clinical symptoms, and with health status profile scores, than do values generated by SG and TTO methods (Brazier et al., 1999a).

Direct rating, however, lacks the theoretical support of the trade-off-based methods (Bleichrodt and Johannesson, 1997). Respondents to rating scale surveys are not told that, in calculating QALYs, rating an impaired state of health at 50 on a scale of 0 to 100 will be interpreted as considering 1 year of life in perfect health equivalent to 2 years of life in the impaired health state. Empirical findings of both clustering of responses away from the extremes of the scale and response spreading have raised concerns that CR and VAS do not reflect the interval-scale properties that are required for QALY valuation.

Person Trade-Off

The PTO represents a fundamentally different approach to establishing relative values for health states. This method was designed to inform societal decision making about investments in and priorities for health care interventions, and most notably was used in setting the original disability-adjusted life-year (DALY) weights (Murray and Lopez, 1996; Murray and Acharya, 1997). In a PTO exercise, respondents are asked to make choices about health interventions and health states for groups of people other than themselves. For example, a respondent may be presented with a situation in which a given number of people (x) have a particular health-related impairment A and another group of y members have a different health impairment B (the time spent in health states A and B is the same). The respondent is asked to choose which group to help if she could help only one group because of limited resources. The number of persons in one group is varied (to x′) until the respondent concludes that helping x′ persons with condition A is equivalent to helping y persons with condition B. The societal value of health condition A relative to health condition B is then determined from the two group sizes at indifference: helping one person with condition A is valued at y/x′ times helping one person with condition B.

PTO choices incorporate concerns about relative health status and the distribution of benefits in the particular choice scenario. Specifically, PTO choices are more responsive to the relative severity of conditions involved and to life-saving interventions than are individual preference-based valuation techniques, reflecting an interest in benefiting the worst off (Nord, 1999). Yet, at the same time, participants in PTO exercises appear to take into account the total gains in health across all participants, even if those who are initially worst off are not necessarily helped (Dolan and Green, 1998). Several reviews of the PTO methodology applied in different contexts, including the World Health Organization’s (WHO’s) Global Burden of Disease DALY measure, have called for more research to refine and improve the reliability of the technique and specifically for further development of its theoretical rationale (Brazier et al., 1999a; Dolan, 2000; Green, 2001; Walker and Siegel, 2002).

The PTO technique is cognitively demanding and it requires posing a large number of choices to construct a robust set of relative values for different diseases (Green, 2001). It has also performed poorly, relative to other approaches, in tests of reliability and internal consistency (Patrick et al., 1973; Ubel et al., 1996).

Comparisons Among Elicitation Methods

This review reinforces the caveat stated at the beginning of the section: Each approach elicits relative health state values that incorporate different characteristics of the health states or aspects of the choices posed. For example, SG results incorporate attitudes about risk, most often risk aversion, so that SG-based values tend to be higher than values estimated with other approaches. Similarly, TTO elicitations capture time preferences and direct rating methods reflect elements of current health status.

In a study in which 69 public health professionals valued 12 health states according to each of the four previously described elicitation methods, Salomon and Murray (2004) explored the hypothesis that a consistent set of core valuations of health states underlies the preference estimates produced via each elicitation technique. In their modeling of responses, the authors estimated the contributions of various factors (e.g., risk attitudes, discounting, distributional concerns, and scale distortion effects) in explaining the differences among the valuation techniques, in order to isolate an underlying strength of preference. This study is encouraging with respect to the possibility of ascertaining consistent and stable preferences for health. At the same time, it suggests that comparing the results of studies using different valuation techniques should be approached with caution and that mixing valuation approaches within one study may be unwise.

In the following discussion, the Committee considers the relative performance of the three predominant elicitation techniques in terms of feasibility, reliability, and theoretical and empirical validity. Because the PTO approach differs from the other elicitation techniques in what it intends to measure, it is not included in this comparison. Furthermore, there is little evaluative research on the performance of the PTO.

Feasibility Of the three main elicitation methods, rating scale approaches like CR or VAS are the most feasible and least expensive, and are acceptable to respondents, with a high completion rate (95 percent and above). Some researchers have reported completion problems and difficulty in understanding the probabilistic choices with the SG (Froberg and Kane, 1989). In their more recent review of the literature, Brazier and colleagues (1999a) concluded that the SG methodology was comparable to TTO in terms of completion rates. Both the SG and the TTO may require an interview-based approach because of the complexity of the valuation exercises, in contrast with VAS, which is more amenable to a mail survey format.

Reliability Table 3-2 presents intrarater test–retest reliability results for the SG, TTO, and VAS methods from studies that resurveyed respondents at different time intervals, ranging from less than one week to a year. None of the three elicitation methods has been shown to perform consistently better than the others.

TABLE 3-2 Intrarater Test–Retest Reliability of the Standard Gamble, Time Trade-Off, and Visual Analogue Scale Techniques

Test–Retest Interval	Standard Gamble	Time Trade-Off	Visual Analogue Scale
1 week or less	0.80^a	0.87^a	0.77^a
	0.77–0.79^b		0.70–0.95^b
4 weeks or less	0.82^c	0.81^d	0.62^c
		0.63^e	0.89^e
6 weeks		0.63–0.80^d	
		0.85^f	
10 weeks		0.73^g	0.78^h
6–16 weeks	0.63 (props)^i	0.83 (props)^i	
	0.74 (no props)^i	0.55 (no props)^i	
1 year	0.53^j	0.62^j	0.49^j

NOTE: Correlation as specified; intraclass correlation coefficient: b, c, g, h; Pearson correlation coefficient: e, i; others unspecified. “Props” and “no props” referred to mode of administration, with or without specially designed aids in decision making (boards or cards).
^a O’Connor and Pennie (1995). ^b Bakker et al. (1994). ^c O’Brien and Viramontes (1994). ^d Churchill et al. (1987). ^e Gabriel et al. (1993). ^f Molzahn et al. (1996). ^g Dolan et al. (1996a). ^h Gudex et al. (1996). ^i Dolan et al. (1996b). ^j Torrance (1976).
SOURCE: As reported in Brazier et al. (1999a, Table 1).

Theoretical validity Several economists and QALY valuation researchers engaged in health-related CEA have noted that the ultimate test of validity should be the extent to which a technique or measure predicts the preference revealed in actual decisions (Brazier et al., 1999a; Dolan, 2000), consistent with the theoretical basis of welfare economics. In research on willingness to pay for risk reductions, the results from a stated preference survey (e.g., for safety interventions that reduce the risk of accidental death) can be compared with revealed preference studies (e.g., based on labor market studies of wage-rate differentials for risky jobs). It is more difficult to use revealed preference methods in studying choices in health and health care because the relative prices paid for treating different conditions cannot be assumed to reflect consumers’ relative preferences. Thus the “gold standard” of validity testing is not available for HRQL stated preference results. Instead, validity testing has been conducted primarily within the psychometric tradition, and has focused on construct validity, that is, the extent to which measures discriminate among unlike health states and converge on like ones (Dolan, 2000).

Empirical validity The SG and TTO methods have been compared in terms of producing logically consistent orderings of health states. In one study in which about 150 participants each compared 12 pairs of health states ordered in terms of level of impairment, TTO elicitations resulted in somewhat higher rates of logically consistent rankings (92 percent) compared with SG elicitations (84–88 percent), but this difference between methods was not statistically significant (Dolan et al., 1996a).

Internal inconsistencies in valuation have been found in some TTO studies as well. A recent study by Bleichrodt and colleagues (2003) concludes that these inconsistencies occur for short but not longer duration health states. They suggest that this phenomenon explains why TTO valuations sometimes exceed SG values, even though values elicited with SG approaches generally tend to be higher than those elicited by TTO. For example, the EQ-5D uses a relatively short-gauge duration of 10 years for comparison with remaining life expectancy; the authors argue that this leads to valuations that are too high.

Dolan (2000) argues that, although the SG and TTO methods are preferable in the abstract to rating scale approaches, both of these methods incorporate features that influence valuation. Because many people are averse to risk, they may assign a higher value to the intermediate health state that is certain. Because people generally have positive time preferences and value years in the near future more highly than those more distant, they will more readily trade off years of life closer to death. (Box 3-2 addresses the question of how this phenomenon relates to discounting QALYs.) Taken together, these measurement biases lead to higher SG values than TTO values for the same health states. Furthermore, many respondents are unwilling to accept any risk of death, or trade off any longevity, for a health improvement, leading to relatively high values for impaired health states (Reed et al., 1993). These results also suggest that individuals’ preferences are not fully consistent with QALYs.

Although SG and TTO values are ordinally correlated with VAS values, their relationship is not proportionate. The practice of mapping from VAS to SG or TTO valuations has been reviewed by Brazier and colleagues (1999a,b) and directly evaluated in an original study by Krabbe and colleagues (1997). Brazier and colleagues report that the seven studies they examined had inconsistent results with respect to the relationships between VAS and either SG or TTO.³ Krabbe and colleagues’ study comparing SG, TTO, rating scale (RS), and willingness-to-pay values for 13 generically described health states (taken from the EuroQol Group’s EQ-5D classification system) did, however, find a consistent relationship between RS and TTO mean values, as shown in Figure 3-2. They estimated an algebraic power function for the relationship between the 13 mean RSs and TTO values with an R-squared of 0.96 (RS = 1 − (1 − TTO)^α; α = 0.42).

FIGURE 3-2 Mean Valuations for 13 EQ-5D Health States with Four Estimation Methods

NOTE: Each set of health state numbers refers to a specific combination of attribute levels for the EQ-5D survey instrument. See Appendix B for the corresponding descriptions.

SOURCE: Reprinted from Krabbe et al. (1997, Figure 1), with permission from Elsevier.

BOX 3-2
The Time Trade-Off Method and Discounting

It has been argued that, because the TTO preference elicitation method incorporates respondents’ time preferences, discounting QALYs elicited by TTO results in double discounting. The following demonstrates why this is not the case.

Time preferences in health are usually modeled with a constant discount rate, r, over time. Assume that a respondent has a positive time preference (r), meaning that she prefers that good things happen sooner and bad things happen later. In a TTO choice, then, the longer lasting health state alternative would diminish in value proportionately more than the shorter term alternative. Thus, to equilibrate the two options, the respondent would decrease the value assigned to the shorter term, better health state option, resulting in a lower TTO score for the health state of interest. TTO scores are negatively related to the respondent’s positive time preference; however, they are not proportionally related.

If the individual’s utility function can be represented by the discounting factor r, then QALY values could be adjusted by calculating the TTO score by dividing the discounted (at rate r) years in full health by the discounted years in the health state of interest. However, this works only at the individual, not aggregate, level (Johannesson et al., 1994). In societal evaluation, the discount rate reflects the time preferences assigned by the decision makers.

Although TTO preference scores are affected by the respondent’s time preference, this effect is neither uniform nor proportionate. Individual time preferences for health have been found to be highly variable and range from positive to negative rates of discount (Dolan and Gudex, 1995). Conventional social rates of discount do not necessarily reflect individual time preferences. Because no method of accounting for time preferences exists at the aggregate or societal level, Drummond and colleagues (1997) recommend that, regardless of the elicitation method, QALYs should be discounted at the recommended social rate.

SOURCE: Drummond et al. (1997, pp. 184–185).
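The individual-level adjustment described in Box 3-2, dividing discounted years in full health by discounted years in the health state of interest, can be sketched as follows. The annual discounting convention (whole years, with the first year undiscounted), the function names, and the example figures are assumptions made for illustration only.

```python
def present_value_of_years(years: int, rate: float) -> float:
    """Present value of a run of whole life years at a constant annual rate.

    Assumed convention: the first year is undiscounted and year t is
    weighted by 1 / (1 + rate) ** t.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    if rate <= -1.0:
        raise ValueError("rate must be greater than -1")
    return sum(1.0 / (1.0 + rate) ** t for t in range(years))


def tto_score(years_full_health: int, years_in_state: int, rate: float = 0.0) -> float:
    """TTO score as the ratio of (discounted) years in full health to
    (discounted) years in the health state of interest."""
    if years_in_state <= 0 or not 0 <= years_full_health <= years_in_state:
        raise ValueError("need 0 <= years_full_health <= years_in_state and years_in_state > 0")
    return present_value_of_years(years_full_health, rate) / present_value_of_years(years_in_state, rate)


# Hypothetical respondent: 30 years in full health judged equivalent to 40 years in the state.
unadjusted = tto_score(30, 40)            # no discounting
adjusted = tto_score(30, 40, rate=0.03)   # individual rate r assumed to be 3 percent
print(unadjusted, adjusted)
```

With a positive rate, the adjusted score is higher than the unadjusted ratio, which is consistent with the box's point that positive time preference pushes raw TTO scores downward, and not in a proportional way.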

³	Krabbe et al. (1997) was published too late to be included in the Brazier et al. (1999a) review, which was completed in early 1997.
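The power function reported by Krabbe and colleagues, RS = 1 − (1 − TTO)^α with α = 0.42, can be written out as a short sketch. The inverse mapping below is obtained only by rearranging that equation algebraically; it is not reported in the source, and the function names are hypothetical. The fit was estimated on 13 mean values, so applying it to individual responses would be an extrapolation.

```python
ALPHA = 0.42  # exponent reported for the fit between mean RS and mean TTO values


def rs_from_tto(tto: float, alpha: float = ALPHA) -> float:
    """Rating scale value implied by a TTO value: RS = 1 - (1 - TTO) ** alpha."""
    if not 0.0 <= tto <= 1.0:
        raise ValueError("TTO value must lie in [0, 1]")
    return 1.0 - (1.0 - tto) ** alpha


def tto_from_rs(rs: float, alpha: float = ALPHA) -> float:
    """Algebraic inverse of the same curve (an assumption, not a separately estimated fit)."""
    if not 0.0 <= rs <= 1.0:
        raise ValueError("RS value must lie in [0, 1]")
    return 1.0 - (1.0 - rs) ** (1.0 / alpha)


for tto in (0.25, 0.50, 0.75):
    print(tto, round(rs_from_tto(tto), 3))
```

Because α is below 1, the curve places rating scale values below the corresponding TTO values everywhere between the two anchors.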

Conclusion This section has considered the performance of different elicitation techniques and the distinctive aspects of value implicitly conveyed by each one. Comparing methods for eliciting health state preferences directly may be less relevant for guiding CEA in regulatory analysis, however, than is comparing the specific HRQL instruments, or generic indexes, presented later in this chapter. The choice among alternative preference elicitation techniques is embedded in the choice of generic index, because each index relies on a valuation survey that employed a particular elicitation method. If health state values are elicited directly in new surveys, however, the researcher must choose a preference elicitation method.

Alternative HALY Metrics for Regulatory CEA

In the previous section we relied on the QALY as the construct for which different preference elicitation methods are applied. This section further considers the QALY and several other HALY constructs, in light of their suitability for regulatory CEA.

Quality-Adjusted Life Years

As noted in Chapter 1, the QALY was the first HALY metric, developed about 30 years ago as an outcome measure for CEA. It was designed to facilitate the maximization, in accordance with individual preferences for health, of aggregate health benefits for a given level of resources invested.

QALYs can be interpreted in different ways. When initially developed, the QALY was simply an index with an intuitive meaning, corresponding to the equivalent number of years in full health. More formally, QALYs can be thought of as an index for which relative values are calculated using utility theory or as a measure of economic utility (Gafni, 2004).

Pliskin and colleagues (1980) first proposed an underlying utility model for QALYs. This model applies to individual decision makers who are presumed to maximize expected utility when outcomes are uncertain. The authors derived the behavioral assumptions about preferences for health states and longevity that would be consistent with QALYs as a utility function, in situations where health status is constant over the life span. As described in the previous section, the SG and TTO are commonly used to determine the value of a particular health state that will last the rest of one’s life in terms of the risk of death or the loss of life expectancy that the individual is willing to accept in order to achieve optimal health.

The behavioral assumptions of the utility-theoretical model are as follows:

Individual decision makers follow the axioms of expected utility theory, which is based on preferences for outcomes that are uncertain. These are that (1) preferences for outcomes exist and are transitive; (2) preferences for an uncertain prospect do not depend on whether the prospect has one stage or two; and (3) preferences are continuous (von Neumann and Morgenstern, 1947; see Patrick and Erickson, 1993, for an exposition in terms of HRQL valuation).
The proportion of remaining life that the individual decision maker would trade off for a given quality improvement is independent of the length of life remaining. That is, if someone with severe osteoarthritis would trade off 10 years of a remaining life expectancy (LE) of 40 years for 30 years free of disease followed by immediate death, then that person would be willing to trade off 5 years of a remaining 20-year LE to live free of disease.
The utility for a health state is independent of its duration.
An individual’s health utilities are independent of nonhealth factors in her overall utility function. This means that preferences for income, leisure time, and other features of life do not affect her preferences for different states of health.
Individuals are risk neutral with respect to gambles over life years (Dolan, 2000). Risk neutrality implies indifference among lotteries on future longevity that have the same life expectancy (and that are the same in terms of health).

An additional assumption that is required when health states vary over the life span is that preferences for health in different time periods are additive, in accordance with individual preferences for health.
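Under these assumptions, a QALY total is a sum of health state weights multiplied by the time spent in each state. The sketch below illustrates the additivity and constant proportional trade-off assumptions with hypothetical weights and durations; it omits discounting and is not drawn from any particular instrument.

```python
from typing import Iterable, Tuple


def qalys(profile: Iterable[Tuple[float, float]]) -> float:
    """Sum of (health state weight x years) over a sequence of health states.

    Weights are on the scale where 0 is death and 1 is full health.
    """
    total = 0.0
    for weight, years in profile:
        if weight > 1.0:
            raise ValueError("weights cannot exceed 1 (full health)")
        if years < 0:
            raise ValueError("durations must be non-negative")
        total += weight * years
    return total


# Constant proportional trade-off: a weight of 0.75 implies giving up
# 10 of 40 years, or 5 of 20 years, to live free of disease.
print(qalys([(0.75, 40)]), qalys([(0.75, 20)]))

# Additivity: two orderings of the same states yield the same total.
early_health = [(1.0, 10), (0.5, 20)]
late_health = [(0.5, 20), (1.0, 10)]
print(qalys(early_health), qalys(late_health))
```

The second comparison shows what the additivity assumption rules out: under the QALY model, the order in which health states occur cannot change the total.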

Miyamoto and Eraker (1985) have investigated the behavioral content of the theoretical assumptions and concluded that:

the QALY model deserves consideration as a description of patient preferences … because it concisely formulates two aspects of utility that are crucial to any viable medical utility model…. By summarizing risk attitude toward survival and the effect of health quality in a few easily assessed parameters, the QALY model provides a powerful and general instrument for describing patient values (p. 205).

As a measure of the production of health, QALYs are relatively simple and “modular,” allowing longevity and HRQL to be equated, combined, and traded off at both the individual and population levels. Thus, despite some evidence that the independence and risk neutrality assumptions of the QALY model are violated in empirical studies, the model remains useful for decision making because its parameters can be readily estimated and it reflects trade-offs between survival and quality of life (Miyamoto and Eraker, 1985, 1988).

QALYs are by far the most commonly used metric in CEA. A literature survey of cost-effectiveness studies published over 20 years (1981–2000) in the medical and health services research literature identified 328 original CEAs that used a HALY outcome measure. All but one study, which used the healthy year equivalent (HYE) metric, used QALYs (Greenberg and Pliskin, 2002).

Healthy Year Equivalents

The HYE is an economic concept used to determine the number of years in optimal health that would produce the same level of utility for an individual as produced by a lifetime health profile (i.e., a particular succession of health states).

In a critique of the QALY model, Mehrez and Gafni (1990, 1991) proposed alternative approaches for estimating the relative values of alternative health states that do not rely on the strong independence assumptions posited by Pliskin et al. (1980) and the assumption of additivity over time. Their critique rests on two empirical observations. First, individuals may value two different sequences of health states that result in the same number of QALYs differently. Second, quality and length of life are not valued independently of each other, in contrast to a fundamental assumption of QALYs. Mehrez and Gafni addressed these empirical results by constructing dynamic health profiles extending over the course of life and then eliciting the relative values for these profiles in their entirety with a TTO elicitation technique.

The HYE approach requires comparing a large number of alternative health profiles. Although the HYE has an advantage in that some of the restrictive assumptions associated with QALYs do not apply, preferences must be elicited for specific health profiles, or sequences of health states, rather than for individual health states as with QALYs. Although proponents of the HYE metric contend that the greater methodological demands of the approach are justified in terms of its closer adherence to the theoretical conditions of utility theory, critics counter that developing an empirical base of HYE values for widespread use is not practical. The debate between proponents of QALYs and HYEs boils down to a choice between a simpler model that imposes a smaller information collection burden and a more complex but better fitting model that has demanding and costly data collection requirements.

Disability-Adjusted Life Years

The DALY is a measure of potential years of life lost to premature death, adjusted to include the equivalent years of healthy life lost through poor health or disability. Box 3-3 provides some background on the origin of the DALY measure. DALYs are calculated by summing the life years lost from an optimal life expectancy, adjusted downward by any mental or physical disability caused by disease or injury. Like QALYs, DALYs can be discounted to present value. The DALY index scale is an inversion of the QALY scale: for DALYs, 0 corresponds to perfect health and 1 to death. DALY index values correspond to specific health conditions rather than to generically characterized health states.

BOX 3-3
The World Health Organization’s Disability-Adjusted Life Year

DALYs were developed as a summary measure of population health for the WHO Global Burden of Disease study (Murray and Acharya, 1997). Three objectives motivated this project. First, international health policy debates previously had depended primarily on mortality statistics, and policy makers and researchers wanted to include the impact of nonfatal health outcomes in their assessments and deliberations. Second, to allocate resources across a spectrum of health interventions more effectively, a common measure was needed to estimate the relative magnitude of particular diseases in terms of their impact on longevity and disability. Last, such information could reduce existing allocative inefficiencies by comparing investments in different kinds of interventions for particular populations and societies.

The valuation of various health states using a variant of the PTO elicitation method was undertaken with WHO’s concerns and objectives in mind. In 1995, health experts were brought together by WHO and first asked to determine the numbers of persons in full health and those with a particular condition that they would consider equivalent in terms of a given life extension (say, of one year). Next they were asked to determine the number of persons in the health-impaired group who would have to experience an improvement in HRQL to full health to be equivalent to gaining a life extension of one year for the fully healthy group. These PTO values were then compared and reconciled in a final weighting. The official DALY weights are available in Mathers et al. (2003), which can be downloaded from the WHO website (http://www3.who.int/whosis/discussion_papers/pdf/paper54.pdf).

The initial characterization of nonfatal health outcomes in DALYs was based on the International Classification of Impairments, Disabilities, and Handicaps. DALYs focus on functional disability from diseases and other health-related conditions. In the WHO DALY study, health professionals constructed the descriptions of disabilities, and other groups of health experts valued the disabilities using the PTO method in a deliberative, iterative process (Murray and Lopez, 1996; Gold et al., 2002). These DALY condition weights do not purport to reflect individual utilities. Rather, they represent the relative social value of different states of health as judged by experts, “… a variant of QALYs which have been standardized for comparative [international] use” (Murray and Acharya, 1997, p. 704).

The DALY construct reflected two much-criticized analytical choices that are no longer considered essential for the measure. First, decrements in longevity were calculated from a worldwide optimum life expectancy, represented by that of Japanese women (82.5 years). The second distinctive feature was age weighting. Years lived in young adulthood were given a greater value in comparison to years lived earliest and at the end of the life span. Age weighting gives priority to the potential for improving health outcomes among the members of society most critical to the well-being of society as a whole, those in their productive years of life.

Age weighting and the use of optimum life expectancy are not, however, in principle necessary to the DALY construct. DALY weights may be determined based on any of the methods described earlier in this chapter, including PTO, SG, TTO, or RS. Some more recent applications of DALYs are not age weighted, use life tables for the actual target population, and apply DALY weights derived from sources based on different methods (see, e.g., de Hollander et al., 1999; Fox-Rushby and Hanson, 2001, for applications and discussion of analytic options using DALYs).
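The arithmetic behind a DALY total can be sketched briefly. The Python below is a minimal illustration, not the WHO implementation: it assumes the conventional decomposition into years of life lost to premature death plus years lived with disability, applies no age weighting, and uses continuous discounting as one possible way to bring years to present value. All input figures and function names are hypothetical.

```python
import math


def present_value_of_years(years, rate=0.0):
    """Present value of `years` consecutive life years starting now.

    Continuous discounting is an assumption of this sketch;
    a rate of 0 leaves the years undiscounted.
    """
    if rate == 0.0:
        return years
    return (1.0 - math.exp(-rate * years)) / rate


def dalys(deaths, years_lost_per_death, cases, disability_weight,
          years_with_disability, rate=0.0):
    """DALYs = years of life lost (YLL) + years lived with disability (YLD).

    disability_weight is on the DALY scale: 0 = perfect health, 1 = death.
    """
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability weight must lie between 0 and 1")
    yll = deaths * present_value_of_years(years_lost_per_death, rate)
    yld = (cases * disability_weight
           * present_value_of_years(years_with_disability, rate))
    return yll + yld


# Hypothetical inputs: 10 deaths losing 30 years each, plus 100 nonfatal
# cases lived for 5 years at a disability weight of 0.2.
undiscounted = dalys(10, 30, 100, 0.2, 5)
discounted = dalys(10, 30, 100, 0.2, 5, rate=0.03)
print(round(undiscounted, 6))      # 400.0
print(discounted < undiscounted)   # True
```

Because the DALY scale is inverted relative to the QALY scale, the disability weight of 0.2 used here corresponds to an index value of 0.8 on a QALY-style scale.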

Saved-Young-Life Equivalents

QALYs and other individually based preference or utility measures are deemed by some to be inappropriate for societal resource allocation decisions. These measures do not adequately account for the value attached to saving lives relative to improving health or to the priority that may be given to improving outcomes for the most severely impaired, regardless of the size of the improvement. QALYs measure only the size of an improvement in health and disregard health state starting and endpoints. This reflects the irrelevance, in the calculation of QALY gains or losses, of all personal attributes except the quality adjustment to a life year and the number of aggregate QALYs. However, in surveys of people’s preferences for public investments in health, their “health-related social welfare function” is rarely consistent with QALY maximization (Ubel et al., 1996; Menzel, 1999).

Nord (1992, 1999) has proposed several strategies to incorporate this concern for severity and life saving in HALY measurement. One of these approaches, related to the PTO valuation method described earlier, selects a single health care outcome as the common unit of measurement for all health-related outcomes. The common unit Nord proposes is the SAVE, the value of saving the life of a young person and restoring him to full health. To determine the relative societal value of a given health outcome, two equally expensive programs are compared. One program saves a young life each year and the other produces n health outcomes of type x each year. Respondents are asked how many outcomes of type x would be considered as valuable as saving the life of one young person. This direct elicitation allows all aspects of the given health outcome to be taken into account, including the initial health states as well as the extent of potential gains in health and the characteristics of the persons who would benefit. Nord proposes this unit of measure as a common denominator for all societal investments in health and longevity improvements.
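The equivalence judgment can be turned into a number with simple arithmetic. The sketch below is an illustration only: it assumes that if n outcomes of type x are judged as valuable as one saved young life, a single such outcome is worth 1/n SAVE, and it summarizes respondents by their median answer, which is an aggregation choice made for this example rather than one specified by Nord. The responses are hypothetical.

```python
from statistics import median


def save_value(equivalent_counts):
    """Value of one outcome of type x, expressed in SAVE units.

    equivalent_counts holds each respondent's answer n: the number of
    outcomes of type x judged as valuable as saving one young life.
    """
    if not equivalent_counts or any(n <= 0 for n in equivalent_counts):
        raise ValueError("each response must be a positive count")
    return 1.0 / median(equivalent_counts)


responses = [10, 20, 20, 50, 100]   # hypothetical survey answers
print(save_value(responses))        # 0.05
```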

The SAVE measure, like the HYE, requires direct elicitation for many specific health profiles, and thus faces the same implementation difficulties. Index values for SAVEs are not available in the research literature and, as with PTO values more generally, the reliability of the technique has not been determined.

We take up the issue of societal values and QALYs again in Chapter 4, where we examine the ethical assumptions embedded in the QALY metric and strategies for addressing distributive and other ethical issues that arise in regulatory CEAs that employ QALYs.

Choosing a HALY Measure for Regulatory Analysis

The QALY is the obvious choice at this time for standardizing regulatory analysis on a single HALY metric. Researchers have completed only limited work using the HYE and the SAVE, and health state values using these metrics are not readily available. Furthermore, values for the wide range of health conditions considered in regulatory analysis are not likely to be developed in the near term using these approaches, given the complexities of establishing values (such as conditioning health state values on duration or transitions from prior health states) and the expense of related research. The HYE, while in theory superior to the QALY as a measure of preferences for health, would require a significantly more complex elicitation process, as would the SAVE, which is valued using variants of the PTO method. The DALY can be valued using a variety of methods consistent with QALY measurement. However, the inversion of the calculations, as losses averted from some normative life expectancy, introduces opportunities for confusion in interpretation if other results are presented as QALY gains.

Alternatives to the QALY have not undergone extensive reliability evaluation. Although the QALY can be criticized for not adhering to expected utility theory or for ignoring certain dimensions of societal values for health-related improvements such as severity or threat to life, it is feasible and widely used. In addition, the QALY is supported by a number of generic, multiattribute HRQL survey instruments and can be estimated for health endpoints in regulatory analysis using a variety of approaches.

SOURCES OF HEALTH STATE VALUES FOR REGULATORY ANALYSIS

Relative values or preference weights for health states that represent endpoints in regulatory analyses can be obtained using a variety of sources. As already noted, the field of HRQL measurement was initially developed to inform medical technology assessment and resource allocation decisions. The data sets, information needs, and analytic priorities for these policy contexts tend, not surprisingly, to differ from those of regulatory analysts and policy makers. Consequently, the measurement tools that have been designed to answer questions of clinical effectiveness and efficiency in improving health outcomes are unlikely to be perfectly matched to the demands of regulatory analysis. The following discussion reviews various ways of obtaining preference-based HRQL values, focusing on the information needs and constraints of those involved with risk regulation. This section reviews:

Primary elicitation of health state index values for specific conditions,
Four commonly used generic HRQL survey instruments or indexes,
Use of condition-specific indexes,
Use of experts to assign health states,
Use of data from routine population surveys,
Use of health state index values from prior studies and benefit transfer practices, and
Assessing uncertainty in the estimation of health-related effects from regulatory interventions.

The section concludes with a brief review of innovations in survey instruments and measurement techniques and key areas for further research and development of HRQL metrics and methods for regulatory CEA.

Primary Elicitation of Condition-Specific Index Values

One way to obtain index values for particular states of health is to elicit preferences for those states directly from the population whose interests are at stake, or from proxies for that population. For example, to value a reduction in a particular type of cardiac disease in the U.S. population, researchers might conduct a survey that described the effects of the disease and ask a representative sample of the U.S. population to value these effects. When QALY-based CEA was first introduced, direct elicitation of preferences for specific health states, conditions, or treatment outcomes was the only available approach (Bush et al., 1973; Torrance et al., 1973; McNeil et al., 1978; Pliskin et al., 1980). Every CEA had to estimate values for the outcomes of interest. Generic HRQL indexes had not been developed, and a research literature reporting values that could be used “off the shelf” had not yet accumulated.

By the time the PCEHM issued its report in 1996, several generic HRQL survey instruments were available. The panel recommended generic health state classification systems as the preferred measurement approach in CEA because these systems offer the best opportunity to achieve consistency in the valuation of health states across studies and across different health interventions and diseases (Gold et al., 1996b).⁴

In some cases, existing studies may provide suitable, high-quality estimates for valuing the health states of interest in regulatory analysis. In the absence of such studies, new, primary research to value the health conditions targeted by a regulatory intervention might appear to be the most desirable course. However, it is unlikely to be a realistic option in the near term for the vast majority of regulatory analyses. Both the time available to conduct analysis of proposed regulations and the resource demands of survey research militate against undertaking original studies, except as part of a separate project without the constraints of regulatory analysis. In addition, federally sponsored survey research is subject to Office of Management and Budget review and approval under the Paperwork Reduction Act, which creates additional time and resource burdens and uncertainty. As a result, the sources of health state index values discussed in the remainder of this section are likely to be the more feasible options for regulatory CEA in the near term.

Generic HRQL Indexes

An alternative to directly eliciting preferences for specific conditions is to use a multiattribute health state classification system with predetermined index values for generically described health states. These indexes are widely used and accepted in medical CEA as a way to assign general population or “community” index values to highly disparate conditions and diseases, with minimal burden on respondents. Characterizing particular health conditions in terms of the conditions’ generic features or attributes can be done in a number of ways: by patients with the condition, by members of the general public based on a detailed description of the condition or scenario, or by clinical experts familiar with the condition. These characteristics are then valued using the preexisting health state values developed for that particular index.

⁴ In the context of regulatory CEA, directly eliciting preferences for health states is analogous to conducting an original willingness-to-pay survey to value health effects for BCA. Although some researchers have proposed standardizing valuation approaches for BCA, established generic methods like the indexes used in CEA do not exist.

The case studies conducted by the Committee to inform and illustrate the discussion and recommendations in this report employed four generic indexes:

The Quality of Well-Being Scale,
The Health Utilities Index (in two versions, Mark 2 and Mark 3),
The EuroQol-5D, and
The SF-6D.

These instruments were chosen for in-depth examination from a much larger field of such instruments based on their widespread use in U.S. and Canadian health care outcomes and cost-effectiveness research (in the case of the first three instruments listed) or because the index values could be calculated from health profile data that are collected extensively in the United States (in the case of the SF-6D).

After briefly reviewing the structure of such instruments and the theories on which they are based, we describe each of them in turn. Tables 3-3 and 3-4 present the basic features of each of the four instruments in summary and comparative form. The instruments themselves and sources for their valuation or scoring algorithms are presented in Appendix B.

The use of generic health indexes to estimate preference-based HRQL values involves two steps. First, the health state of interest must be described in terms of the several domains of HRQL. (See Table 3-1 for a conceptual overview of these domains.) A given respondent characterizes or describes the health state according to the generic set of attributes offered by the index’s standardized questionnaire. For example, under the EQ-5D, the respondent may indicate that the health state leads to “no” problems with walking about, “some” problems washing or dressing, and so forth. Second, once a health state has been characterized in terms of the domains of the generic instrument, a single index value for the overall health state can be calculated on a 0-to-1 scale.

These index values for health states are based on a separate valuation exercise (typically conducted with respondents drawn from a local community’s residents or a nationally representative sample) that elicits preferences for health states (described generically, not as particular diseases or conditions) in terms of the survey instrument’s HRQL domains. The relationship between general population valuation of health states and the characterization of health states using a generic HRQL index is depicted in Figure 3-3, for the case in which patients with a health condition describe the condition.
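The two steps can be sketched in a few lines of Python. This is an illustration under loud assumptions: the domain names follow the EQ-5D, but the decrements are invented placeholders, not any published scoring algorithm, and the simple additive formula merely stands in for whatever valuation function a real index uses.

```python
DOMAINS = ("mobility", "self-care", "usual activities",
           "pain/discomfort", "anxiety/depression")

# Hypothetical community decrements for levels 1-3 of each domain
# (level 1 = no problems). Placeholder numbers, NOT a real value set.
HYPOTHETICAL_DECREMENTS = {
    "mobility":           {1: 0.00, 2: 0.05, 3: 0.20},
    "self-care":          {1: 0.00, 2: 0.05, 3: 0.15},
    "usual activities":   {1: 0.00, 2: 0.05, 3: 0.10},
    "pain/discomfort":    {1: 0.00, 2: 0.10, 3: 0.30},
    "anxiety/depression": {1: 0.00, 2: 0.05, 3: 0.20},
}


def describe(responses):
    """Step 1: turn questionnaire answers into a complete domain profile."""
    missing = [d for d in DOMAINS if d not in responses]
    if missing:
        raise ValueError(f"no level reported for: {missing}")
    return {d: responses[d] for d in DOMAINS}


def index_value(profile):
    """Step 2: apply the pre-existing community values to the profile."""
    loss = sum(HYPOTHETICAL_DECREMENTS[d][level]
               for d, level in profile.items())
    return 1.0 - loss


# A respondent reports some problems with self-care and some pain.
profile = describe({"mobility": 1, "self-care": 2, "usual activities": 1,
                    "pain/discomfort": 2, "anxiety/depression": 1})
print(round(index_value(profile), 2))   # 0.85
```

The separation matters: the people who describe the state (here, a patient) need not be the people whose preferences supplied the decrements (a community sample).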

TABLE 3-3 Domains and Number of Attribute Levels for Generic HRQL Indexes

QWB	[QWB-SA]	HUI-2	HUI-3	EQ-5D	SF-6D-36	[SF-6D-12]
Mobility (3)	[3]	Sensation (4)	Vision (6)	Mobility (3)	Physical functioning (6)	[3]
Physical activities (3)	[3]	Mobility (5)	Hearing (6)	Self-care (3)	Role limitation (4)	[4]
Social activities (5)	[5]	Emotion (5)	Speech (5)	Usual activities (3)	Social functioning (5)	[5]
Symptom/problem complexes (26)^a	[58]^a	Cognition (4)	Ambulation (6)	Pain (3)	Mental health (5)	[5]
		Self-care (4)	Dexterity (6)	Anxiety/depression (3)	Bodily pain (6)	[6]
		Pain (5)	Emotion (5)		Vitality (5)	[5]
		Fertility (3)	Cognition (6)
			Pain (5)
NOTES: HRQL = health-related quality of life; HUI = Health Utilities Index; QWB = Quality of Well-Being; QWB-SA = self-administered format QWB.
^a Each of the symptoms/problem complexes is measured as present or absent.
SOURCES: Feeny et al. (1996); Torrance et al. (1996); Kaplan et al. (1997); Kopec and Willison (2003); Brazier and Roberts (2004). See Appendix B for complete descriptions and sources for these generic indexes.

TABLE 3-4 Valuation Surveys for Generic HRQL Instruments

Index	Sampling Frame	Sample Size/Year/Response Rate	Valuation Technique	Number of Health States Measured/Measured by Each Respondent	Number of Possible Health States
QWB	San Diego community residents	866 adults/1974–1975/NA	VAS	42/42	945
HUI-2	Ontario, Canada	293 parents, for children/NA/NA	VAS transformed into SG	21 with VAS; 4 with SG	24,000
HUI-3	Ontario, Canada community residents age 16+	504 adults/1994/65%	VAS transformed into SG	Modeling sample: 22–24 with VAS; 5 with SG. Direct valuation: 73/16 with VAS; 9 with SG	972,000
EQ-5D (U.K.)	U.K. community residents age 18+	2,997 with complete data/1993/56%	TTO; VAS	42/13	243
EQ-5D (U.S.)	U.S. community residents age 18+	3,773 with complete data/2002/59%	TTO; VAS for own health state only	45/15	243
SF-6D (SF-36 version)	U.K. community residents age 16+	611 with usable data/1998/65%	SG	249/6	18,000
SF-6D (SF-12 version)	U.K. community residents age 16+	611 with usable data/1998/65%	SG	241/6	7,500
SOURCES: Kaplan and Anderson (1988); Feeny et al. (1995, 2002); Torrance et al. (1995); Dolan (1997); Brazier et al. (1999a, 2002); Fryback (2003); Kopec and Willison (2003); Brazier and Roberts (2004); Shaw et al. (2005).

FIGURE 3-3 Measuring HRQL with Generic Instruments: Community Valuations, Patient Characterizations

Index values for health states using multiattribute generic instruments can be estimated holistically or in decomposed form. In the holistic approach, respondents are asked to value a composite scenario reflecting a particular combination of functional and other characteristics represented by particular domain levels, using any of the elicitation techniques described above. In the decomposed valuation approach, respondents determine the relative value of each possible health-related attribute for each domain (e.g., pain, mobility, self-care) independently. When multiattribute systems are valued holistically, the weights for individual attributes and attribute levels are estimated through statistical modeling. The decomposed approach employs an algebraic approach to combining the single-attribute estimates. Weighting formulas can be additive or multiplicative under either approach.
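The difference between additive and multiplicative weighting can be seen in a toy calculation. The Python sketch below uses invented single-attribute values; neither function reproduces the scoring formula of any actual index. In the additive form each attribute subtracts a fixed decrement, while in the multiplicative form each attribute scales the running score, so the loss attributable to one attribute depends on the levels of the others.

```python
from math import prod


def additive_index(decrements):
    """Index value = 1 minus the sum of per-attribute decrements."""
    return 1.0 - sum(decrements)


def multiplicative_index(multipliers):
    """Index value = product of per-attribute multipliers in (0, 1]."""
    return prod(multipliers)


# Hypothetical attributes: pain costs 0.2 alone, limited mobility 0.3 alone.
pain_add, both_add = additive_index([0.2]), additive_index([0.2, 0.3])
pain_mul, both_mul = multiplicative_index([0.8]), multiplicative_index([0.8, 0.7])

# Extra loss from adding the mobility problem to someone already in pain:
print(round(pain_add - both_add, 2))   # 0.3  (same as mobility alone)
print(round(pain_mul - both_mul, 2))   # 0.24 (smaller than mobility alone)
```

The multiplicative result illustrates a limited form of interaction: the second impairment removes a share of the value that remains, not a fixed amount.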

Each of the generic indexes used in the case studies and described below has at least one set of values for all possible health states that is based on a general population or community valuation survey, presented in Table 3-4. The features of each instrument’s standard or reference valuation survey are described below.

Quality of Well-Being Scale

History The QWB was developed from the first generic HRQL index, the Index of Well-Being, which was envisioned as part of a general health policy model to guide health services and health program investments (Fanshel and Bush, 1970; Kaplan et al., 1976; Kaplan and Anderson, 1988, 1996). Its early introduction and comprehensiveness made it a point of departure for the design of subsequent instruments (McDowell and Newell, 1996; Drummond et al., 1997). Until 1997, when a self-administered version of the QWB was released, the QWB had been available only in an interviewer-administered format. The interviewer-administered questionnaire takes about 15 minutes to complete and the self-administered version 14 minutes (Andresen et al., 1998).

Domains The QWB includes four dimensions: physical function, mobility, social function, and immediate symptoms or problems. The first 3 dimensions produce a total of 46 function levels, including death. In the interviewer-administered version, there are 27 symptom or problem complexes (including no symptom or problem), while there are 58 symptom or problem complexes in the self-administered version. The symptom/problem complex domain and related weights are unique to the QWB among the four indexes considered here.

Valuation The original community-based valuation survey for the QWB included 866 adults from a probability sample of households in San Diego conducted in 1974–75. This survey has been the basis for scoring all versions of the QWB since then. Each respondent in the valuation survey rated 42 descriptive health profiles using a CR procedure, with zero corresponding to death (Fryback, 2003). The survey asked respondents to consider the relative value of being in the health state in question for a single day. This short-term valuation period is unique to the QWB among the indexes considered here. The statistically modeled weighting formula is additive (i.e., it reflects no interactions between attributes) and yields summary index values between 0 and 1.

Availability Age- and sex-specific QWB norms for the U.S. noninstitutionalized population have been estimated from National Health Interview Survey (NHIS) data for 1986–88 and 1994 through a process of mapping NHIS responses to the QWB instrument (Anderson et al., 2004). The QWB questionnaires and weights are available to the public without charge.

Health Utilities Index

History The HUI family of HRQL measures (the HUI Mark 1, Mark 2, and Mark 3) is the second-oldest set of HRQL instruments. The earliest version of the HUI was developed in the late 1970s and early 1980s by Torrance and colleagues at McMaster University, Ontario, Canada, and incorporated in a CEA of neonatal intensive care (Boyle et al., 1983). The later versions of the instrument, the HUI-2 and HUI-3 (both of which are current, having succeeded the original index), were developed from the HUI-1. The HUI-2 was initially applied in studies of childhood cancer and was later modified for adult applications. The HUI-3 was developed for general application (Drummond et al., 1997). HUI questionnaires can be either self- or interviewer-administered, and proxy assessment versions (for completion by parents or caretakers rather than the subject) are also available.

Domains The HUI-2 consists of seven domains: sensory, mobility, emotion, cognitive, self-care, pain, and fertility. The eight-domain HUI-3 is closely related to the HUI-2, with the sensory domain split into vision, hearing, and speech; a new attribute for dexterity; and the self-care and fertility categories eliminated. Some of these changes in domains (see Table 3-3) were made to reduce overlap in the constructs measured.

Valuation The values for the HUI-2 were elicited from a random sample of 293 parents of schoolchildren in Hamilton, Ontario, and environs (Torrance et al., 1996). The valuation survey for the HUI-3 included a probability sample of the general adult population (n = 504) in Hamilton, Ontario (Feeny et al., 2002). For both valuation surveys, the VAS was used to elicit values within each domain while SG was used to assess utilities for the “corner states” (where one domain is at its worst level and all other domains are at their best levels). Respondents were asked to consider being in the health state they were valuing for the rest of their life.

The HUI instruments represent a direct application of multiattribute utility theory, which describes how different mathematical functions can be used to represent different types of interdependence among attribute values (Keeney and Raiffa, 1976; Torrance et al., 1982, 1995; Feeny et al., 2002). The HUI-2 and -3 scoring formulas are multiplicative, allowing for a limited form of interaction among domains. Each of the 8 HUI-3 domains has 5 or 6 levels, resulting in 972,000 possible health states and making it in this sense the most detailed of the four instruments with respect to the measurement of generic health-related characteristics.
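The counts of possible health states follow directly from the number of levels per domain. As a quick check, the Python below multiplies the level counts listed in Table 3-3; the dictionary layout and function name are this sketch's own. The QWB is left out because its symptom/problem complexes are scored as present or absent rather than as ordered levels.

```python
from math import prod

# Levels per domain, in the order the domains appear in Table 3-3.
LEVELS_PER_DOMAIN = {
    "HUI-2": [4, 5, 5, 4, 4, 5, 3],
    "HUI-3": [6, 6, 5, 6, 6, 5, 6, 5],
    "EQ-5D": [3, 3, 3, 3, 3],
    "SF-6D (SF-36)": [6, 4, 5, 5, 6, 5],
}


def possible_states(levels):
    """Number of distinct health states = product of levels per domain."""
    return prod(levels)


for name, levels in LEVELS_PER_DOMAIN.items():
    print(f"{name}: {possible_states(levels):,}")
# HUI-2: 24,000
# HUI-3: 972,000
# EQ-5D: 243
# SF-6D (SF-36): 18,000
```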

Availability Since 1990, the HUI-3 has been included in every major Canadian population health survey, and more recently in three major U.S. surveys: the Health and Retirement Survey 2000, the joint U.S.–Canada Health Survey (2002–03), and as part of the U.S. EQ-5D valuation survey. The latter two surveys allow for the calculation of U.S. population norms for the HUI-2 and -3. The HUI-2 and -3 questionnaires are available from their developers for a survey administration licensing fee.

The EuroQoL-5D

History The EQ-5D instrument has been developed by the EuroQoL Group, a multidisciplinary network of researchers based largely in Europe. This research consortium was set up in 1987 and formally established under Dutch law in 1995, with the shared interest in creating a standard, simple, self-administered, generic HRQL index for use in economic evaluation and clinical outcomes studies. The EQ-5D is the simplest of the generic HRQL indexes. Initially, it was envisioned as an abstracting device that could be used in tandem with more specific HRQL measures, and serve as a bridge among particular studies and surveys (Williams, 1995).

The EQ-5D can be used as a self-administered mail survey or through phone or in-person interviews. It takes about a minute to complete, and data are less often missing than with longer, more complex HRQL surveys. The brevity that is the source of the EQ-5D’s advantages is also, however, a limitation on sensitivity. In a side-by-side comparison to the HUI-3, little difference was found between the EQ-5D and the HUI-3 with respect to their ability to discriminate between respondents with and without a variety of self-reported health conditions. Those who are assigned to the best EQ-5D health state are, however, somewhat more differentiated by the HUI-3 (Houle and Berthelot, 2000). To address this limitation of the current EQ-5D, its sponsors have been developing a five-level version of the instrument, which would improve its ability to discriminate among health states (Kind, 2004).

Domains The EQ-5D descriptive system has five domains: mobility, self-care, usual activity, pain/discomfort, and anxiety/depression. Each domain has the same three levels, designated as no, some, or extreme problems in the particular domain. The total number of health states is 243.

Valuation One standard valuation method for the EQ-5D uses VAS rating. In addition, researchers in a number of European countries (including the United Kingdom) and in the United States have elicited weights for the EQ-5D using TTO methods (Dolan, 1997; Shaw et al., 2005). In the TTO valuation exercises, respondents were asked to regard the health state of interest as lasting for 10 years without change, followed by death. The U.K. TTO index values are the most widely used EQ-5D valuations in the English-language health outcomes literature. The TTO values have been analyzed in two different ways, by applying them directly to the health states for which they were elicited, and by constructing a statistical model in which the added impact is estimated for each attribute. In the statistical model, two interaction terms are included to allow for additional value or greater decrement in value if one or more attributes are at their best or worst levels (Dolan, 1997; Dolan and Roberts, 2002). More recently, Shaw and colleagues (2005) elicited EQ-5D values in a nationally representative U.S. sample using TTO methods.
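For a state regarded as better than death, a TTO response converts to an index value by simple division. The sketch below assumes the conventional form of the exercise, in which a respondent is indifferent between 10 years in the health state and some shorter period x in full health, giving a value of x/10; states judged worse than death need a different formula and are not handled here. The function name and the example response are hypothetical.

```python
def tto_value(years_in_full_health, horizon_years=10.0):
    """Index value implied by a time trade-off response.

    years_in_full_health: the shorter period in full health that the
    respondent considers equivalent to horizon_years in the health state.
    """
    if not 0.0 <= years_in_full_health <= horizon_years:
        raise ValueError("response must lie between 0 and the horizon")
    return years_in_full_health / horizon_years


# A respondent would accept 7 healthy years in place of 10 years in the state.
print(tto_value(7.0))   # 0.7
```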

Availability From 2000 through 2002, the EQ-5D was included in the Medical Expenditure Panel Survey (MEPS), a routinely conducted panel survey with roughly 24,000 to 37,000 U.S. noninstitutionalized adult respondents (depending on the year). MEPS is administered by the Agency for Healthcare Research and Quality. This survey provides U.S. national norms for the EQ-5D. Estimates of chronic condition index values have been developed from it as well (Sullivan et al., 2005). These condition-specific values were used in the Committee’s Environmental Protection Agency (EPA) case study and are described later in this chapter. The EuroQoL Group has put the EQ-5D survey instrument in the public domain and thus users do not have to pay licensing fees to administer the survey or analyze data.

The SF-6D

History The SF-6D is the most recently developed generic, preference-based HRQL index (Brazier et al., 1998, 2002). It was designed to take advantage of the most widely used health status profiles in the world, the short-form health survey (SF-36) and its subset profile instrument, the SF-12. Two versions of the SF-6D are available, based on the 36-item and 12-item profiles, respectively. As discussed earlier, health profiling instruments produce quantified measures of health status but do not yield a single, preference-based value for HRQL as do index measures.

The SF-36 originated in research tools designed for the RAND Health Insurance Experiment, and was refined and applied in a series of medical outcomes studies that investigated specific conditions (Patrick and Erickson, 1993; Ware, 2000). Both the 36- and 12-item instruments measure general health in 8 dimensions, and yield 2 summary scores, 1 for physical health and the other for mental health, and 8 single-dimension scores. In the late 1990s, a British research group developed a simplified six-dimension health state classification system derived from the data collected in the SF-36. The SF-6D instruments use 11 items from the SF-36 (Brazier et al., 2002) and 7 items from the SF-12 (Brazier and Roberts, 2004). Their limitations include a floor effect (i.e., relatively high scores for physical function and role performance, at the lowest levels of these domains, compared with other indexes) and the fact that weights are available only from a U.K. valuation study.

Domains This index has six domains: physical functioning, role limitation, social functioning, pain, mental health, and vitality. The number of levels per domain depends on the profile questionnaire (either version 1 or 2) from which it was derived. For SF-6D states taken from the SF-36 version 2, each domain has from 4 to 6 levels, defining a possible total of 18,000 health states.

Valuation A representative sample of 836 residents of the United Kingdom participated in interviews and ranked and then valued a total of 249 SF-6D health states from the SF-36 version 2 (each participant rated 6 health states) using a SG technique. A scoring algorithm for the six-dimension model was developed using multivariate statistical methods. The same valuation survey was used to develop the scoring algorithm for the SF-12 version of the SF-6D as for the SF-36 version. There are also algorithms available to score responses to SF-12 and SF-36 version 1, which have fewer response categories than version 2.

Availability The attractiveness of the SF-6D instruments lies in their derivation from widely collected health profile data sets. However, scoring these instruments requires access to item-level data rather than the more widely reported physical and mental health summary scores. Item-level data are available for the SF-12 version 1 in MEPS for years 2000 to 2002. These data make it possible to calculate national age-specific population norms for the SF-6D as well as condition-specific norms.

The SF data sets and the latest versions of the SF-36 and SF-12 (version 2) questionnaires (from which the SF-6D questions were chosen) are proprietary, and must be licensed for use from the Medical Outcomes Trust. The SF-6D scoring algorithms are available free from their authors. Version 1 of each instrument is available free, and the algorithms to compute SF-6D scores from these versions are available from the authors (Brazier and Roberts, 2004).

Condition-Specific Indexes and Applications to Special Populations

Many HRQL instruments have been developed for specific diseases or conditions, such as asthma, cancer, depression, diabetes, and rheumatoid arthritis. Others have been developed for specific populations, such as children or nursing home residents.⁵ These “targeted” instruments have been used in many health outcomes studies and CEAs of medical interventions or prevention programs. More specialized classification systems can often provide greater sensitivity to changes in HRQL relevant to a particular condition or patient population than can generic instruments.

⁵ See the web-based Quality of Life Instruments Database, developed and maintained by the Mapi Research Institute and Mapi Research Trust, Lyon, France. It contains approximately 500 HRQL instruments, including generic instruments, condition-specific instruments, and population-specific instruments and is located online at http://www.qolid.org/ind_home2004.html.

Condition-Specific Instruments

Evaluations to support the marketing of pharmaceutical therapies and to assess the effectiveness of other therapeutic and diagnostic interventions have led to the development of disease-specific health profiling instruments and preference-based indexes. Not all disease-specific instruments yield summary index values for calculating QALYs, however. Such profiling instruments may be used in conjunction with direct valuation of specific health states by patients.

Cost-effectiveness analysis, as compared with clinical outcomes research, depends on a measure of effectiveness that captures all aspects of health-related functioning and quality of life, and cannot rely on those that exclusively measure changes relative to a particular organ system or disease. The wider compass of the domains and attribute levels of a generic HRQL instrument, which make it less attuned to any particular health condition and its impacts on symptoms and function, also ensures that it can be applied broadly and provide comparability of results across health conditions. Although the PCEHM recommended that analysts use generic indexes, it concluded that, if disease-specific classification systems are used, health states still should be framed in terms of overall health. If necessary, default values should be assigned for domains found in generic indexes (e.g., social or role function) so that results from targeted instruments can be mapped onto a generic measure for comparability (Gold et al., 1996b).

In our case study analysis of the National Highway Traffic Safety Administration (NHTSA) rule establishing installation standards for child restraint anchoring systems, the Committee included a specialized instrument developed to assess the impact of traumatic injury on long-term functioning, the Functional Capacity Index (FCI), to value health effects (MacKenzie et al., 1996). Although the FCI is not a preference-based index, it does produce health state values, similar to those of the generic indexes, that reflect the relative impact of different traumatic injuries on long-term functioning (MacKenzie et al., 1996, 2004).

In this case study, both the health effects being measured (traumatic injuries and their long-term functional impacts) and the population affected by the regulation (children under 6 years of age) presented particular challenges. Although the FCI has been designed for application to adults, and in that respect offers no advantage over the generic indexes, it is designed for use 12 months after a traumatic injury. One version of the FCI predicts this long-term functional capacity from the categorical injury severity data that NHTSA routinely collects. Appendix A describes the information available for estimating the kinds of injuries prevented by the child restraint rule and the Committee’s approach to estimating long-term HRQL impacts for injuries to infants and young children. A fuller description of the FCI is included in Appendix B.

Special Populations

Applications of HRQL instruments to special populations raise several issues. One is the adaptation of generic instruments for use with language, ethnic and racial, and socioeconomic subgroups. A basic premise that underlies the use of an HRQL measure cross-culturally is that there is a universal or general quality-of-life concept that can be measured by a common set of indicators (Anderson et al., 1996). We have not attempted to determine the validity or consistency of specific generic instruments across population subgroups or cross-culturally; these sorts of evaluations have not been conducted in any systematic fashion. However, several generic instruments have been tested in subpopulations and/or are available in several languages. The EQ-5D is available in more than 80 languages, with all versions conforming to guidelines established for the instrument by the EuroQol Group. The recent U.S. EQ-5D valuation survey oversampled non-Hispanic black and Hispanic respondents to provide reliable subgroup estimates for these populations. The HUI questionnaires are available in 15 languages, and additional versions are under development.

Another important issue for HRQL measurement is the application of generic instruments, and their underlying valuations, to children. Children’s HRQL measurement has been handled in several ways. First, parents and clinicians have served as proxy respondents, both in characterizing children’s HRQL and in valuing children’s health states and outcomes. Second, specialized instruments—frequently condition-specific instruments such as those for asthma or childhood cancer—have been developed for use with children or their proxies. Third, generic instruments have been designed or adapted for use specifically with children. For example, the HUI-2 was developed for use with children and was valued by parents (see Appendix B), and a “child-friendly” version of the EQ-5D with rephrased questions has been developed (Hennessy and Kind, 2002).

None of these strategies to address the special challenges of predicting and capturing changes in the HRQL consequences of illness and injury in childhood is entirely satisfactory. Although using standard generic indexes to value children’s health outcomes allows for comparability with results for adults, these instruments do not capture many aspects of children’s health-related well-being. At the same time, while condition-specific HRQL instruments tailored for children may be more sensitive to changes in condition, they do not permit comparisons across different types of pediatric illnesses and impairments. Finally, both children and parents have important perspectives on children’s HRQL, and instruments that focus on just one or the other offer only a partial view. Box 3-4 summarizes some of the considerations unique to valuing children’s health outcomes and quality of life.

A recent literature review of QALY-based cost-effectiveness studies in pediatric populations found that the majority of such studies did not adhere to PCEHM recommendations to use generic indexes, SG or TTO valuation, and values elicited from the general population (Griebsch et al., 2005). The authors were unable to determine whether departures from these recommended practices were a result of ignorance or disregard of the recommended practices, or were instead a conscious choice to use an alternative approach that the researchers deemed more appropriate for children. For example, one concern with the use of parents as proxy respondents for their children is that parents may not be able to distinguish their own preferences from those of their child; this concern may lead researchers to use clinicians as proxy respondents instead (generally considered to be a less desirable approach because clinicians are less likely to be familiar with the ongoing HRQL impacts on their patients than are daily caregivers). Griebsch and colleagues argue that the evidence base for developing best practices, both in the characterization and description of health states and in valuing them for children, has yet to be established. Thus they conclude that the use of QALY-based CEA is not ready for standardization when used in pediatric populations.

Another challenge is that chronic illness or severe injuries that occur in childhood often have long-term impacts. Thus the requirements for appropriately assessing and valuing the impacts on HRQL change over time. Approaches that are appropriate for the childhood impacts may be less appropriate for impacts in the adult years and vice versa. However, if different instruments were developed and used for different ages, consistency could become a concern. This problem is compounded by the fact that the long-term impacts of childhood illness and injuries are often difficult to predict and can involve many aspects of well-being, including social development and educational achievement.

The Committee’s child restraints case study provides an example of these difficulties in prediction. The case study used generic HRQL instruments that included attribute descriptions that were inappropriately described or valued for young children, who cannot, for example, normally perform many self-care activities independently. The experts who applied these instruments noted that it was difficult to assess the long-term implications, and differed somewhat in their assessments of long-term effects.

BOX 3-4
HRQL Measurement for Children

Evaluating the HRQL of children poses particularly difficult choices and challenges about which there has been little consensus or resolution (Griebsch et al., 2005). These challenges relate to (1) the conceptualization of children’s HRQL (do instruments developed for adults reflect the appropriate dimensions for children at particular developmental stages?); (2) the ability of children or their proxies to describe relevant aspects of children’s health states adequately; and (3) the valuation of children’s HRQL, including whose values should be reflected in the valuation and, to the extent that children’s own valuations are desired, how these values might be elicited.

The construction and domains of HRQL instruments developed for adults may not be well suited to capture children’s experience and functioning (Eiser and Morse, 2001a). Childhood is qualitatively different (culturally distinct) from adulthood, and ideally HRQL instruments for children should take account of particular developmental stages and thresholds (Landgraf and Abetz, 1996). This has implications both for the scope of the HRQL instrument and its format; it should measure developmentally important aspects of functioning, such as cognitive abilities, motor skills, social interactions, and body image, for example, and be calibrated for administration to children (or their proxies) at different developmental stages and ages. In addition, because children undergo relatively rapid changes in functional capacities, such as in self-care and mobility, at different rates, it is difficult to determine whether any observed changes are due to normal development or are the result of illness or intervention.

Generic HRQL survey instruments have been developed or modified for administration to children, and even more have been developed to assess HRQL in children with a specific disease. In a survey of the field of pediatric HRQL instruments, Eiser and Morse (2001b) identified 19 generic instruments developed or adapted for pediatric use and 24 condition-specific pediatric instruments. These pediatric instruments in some cases elicit responses from the child, in others from the parent, and in yet others from both parent and child. These authors argue that the perspectives of both parents and children are important for gaining a good understanding of children’s HRQL. Although parent and child health state characterizations have shown good agreement in domains that reflect physical functioning, activity, and symptoms, characterizations in domains that reflect emotional or social health demonstrate less agreement (Eiser and Morse, 2001b). Clinicians tend to identify fewer deficits in HRQL domains when serving as child proxies than do parents and teachers (Eiser and Morse, 2001b). Most of the attention in developing these instruments has been on the characterization or description of health states in children, rather than on valuation. Where valuation is specifically addressed, as in the HUI-2, the values of parents have been elicited.

Likewise, the valuation of children’s health states raises issues of whose preferences to take into account and how to measure preferences in children with limited but maturing cognitive and other capacities (Petrou, 2003; Matza et al., 2004). Many argue that children’s valuation of their own health states should be included along with those of their parents in the societal value of these effects (Eiser and Morse, 2001a; Petrou, 2003). One study that evaluated the ability of children with asthma, ages 7 through 17, to comprehend and provide reliable responses to questions eliciting their preferences for different health states concluded that at least sixth-grade reading skills were necessary for SG exercises and that at least second-grade reading skills were necessary for using a VAS technique (Juniper et al., 1997). In addition to the challenges that valuation questions and elicitation techniques pose for children’s and adolescents’ valuation of their own health, a further valuation issue is how to include the effects of children’s health on the well-being of parents and caretakers, as these effects are not captured by individual-level HRQL measures.

Assignment of Health States by Experts or Other Proxies

Proxies are used in HRQL assessments for a variety of reasons. As just discussed, parents may be asked to serve as proxies for children; caregivers, guardians, or family members for temporarily or permanently incapacitated adults (and children); and clinicians for patients. With the aging of the U.S. population and the growing incidence of conditions affecting cognitive functions, the use of proxy respondents for incapacitated adults can be expected to increase.

Proxies may be asked to (1) establish the relative values of different health states, or (2) describe or locate another person’s HRQL using a multiattribute classification instrument. Pickard and Knight (2005) distinguish two proxy perspectives. The first is when a proxy describes another person’s HRQL or assigns a relative value to HRQL as the person would rate himself. The second is when the proxy is asked to make those judgments about another’s HRQL from the proxy’s own perspective. In most cases the proxy perspective is understood in the sense first described; however, proxy surveys can be ambiguous and it is important to ascertain what proxy questionnaires and responses actually reflect.

This section does not discuss the important and well-documented issue of self-versus-proxy concordance and discordance in characterizing and valuing HRQL and functional capacities. Here we are considering experts as proxies in characterizing regulatory health endpoints in terms of the specific attributes of a multiattribute generic index. This exercise differs from individual self-descriptions on generic HRQL surveys or even experts’ proxy descriptions for individual patients. Still, some of the findings from clinical and institutional settings, namely, that caregiver and professional proxies tend to rate quality of life as poorer within some domains and overestimate disabilities of patients, may carry over to expert elicitation exercises such as those described here (Magaziner et al., 1988; Rothman et al., 1991).

In the context of regulatory analysis, proxies may be most often used to describe the impacts of a health condition using the characterization scheme of a particular generic index. If it is not feasible to conduct a new survey of the population affected by the regulation to determine health state values, an alternative is to ask clinician experts (i.e., physicians and others involved in patient care) to characterize the health conditions of interest using a generic instrument. In these cases, index values for health states are obtained separately from community or general population valuation surveys. The Committee explored this approach in the three case studies conducted as part of its investigations. (See Appendix A for synopses of the case studies; and Robinson et al., 2005a,b,c, for complete reports.) Because of limited time and resources for preparing the case studies, we skipped or abbreviated several steps in eliciting expert judgments that are often recommended. Thus this discussion includes good practices in expert elicitation that we did not follow in the case studies. Box 3-5 summarizes the steps in using experts to assign the health endpoints specified in a regulatory analysis to a generic HRQL index to estimate the regulatory intervention’s effectiveness.⁶,⁷
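The division of labor described here, in which experts assign attribute levels and index values come from a separate valuation survey, can be sketched as follows. The sketch assumes that each expert distributes a representative group of 100 patients across the levels of each attribute, as Box 3-5 suggests, and that the index is scored with a simple additive rule. The attribute names, level decrements, and expert responses are all hypothetical; real instruments such as the EQ-5D, HUI, and SF-6D use their own published scoring algorithms, which need not be additive.

```python
# Hypothetical decrements from full health (1.0) for each level of two attributes.
# These numbers are invented for illustration; they are not weights from any real index.
DECREMENTS = {
    "mobility": [0.00, 0.10, 0.30],
    "pain": [0.00, 0.15, 0.40],
}

def expected_index(assignment):
    """Expected index value for one expert under an assumed additive scoring rule.

    assignment maps each attribute to patient counts per level, summing to 100.
    """
    value = 1.0
    for attribute, counts in assignment.items():
        total = sum(counts)
        if total != 100:
            raise ValueError(f"{attribute}: counts must sum to 100, got {total}")
        weights = DECREMENTS[attribute]
        # Expected decrement = share of patients at each level times that level's decrement.
        value -= sum(c / total * w for c, w in zip(counts, weights))
    return value

# Hypothetical responses from two experts for a single health endpoint.
experts = {
    "expert_a": {"mobility": [60, 30, 10], "pain": [50, 40, 10]},
    "expert_b": {"mobility": [40, 40, 20], "pain": [30, 50, 20]},
}

values = {name: expected_index(a) for name, a in experts.items()}
# Report the spread across experts as a range instead of a single point estimate.
low, high = min(values.values()), max(values.values())
print(f"index range across experts: {low:.3f} to {high:.3f}")
```

Keeping each expert's result separate until the final step makes it straightforward to report a range of values.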

In regulatory analysis, health impacts are predicted based on one or more studies of the risks associated with a particular hazard. The descriptive information available for regulatory analysis may be developed from statistical reporting systems or epidemiological studies, and sometimes from animal studies and laboratory results developed for other purposes. These studies vary in the extent to which they provide detailed descriptions of the health impacts avoided or the characteristics of the affected population.

Assessing the health impacts associated with regulations differs from assessing individual patients’ HRQL, because of both the limited risk information available and the lack of identifiable affected individuals. It is often necessary to supplement the information from these studies with information from other sources (as in all of the Committee’s case studies), for example, to provide the data on symptoms, treatment, duration, affected population, and life expectancy needed for a QALY-based CEA. Depending on the available data, it may be necessary to characterize an average case, or a set of typical cases, that reflect the variation in health impacts. For example, the Committee’s case study of nonroad diesel air emissions considered three severity levels for chronic bronchitis, and split the cases of cardiac disease into endpoints with and without angina and congestive heart failure. The results of the assessment suggested, however, that the HRQL instruments were not sensitive enough to distinguish between some of the endpoints. Pretesting these descriptions would have allowed us to determine the extent to which different scenarios are needed for the expert assessment process.

⁶ For more information on expert elicitation practices, see Morgan and Henrion (1990), especially Chapters 6 and 7; Keeney and von Winterfeldt (1991); and Bedford and Cooke (2001), especially Chapter 10. Although these sources discuss practices developed in the context of risk assessment, they are generally applicable to a broad range of contexts involving expert judgment, including the estimation of HRQL.

⁷ As already mentioned, federally funded survey research involving 10 or more respondents must be approved by the Office of Management and Budget under the Paperwork Reduction Act. This requirement makes a valuation strategy that employs experts more attractive relative to conducting population surveys, but can also limit the number of experts involved in any given survey.

Regulatory analyses track the impacts of health effects such as cases of reactive arthritis, myocardial infarctions, or severe injuries over the full course of the disease or injury, which may include the predicted remaining life spans of the affected individuals. Asking clinical experts to estimate average HRQL impacts across time may be even more difficult and uncertain a task than is estimating HRQL for the typical or average patient with the condition.
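To illustrate what tracking HRQL impacts over the full course of a condition involves, the sketch below sums HRQL decrements across successive phases of a hypothetical injury. It assumes that HRQL is constant within each phase and that the comparison is against a fixed baseline value; the phase durations and HRQL values are invented for illustration and do not come from the Committee's case studies.

```python
def qaly_loss(phases, baseline=1.0):
    """Sum HRQL decrements over phases.

    Each phase is a (years, hrql_with_condition) pair; the loss in a phase is
    its duration times the shortfall from the baseline HRQL.
    """
    return sum(years * (baseline - hrql) for years, hrql in phases)

# Hypothetical course: an acute year, two years of recovery, then residual impairment.
phases = [(1.0, 0.50), (2.0, 0.70), (30.0, 0.90)]
loss = qaly_loss(phases, baseline=0.95)
print(f"QALYs lost per case: {loss:.2f}")
```

Asking experts for phase-specific values, in place of a single lifetime average, makes explicit how much of the estimated loss depends on the long residual phase.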

Despite the difficulties and uncertainties engendered by an expert elicitation approach to applying generic indexes, as well as its cost, several features of regulatory analysis make such approaches potentially necessary. First, the health states of interest may differ from those measured in clinical outcomes studies (as discussed in a following section on use of index values from prior studies). For example, the characterization of health effects from environmental risks such as particulate matter in the air or carcinogens in drinking water may be more vague than the specific disease states described and assessed in clinical CEAs.

In addition, expert assignment allows one to focus on the HRQL impact of a single health effect in isolation from unrelated co-morbid conditions. This is important in regulatory analysis if the regulation does not avert the co-morbidities. For example, regulations that reduce diabetes incidence may also prevent the related heart disease, but will not prevent other types of illnesses. In the expert assignments conducted for each of the three case studies, nearly all the clinical experts reported that they considered the HRQL impacts of the condition of interest in isolation from potential co-morbidities. At the same time, this location of a single condition of interest on a health state classification instrument necessitates an additional step in calibrating the resulting values on a scale that reflects overall health, as described in Appendix A.

BOX 3-5
Expert Assignment of Health States Using Generic HRQL Instruments

Several protocols for the formal elicitation of expert opinions regarding uncertain quantities are summarized in Morgan and Henrion (1990) and Bedford and Cooke (2001). These approaches vary in the details of their implementation; no generally agreed-on set of “best practices” has been developed specifically for use in HRQL assessment. However, these approaches generally consist of five basic steps:

Develop and pretest descriptions of the health endpoints. Clear, unambiguous descriptions of the health states that will be the subject of the expert assignment process will help produce more reliable judgments. As illustrated in the Committee’s case studies (see Appendix A), the basis for these descriptions should be the epidemiological literature and other materials used to estimate the types and numbers of cases of illness or injury averted by the regulation. A key issue is determining the extent to which these health states should be disaggregated for the assignment process to reflect different severity levels, disease phases, or other subcategories that may lead to variation in the attribute assignments. These descriptions should be reviewed and pretested (e.g., by asking a group of experts to complete the assignment process) to ensure that they provide the needed information and reflect the appropriate level of disaggregation of more globally described health conditions that persist and change over time.
Identify and recruit experts. The second step involves identifying the experts (i.e., clinicians) who will be involved in the assignment process. The starting point for this step is the development of criteria for the selection process. These criteria may address the types of patients with whom each expert is familiar; taken together, the experts’ range of experience should relate to the individuals whose health may be affected by the regulatory action in terms of age, health conditions, socioeconomic characteristics, and/or geographic distribution as relevant. In addition, the criteria should identify the range of specialties needed to provide a complete perspective on different aspects of the HRQL impacts. For example, for traumatic injuries to children, relevant fields could include trauma surgery, orthopedics, general pediatrics, and rehabilitation medicine. Once the criteria are developed, relevant experts can be identified by querying professional contacts and through professional and scientific organizations.
Train experts in the assignment exercise. The third step involves educating the experts in the assignment process so they have a common understanding of the health effects to be assessed, the attribute descriptions to be applied, and the overall task. An important component of this step is making the experts more aware of their own judgment processes and biases. This step is generally best accomplished by convening a workshop or training session.
Conduct expert assignments. The fourth step involves asking the experts to assign the attribute levels to each endpoint. To the extent possible, experts should be asked to give a range of values rather than point estimates. For example, they could be asked to distribute a representative group of 100 patients across the different attribute levels to indicate the percentage they would expect to fall within each category. Ideally, this task should be completed in a structured one-on-one interview with each expert by a trained member of the project team. In-person interviews are generally desirable, but phone interviews or mail-in questionnaires may be used if necessary.
Assess results. The final step involves analyzing the results of the assessment. This step should involve a feedback loop: asking each expert to verify the results and discuss the rationale for their assignments. The expert elicitation literature reflects some debate about whether, and how, to combine results across experts, including whether to allow interaction or to treat each set of results separately. Consequently, either of two approaches is reasonable. The group could be required to meet and discuss the attribute assignments until it reaches consensus; alternatively, the experts could exchange information anonymously, using processes generally referred to as “Delphi methods.” In either case, the results should be reported as a range of values and not collapsed to a single point estimate.

Approaches Based on Population Survey Data

The PCEHM urged the development of a standard catalogue of index values for “well-described health states” that would facilitate valid comparisons of CEA across conditions and illnesses and eliminate the need for collecting primary data for every analysis (Gold et al., 1996b). The PCEHM envisioned using generic indexes for which health states were valued by a general population or community sample, while people with particular conditions (rather than experts, as just described) characterized those health states according to the generic instrument’s domains and attribute levels.

Researchers have responded to this call with different approaches. Table 3-5 presents an overview of efforts to develop sets of population-based condition-specific index values during the past decade. The most recent versions of two of these conceptually distinct approaches are described below.

Catalogues of Chronic Condition HRQL Values

Sullivan and colleagues (2005) used the MEPS to develop EQ-5D index values for a number of chronic conditions, based on pooled MEPS data for the years 2000, 2001, and 2002 for respondents ages 18 or older. MEPS includes data on sociodemographic characteristics as well as responses to the EQ-5D health status questionnaire; valid responses were received for about 38,000 unique respondents. The researchers weighted respondents’ attribute scores using a model derived from a valuation survey of a representative sample of the U.S. adult population (Shaw et al., 2005; see Table 3-4 for a summary of the survey). They then calculated mean EQ-5D condition values for those respondents reporting each condition. These “with condition” values reflect both the condition itself and any co-morbidities, indicating the overall health of the individual. To separate out the effects of these co-morbidities, the researchers then used regression analysis to determine the marginal impact of the condition of interest alone, calculated as a decrement from median population health. These marginal decrements in EQ-5D values controlled for the effects of age, gender, race, ethnicity, income, and education as well as co-morbidities. The marginal decrements can be added across conditions.

TABLE 3-5 Sources for Population-Based, Condition-Specific HRQL Values

Source	Sampling Frame	Sample Size/Type/Year	Survey Instrument(s)	Elicitation Method	Condition-Specific Index Values
Fryback et al. (1993)	Wisconsin township	N~1,400/Random sample adults ages 43–84 years/1991–1992	SF-36; QWB; chronic medical condition; EVGGFP rating	TTO; rank ordering of health states	28 conditions
Gold et al. (1996a)	U.S. civilian, community-based population, ages 25–74	N~14,400/NHANES I probability sample/1971–1975; N~10,200/NHANES I Epidemiological Followup Study (NHEFS)/1982–1984; N~8,300 (reinterviewed in 1987)	NHEFS mapped onto HUI	Constructed HUI based on responses to NHEFS questions	18 conditions
Gold et al. (1998)	U.S. civilian community-based population, all ages	N~84,400/merged NHIS samples/1987–1992	NHIS self-rated health (EVGGFP scale); activity limitations	Derived from HUI-2 weights	130 illnesses and conditions
Sullivan et al. (2005)	U.S. civilian community-based population, age 18+	N~28,800/MEPS nonduplicated sample/2000+2001	EQ-5D	TTO	68 conditions
Cutler and Richardson (1999)	U.S. civilian community-based population	N~110,000/NHIS/1990	NHIS self-rated health status; chronic conditions from NHIS + SEER	Statistically inferred	21 conditions and 2 co-morbid health states
Stewart et al. (2005)	From Fryback et al. (1993)	N~1,400 respondents with complete QWB data/See Fryback et al.	See Fryback et al.	Statistically inferred, based on Fryback et al.	33 chronic conditions
NOTES: EVGGFP = Five-item global health status measure: excellent, very good, good, fair, poor; MEPS = Medical Expenditure Panel Survey; NHANES = National Health and Nutrition Examination Survey; NHIS = National Health Interview Survey; SEER = Surveillance, Epidemiology, and End Results Program.

In their article, Sullivan et al. (2005) report the results for 74 clusters of chronic conditions (clinical classification categories) and for 10 priority conditions of particular interest to health care researchers. Estimates for individual conditions (by three-digit International Classification of Disease Version 9, or ICD-9, codes) are also available from the authors. The Committee used preliminary estimates of individual condition (ICD-9) marginal chronic condition decrements as one valuation approach in the air quality case study.
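
The additive use of marginal decrements described above can be illustrated with a minimal Python sketch. The baseline index value, the condition names, and the decrement values are hypothetical placeholders chosen for illustration; they are not estimates from Sullivan et al. (2005).

```python
# Hypothetical values for illustration only; not taken from Sullivan et al. (2005).
BASELINE_INDEX = 0.90  # assumed reference index value for a person without the conditions

MARGINAL_DECREMENTS = {
    "condition_a": 0.05,  # hypothetical marginal decrement
    "condition_b": 0.03,  # hypothetical marginal decrement
}


def index_with_conditions(baseline, decrements, conditions):
    """Subtract the summed marginal decrements of the listed conditions from the baseline."""
    total_decrement = sum(decrements[name] for name in conditions)
    return baseline - total_decrement


value = index_with_conditions(
    BASELINE_INDEX, MARGINAL_DECREMENTS, ["condition_a", "condition_b"]
)
print(round(value, 2))
```

Because each decrement is estimated net of co-morbidities, the sketch simply sums them; an analyst would substitute published decrements and a documented baseline.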


Statistically Inferred HRQL Values

Another approach to obtaining relative values for chronic conditions is statistical inference. Researchers have demonstrated that self-reported health status predicts changes in functional status and mortality and is correlated with specific aspects of health. Cutler and Richardson (1997) first proposed this approach using National Health Interview Survey data. The approach has been developed further in more recent work by Stewart and colleagues (2005) using a data set that included extensive health status information and HRQL valuations (see Fryback et al., 1993, described in Table 3-5).

Stewart et al. used ordered probit and ordinary least squares regression analyses to examine the effect of specific symptoms and impairments reported on the QWB survey on self-rated overall health status and, separately, on TTO valuations of current health. They estimated health effects as analogous in form to disutility weights (i.e., index-value decrements) for 30 chronic conditions, based on the likelihood of people with each chronic condition experiencing each symptom/impairment and on the regression coefficients for each symptom/impairment. The approach considers interaction effects between pairs of symptom/impairments.
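
One way to read the calculation just described is as an expected value: each symptom's regression coefficient is weighted by the probability that a person with the condition reports that symptom. The sketch below assumes this main-effects form and omits the pairwise interaction terms; the function name, symptom names, probabilities, and coefficients are all hypothetical and are not taken from Stewart et al. (2005).

```python
def expected_decrement(symptom_probabilities, coefficients):
    """Sum of P(symptom | condition) times the symptom's coefficient (main effects only)."""
    return sum(
        probability * coefficients[symptom]
        for symptom, probability in symptom_probabilities.items()
    )


# Hypothetical inputs for a single chronic condition.
symptom_probabilities = {"pain": 0.50, "fatigue": 0.25}  # assumed P(symptom | condition)
coefficients = {"pain": 0.08, "fatigue": 0.04}           # assumed index-value decrements

print(round(expected_decrement(symptom_probabilities, coefficients), 3))
```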

The earlier model developed by Cutler and Richardson was adapted for regulatory analyses by the Food and Drug Administration (FDA) to estimate HRQL values for reactive arthritis and heart disease (Scharff and Jessup, 2001). Although this general approach must be used with caution because it does not directly value health states, it could stimulate the development of descriptive systems based on symptoms and conditions. Condition-specific values could be particularly useful in the context of regulatory analysis because the risk assessments underlying these analyses frequently report health-related impacts in terms of cases of particular diseases.

Incorporation of Health Profiles and HRQL Questions and Instruments in Routine Population Surveys

Routine and periodic national health surveys in the United States have included various health profiles, HRQL questions, and generic HRQL instruments over the past few decades. Table 3-6 lists the major surveys and the profiles, questions, and instruments they have included. Coordination among and long-term planning for these data collection activities have been minimal.

The apparently ad hoc and sporadic collection of HRQL data stems from several circumstances. First, multiple agencies with different if overlapping missions and interests have collected HRQL data. Second, every survey is constructed with both budgetary constraints and constraints related to response burden. Competition for time and space on questionnaires is keen for both reasons. Third, there has not been an ongoing focus or forum for coordination among agencies that design and field health surveys. Following the release in 1998 of the Institute of Medicine report, Summarizing Population Health, an Interagency Working Group on Summary Measures of Population Health was formed within the Department of Health and Human Services. The Working Group met for several years but is now inactive. Increased reliance on HRQL measurement for regulatory analysis will require more regular and coordinated surveys for valuation and establishment of population baselines.

TABLE 3-6 HRQL Measurement in National Health Surveys

Survey	Sample/Format/Periodicity	HRQL Information Collected	Response Rate/Other Comments
National Health Interview Survey	~94,000 persons of all ages in 37,000 households/personal interviews/annual	SRHS, health conditions, ADL, IADL, “Healthy Life Expectancy” calculation based on LE and SRHS	87%; includes ~12,000 < 18 years; proxy response for children < 12; excludes institutional residents, military
Medical Expenditure Panel Survey	15,000–19,000 adults 17+ years/personal interviews and phone/annual, in 2-year cohort panels	EQ-5D in 2000–03, SF-12 2000–present, ADL, IADL, functional disabilities, usual activities, chronic conditions	85–88% (2001); EQ-5D and SF-12 self-administered in mail survey
Behavioral Risk Factor Surveillance System	~300,000 adults 18+/telephone/annual, continuous	SRHS, “Healthy Days” measure	53% (median); range: 32–66%; conducted within each state by health department
National Health and Nutrition Examination Survey	~5,000 adults and children/personal interview, physical exam, lab tests/annual	“Healthy Days” questions administered to all participants 12+ years	Each survey focused on particular health problem in addition to core data
Medicare Current Beneficiary Survey	~16,000 Medicare beneficiaries/personal interview/annual	SRHS, ADL, IADL, chronic conditions
Medicare Health Outcomes Survey	~200,000 initially, 60,000 follow-up (longitudinal)/mail with phone follow-up/annual	SRHS, “Healthy Days,” SF-36, ADL, chronic conditions	Survey of Medicare beneficiaries in managed care plans; 1,000 respondents/plan
Medicare Fee-for-Service CAHPS	~200,000/mail with phone follow-up/annual	SRHS, SF-12, ADL	600 Medicare beneficiaries in each geographic area
Medicare+ Choice CAHPS	~200,000/mail with phone follow-up/annual	SRHS	600 managed care enrollees per plan area
NOTES: ADL = activities of daily living; IADL = instrumental activities of daily living; LE = life expectancy; SRHS = self-reported health status; Healthy Days measure: core includes four questions encompassing SRHS, number of physically and/or mentally unhealthy days within the past month, and restricted activity days within the past month.
SOURCES: Fleishman and Lawrence (2004); Haffer (2004); Moriarty (2004); CDC (2005); NCHS (2005).

Health State Index Values from Prior Studies and Benefits Transfer

When it is not practicable for analysts to conduct primary research on HRQL values for the specific health states and affected populations addressed by regulatory CEAs, an alternative is to use estimates from published research, commonly referred to as “off-the-shelf” values or preference weights.⁸ This strategy, known as “benefits transfer” by welfare economists, refers specifically to using values estimated in one context (the “study scenario”) in a new context (the “regulatory scenario”). Generally, these contexts differ in at least some respects, for example, in the specific details of the health state addressed or in some of the characteristics of the affected population. Because of these differences, a benefits transfer strategy is rarely the preferred approach; as noted in Chapter 2, the Office of Management and Budget (OMB) guidance describes it as a “last resort” because it may introduce uncertainties and biases of “unknown magnitude” (OMB, 2003a). However, the Committee’s review of current practices (Robinson, 2004) suggests that regulatory agencies often rely on transfers because they lack the time and resources needed to conduct new primary research.

Guidance for transfer of benefit values has been relatively well developed in the context of regulatory analysis and natural resource economics (see, e.g., Desvousges et al., 1998). While the specific criteria for study selection are described in various ways in different sources, they generally involve consideration of both the quality and the applicability of the study. “Quality” refers to the extent to which the study adheres to generally accepted best practices for the particular type of study. It also relates to the accuracy, reliability, and completeness of the underlying data sources, and the appropriateness of the approaches used for sampling and survey administration, including sample size, response rate, and estimated standard error. The appropriateness of the techniques used for statistical or econometric analysis should also be considered.

⁸	We avoid the term “off-the-shelf” and instead refer to “health state index values from prior studies” and “benefits transfer” for consistency with the terminology used in guidance for regulatory analysis.

“Applicability” includes the similarity between the health effect assessed in the study and the effect addressed in the regulatory analysis, and the similarity between the populations affected. For the health effect, similarities in factors such as symptoms, treatments, severity, and duration should be considered. For the population affected, factors such as age, gender, and/or baseline health may be of interest.

In addition, analysts should consider the extent to which there are opportunities to adjust the study data to better match the regulatory scenario. In some cases, the researchers may be willing to supply the original study data so that the results can be reestimated using different techniques or breakouts (to better match the characteristics of the affected population). Results can also be combined across studies using meta-analysis or other statistical methods. See, for example, Tengs et al. (2001) and Tengs and Lin (2003) for a meta-analytic approach to estimating index values for stroke.

The technique of benefits transfer relies heavily on the judgment of the analysts conducting the analysis. Thus, it is important that analysts be explicit about the studies reviewed, the criteria used to select particular studies and values from these studies, and the uncertainties involved. Where more than one study provides suitable values of reasonable quality and these values differ noticeably, the range of estimates should be used in a sensitivity analysis or in a probabilistic model (see Briggs, 2001; Briggs et al., 2002; Claxton et al., 2005).
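
As a minimal sketch of how a range of published values might feed a probabilistic analysis, the Python below draws index values between a low and a high estimate and summarizes the draws. The uniform distribution, the bounds, the number of draws, and the function names are assumptions made for illustration; the cited sources discuss how to choose distributions in practice.

```python
import random


def simulate_index_values(low, high, n_draws=10_000, seed=0):
    """Draw index values uniformly between the lowest and highest suitable estimates."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    return [rng.uniform(low, high) for _ in range(n_draws)]


def summarize(draws):
    """Return the mean and an approximate 5th to 95th percentile interval."""
    ordered = sorted(draws)
    n = len(ordered)
    return {
        "mean": sum(ordered) / n,
        "p05": ordered[int(0.05 * (n - 1))],
        "p95": ordered[int(0.95 * (n - 1))],
    }


# Hypothetical bounds standing in for two studies' index values for one health state.
summary = summarize(simulate_index_values(low=0.60, high=0.80))
print(summary)
```

A one-way sensitivity analysis would instead rerun the cost-effectiveness calculation at the low and high values alone; the sampling approach lets several uncertain inputs vary together.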

These considerations and strategies are generally applicable when transferring health state values for QALY-based CEA. In recent years, researchers at the Center for Risk Analysis at the Harvard School of Public Health developed a comprehensive, open-access registry of CEAs. It contains detailed information on analyses published over a 25-year period in the health and medical literature that use HALYs as the effectiveness measure (Bell et al., 2001). It should be noted that many medical outcomes and effectiveness studies that do not include a CEA also estimate and report HRQL values and the CEA Registry does not include these sources of original HRQL values. Nevertheless, this registry is a convenient source of health state index values for regulatory CEA and has been used by agencies such as FDA and EPA for this purpose. Box 3-6 describes the registry in greater detail.

BOX 3-6
The CEA Registry

The CEA Registry is a repository of information that currently includes more than 500 distinct health and medical CEAs published between 1976 and 2001. The database was developed through a computerized search of the English-language literature using the medical subject headings and/or text keywords “quality-adjusted,” “QALY,” and “cost-utility.” Two trained readers independently abstracted data on the health state description, corresponding point estimates and ranges for health state index values, method of elicitation and the source of the estimates (e.g., general population or patient samples), cost-effectiveness ratios, and a wide variety of reporting practices. The readers met to reconcile their results, and a third reviewer adjudicated any discrepancies. Details of this work are described on the Registry website, along with selected data from each study (http://www.tufts-nemc.org/cearegistry).

SOURCES: Chapman et al. (2000); Neumann et al. (2000); Bell et al. (2001); Harvard Center for Risk Analysis (2003).

In work commissioned by the Committee to support its case study effort, Brauer and Neumann (2004, 2005) reviewed health state values in the CEA database with respect to their applicability to the EPA case study of air quality improvements. Informed by the development of the EPA case study analysis and Brauer and Neumann’s work, we identified the following steps and questions that regulatory analysts should consider when reviewing available studies. These build on recommendations by the PCEHM as well as the benefits transfer guidance referenced earlier. There is some arbitrariness to the ordering of selection criteria used in this approach; selecting appropriate values for a particular set of health endpoints involves discretion and requires judgment on the part of the regulatory analyst. The format for reviewing potentially applicable index values is also useful for deriving possible ranges of values for uncertainty analysis.

The first step in applying index values from the research literature is to define the health endpoints in the regulatory analysis as precisely and accurately as possible. Because many disease and injury-related conditions are dynamic processes, regulatory health endpoints may be best represented by a series of health states, each with its own HRQL implications, which change over time. This could require either a relatively complex model of different health states that represent disease progression or, alternatively, simplifying assumptions about states of health and functioning on average over an extended time.⁹ Box 3-7 and Appendix A describe how these chronic health states were defined for the Committee’s air quality case study, including our review of health state values in the CEA Registry.

The second step is to assess the applicability of the health states available in the published literature, that is, the extent to which a particular study addresses the same health state as defined by the risk assessment that underlies the regulatory analysis. The applicability of a particular study’s health state values depends most importantly on similarities in the clinical descriptions of the health states, such as the severity of disease and the timing and duration of any treatment, as well as characteristics of the study population, including age, sex, and co-morbidities.

⁹	See Weinstein et al. (1987) and Sullivan et al. (2005) for examples of models of disease progression.

The third step is to assess the appropriateness of the method used to elicit the health state values. Most importantly, analysts should consider the following features: type of population surveyed, elicitation technique, and sample size. As already noted, the PCEHM recommends using index values derived from a community valuation survey in CEAs intended to inform broad societal resource allocation decisions. Deriving health state index values from a sample that represents the population subject to the costs and benefits of the regulation to the maximum extent possible will enhance the credibility of the estimates. In Chapter 4 we consider the implications of the valuation perspective in greater depth.

Within the category of elicitation techniques, values from a generic instrument (such as the EQ-5D or the HUI) or those elicited directly with TTO or SG are preferred. Less desirable are values elicited by an RS technique or values from clinicians, other experts, or author judgment. Larger sample sizes are better than smaller ones, and more recent studies are preferred to older studies, if other characteristics of the studies are comparable.

This format for reviewing potentially applicable health state index values is also useful for deriving possible ranges of such values for sensitivity analysis.

The Committee’s review of published studies for applicable health state values for the air quality case study revealed both the advantages and drawbacks of using index values from prior studies for regulatory analysis. On the positive side, it confirmed that the published literature can be a fruitful source of health state values for at least some regulatory health endpoints, and that using index values from the published literature is a relatively simple and inexpensive approach. On the other hand, the case study team found that the health state descriptions in published studies often did not match the description of health endpoints as described in the underlying health research used in regulatory risk assessments, and may not correspond on dimensions such as disease severity, patient age, or baseline risk factors. In addition, quality varies considerably across outcomes studies and CEAs in the literature, and published studies are not always clear about their methods.

Because published studies employ different populations and elicitation methods, the individually “best” estimates for particular health endpoints within a regulatory analysis may not be derived from consistent methods. For example, the published index value estimates for cardiovascular and respiratory conditions used in the EPA case study were based on different generic instruments, the EQ-5D with U.K. population values and the HUI-3 with values from a single Canadian community, respectively. Because no tenable alternatives for HRQL values for the different conditions were available, in the case study we violated the prima facie rule of using values derived with consistent elicitation methods. Perhaps most important, different studies of similar endpoints reported significantly different estimates. As discussed in Box 3-7 and below, the uncertainty in both the estimation of health impacts and the estimation of preferences for the health states associated with those impacts underscores the importance of reporting key limitations and discussing their implications, as well as conducting quantitative analyses of uncertainty.

BOX 3-7
An Example Using Health State Index Values from Published Studies

The Committee’s case study based on the EPA nonroad diesel engine rule (EPA, 2004a,b) provided an opportunity to investigate the use of published health state index values to develop estimates of the HRQL impacts of air quality improvements. The nonfatal health endpoints (disease conditions) assessed in the case study were “chronic bronchitis” and “myocardial infarction” (MI), that is, the course of cardiac disease following a nonfatal heart attack, based on the risk assessment studies used by EPA (Abbey et al., 1995; Peters et al., 2001).

As commissioned by the Committee, Carmen Brauer and Peter Neumann of the Harvard Center for Risk Analysis searched the CEA Registry’s catalogue of preference weights, updated through 2001, to identify estimates related to these regulatory health endpoints. They found 127 health states and preference weights in the respiratory and cardiovascular disease categories published since 1994.

The Committee case study team reviewed the original studies that appeared to be most promising as a source of health state values in the case study. Two studies were selected as the basis of the chronic bronchitis and post-MI health endpoints. For chronic bronchitis, estimates came from a Canadian study of alternative treatments for patients with acute exacerbations (Torrance et al., 1999). Over a 1-year period, the researchers asked the patients to complete assessments (including the HUI-3 questionnaire) after each acute exacerbation as well as once every 3 months. During the study period, the health state index values for these patients averaged 0.79 or 0.76 (depending on the treatment), calculated with the standard community-based valuation formula for the HUI. The mean value for both groups combined was approximately 0.78, when weighted by the number of participants in each group. This estimate was used for all cases of chronic bronchitis.

A study by Oostenbrink et al. (2001) provided estimates for the course of cardiac disease following nonfatal MI. This Dutch study followed patients after infrainguinal bypass surgery to compare the effects of different drug treatments. The researchers administered the EQ-5D survey instrument to study participants and used the standard U.K. population-based TTO valuation survey results to value the health states (Dolan, 1997). For those patients in the overall study sample who later experienced an MI, their subsequent index values averaged 0.58. The case study team used this estimate for all of the post-MI health states included in our assessment.

Other studies provide widely varying results. Although the values from these other studies appeared less suitable for transfer than the ones selected by the case study team, they show how different sources yield a wide range of HRQL estimates. Brauer and Neumann (2005) report index values for chronic bronchitis that range from 0.37 to 0.75, depending on the study approach, the disease severity, and the age of the patient.

Estimates for post-MI health states also varied, in part because of the different populations studied, the different approaches to HRQL measurement used, and the different severities of illness considered. For example, one study reports an index value of 0.33 for the hospitalization period, angina studies report a range of 0.67 to 0.95, studies of congestive heart failure yield values ranging from 0.46 to 0.70, and (paradoxically) a study of angina and congestive heart failure combined yields values ranging from 0.82 to 0.85 (higher than the value for heart failure alone from other studies). Although the team considered using different estimates for cases with and without congestive heart failure or angina, we were unable to find an internally consistent set of weights that addressed all of the combinations of these conditions of interest.
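
The participant-weighted mean used in Box 3-7 for chronic bronchitis can be sketched as follows. The two index values (0.79 and 0.76) come from the box; the group sizes are hypothetical placeholders, because the box does not report the number of participants in each treatment group.

```python
def weighted_mean(values, weights):
    """Average the values, weighting each by its group size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


index_values = [0.79, 0.76]  # treatment-group means reported in Box 3-7
group_sizes = [120, 60]      # hypothetical participant counts, for illustration only

print(round(weighted_mean(index_values, group_sizes), 2))
```

With unequal groups the combined value sits closer to the larger group's mean, which is why a combined figure need not equal the simple midpoint of the two.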

Uncertainty in Health Status and Preference Measurement

Uncertainty pervades all aspects of risk assessment and economic analysis of regulatory interventions to reduce health and safety risks. In its 2002 report, Estimating the Public Health Benefits of Proposed Air Pollution Regulations, a consensus committee of the Board on Environmental Studies and Toxicology of the National Research Council called for greater attention to the sources and analysis of uncertainty in developing and promulgating regulatory interventions. In particular, the committee recommended more extensive use of probabilistic uncertainty analysis.

OMB has also long encouraged the use of probabilistic analysis. Circular A-4 mandated that agencies conduct probabilistic uncertainty analyses as part of the economic analyses of regulations with a cost or benefit estimate exceeding $1 billion annually (OMB, 2003a). OMB also requires analysis of uncertainty for rules with less substantial impacts, but probabilistic methods need not be used.

In this section we consider sources of uncertainty and its treatment in the measurement of health effectiveness only. As outlined in the report of the PCEHM, the cost-effectiveness ratio is the end of a process of estimation, synthesis, and modeling. Uncertainty in cost-effectiveness analysis can stem from estimation of the numerical values of factors that are inputs of the analysis or from the analytic model or modeling process (Manning et al., 1996). One major source of uncertainty in HALY estimates for regulatory CEA is the estimation of the health impacts of the proposed intervention—the number of cases of each fatal and nonfatal health effect averted, the severity of disease or disability incurred, and so on.

Even taking the quantified estimates of cases averted as givens, however, uncertainty remains in the characterization and measurement of HRQL effects of those conditions. At least four aspects of HRQL measurement contribute to the uncertainty of the ultimate values assigned to the estimated health-related impact of a regulation:

• Variability in preferences across individuals, which contributes to uncertainty in estimating population means;
• Variability in the estimation of preferences for health states depending on the elicitation technique;
• Differences in the specificity and scope of attributes included by the generic HRQL instruments; and
• The statistical models that assign relative health state values for each of the generic instruments.

The case study results, in particular the case study of foodborne illness in which the same groups of experts assessed the regulatory health endpoints with four generic indexes, demonstrate that the instrument used to value health effects does indeed affect the results. The estimates of QALY losses averted with the juice processing rule ranged from 1,300 for the QWB and SF-6D, to 1,500 for the EQ-5D, to 1,900 for the HUI-3, using a 3 percent discount rate. This yielded cost-effectiveness ratios ranging from $13,000 to $18,000 per QALY. (See Tables A-5 through A-7.) Whether this range of estimates is significant enough to affect the regulatory decision is unclear, because this particular rulemaking did not include quantified information about other regulatory options.
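
The dependence of the cost-effectiveness ratio on the choice of instrument can be made concrete with a short sketch that divides a single net cost by each instrument's QALY estimate. The QALY figures are those reported above; the net cost is a hypothetical placeholder, not the case study's figure, so the resulting ratios are illustrative only.

```python
# QALY losses averted under each generic index, as reported in the text
# (3 percent discount rate).
QALYS_AVERTED = {"QWB": 1300, "SF-6D": 1300, "EQ-5D": 1500, "HUI-3": 1900}

# Hypothetical net cost in dollars; a placeholder, not the case study's figure.
ASSUMED_NET_COST = 24_000_000


def cost_per_qaly(net_cost, qalys):
    """Cost-effectiveness ratio: net cost divided by QALY losses averted."""
    return net_cost / qalys


ratios = {
    name: cost_per_qaly(ASSUMED_NET_COST, qalys)
    for name, qalys in QALYS_AVERTED.items()
}
for name, ratio in ratios.items():
    print(f"{name}: ${ratio:,.0f} per QALY")
```

Holding cost fixed, the instrument that yields the largest QALY estimate produces the lowest cost per QALY.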

In the case study of nonroad diesel emissions, which estimated QALY losses averted using the EQ-5D, but according to different approaches (expert assignment as compared with a catalogue of index values from a population survey), the variability in estimates was even smaller. The results ranged from 109,000 QALYs (based on the catalogue values) to 120,000 QALYs (based on expert assignment), using a 3 percent discount rate, which produced a small difference in the cost-effectiveness ratios (Tables A-16 and A-17).

The expert assignment of regulatory health endpoints using generic indexes, as described in the preceding section, also introduces additional uncertainty into the analysis. In debriefing interviews following the assignment exercise, experts raised concerns about several aspects of the task. First, characterizing a condition (the regulatory health endpoint) with a single multiattribute index response is difficult and imprecise, as the quality of life and functional impacts of chronic conditions change over time. Second, the disease descriptions were not always well distinguished from each other, or readily described by a generic index’s attributes. Last, some experts expressed skepticism about the ability of clinicians to characterize the impact of a condition on patients’ functioning and experience, despite having professional familiarity with the condition.

RESEARCH AND DEVELOPMENT OF METRICS AND VALUATION METHODOLOGIES

From the many fruitful avenues of research in the measurement and valuation of health-related quality of life, we focus on three issues with particular relevance to regulatory CEA:

• correlating and estimating conversion factors among generic indexes so that values based on different instruments can be compared;
• using information about ordinal rankings of health states to develop HRQL value scales with interval properties; and
• applying insights and best practices from willingness-to-pay survey research to HRQL valuation.

Correlations and Conversions Among HRQL Measures

CEA results based on different HRQL instruments are not readily comparable because the various instruments include different domains and rely on different value elicitation techniques. Furthermore, no one instrument has achieved preeminence in the field. These circumstances have stimulated interest in research that correlates and develops conversions or cross-walks among the various instruments so that estimates and analyses based on different measures might be compared and combined.

Using data from more than 11,000 respondents to the 2000 MEPS, Franks and colleagues (2006) have calculated the relative decrements in HRQL for 47 risk factors and health conditions based on several preference-based measures, including the U.S.-valued EQ-5D, the SF-6D (SF-12 version), and a statistically modeled HUI-3. Correlations between the estimates for these risk factors and health conditions using the three metrics were in all cases greater than 0.90.¹⁰ The authors concluded that, although the particular HRQL instruments would yield different cost-effectiveness results in absolute terms, the different measures are unlikely to produce different orderings of incremental cost-effectiveness ratios because of their consistent rank ordering. Table 3-7 presents the summary results of studies that have examined correlations between and among HRQL instruments.
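The distinction between agreement in absolute values and agreement in rank ordering can be illustrated with a minimal sketch. The decrement values below are hypothetical and are not taken from Franks and colleagues or any other study; the labels instrument_1 and instrument_2 are placeholders rather than named indexes. The sketch computes a Pearson correlation on the values themselves and a Spearman correlation on their ranks, assuming no tied values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank each value (1 = smallest). Assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# HYPOTHETICAL HRQL decrements for five conditions under two instruments.
# These numbers are illustrative assumptions, not published estimates.
instrument_1 = [0.02, 0.05, 0.08, 0.12, 0.20]
instrument_2 = [0.03, 0.04, 0.10, 0.11, 0.16]

print("Pearson: ", round(pearson(instrument_1, instrument_2), 3))
print("Spearman:", round(spearman(instrument_1, instrument_2), 3))
```

In this illustration the two instruments assign different decrements to every condition, yet they order the conditions identically, so the rank correlation is at its maximum even though the absolute values differ.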

A set of ongoing studies sponsored by the National Institute on Aging promises to contribute substantially to our understanding of the relationships among different HRQL instruments.¹¹ First, a nationally representative telephone survey of U.S. adults over the age of 35 co-administers the EQ-5D, the HUI Mark 2/3, the SF-36 version 2, and the QWB instruments. This survey will be another source of national norms for each index and will provide algorithms to convert values derived from one instrument to each of the others. Second, to evaluate the responsiveness of each measure to different conditions and to check the cross-walk algorithms and the effects of the mode of survey administration, a related study will survey two groups of patients periodically over 6 months: one group undergoing cataract surgery and the other with congestive heart failure. In its entirety, this research effort (planned to be completed in 2008) should provide much better and more comparable information about the performance of different HRQL instruments than is now available.
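One simple form a cross-walk algorithm could take is a linear regression fitted to paired scores from respondents who completed two instruments. The sketch below is an illustration under that assumption only; it is not the method used in the studies described above, whose algorithms are not specified here. The paired scores, the function names fit_crosswalk and convert, and the choice to cap converted values at 1.0 (full health) are all hypothetical.

```python
def fit_crosswalk(source_scores, target_scores):
    """Ordinary least squares fit of target = intercept + slope * source.

    Returns (slope, intercept). Inputs are paired index scores from the
    same respondents on two co-administered instruments.
    """
    n = len(source_scores)
    mean_s = sum(source_scores) / n
    mean_t = sum(target_scores) / n
    sxx = sum((s - mean_s) ** 2 for s in source_scores)
    if sxx == 0:
        raise ValueError("source scores must not all be identical")
    sxy = sum((s - mean_s) * (t - mean_t)
              for s, t in zip(source_scores, target_scores))
    slope = sxy / sxx
    intercept = mean_t - slope * mean_s
    return slope, intercept

def convert(score, slope, intercept, upper=1.0):
    """Map a source-instrument score onto the target instrument's scale.

    The result is capped at `upper` (1.0 = full health); this cap is an
    illustrative assumption, not part of any published cross-walk.
    """
    return min(upper, intercept + slope * score)

# HYPOTHETICAL paired scores from eight respondents (not survey data).
source = [0.40, 0.55, 0.62, 0.71, 0.80, 0.88, 0.95, 1.00]
target = [0.35, 0.52, 0.60, 0.66, 0.79, 0.85, 0.93, 1.00]

slope, intercept = fit_crosswalk(source, target)
print("Converted value for a source score of 0.75:",
      round(convert(0.75, slope, intercept), 3))
```

A single linear equation is the simplest choice. A cross-walk intended for regulatory use would need to be validated on a separate sample, as in the split-sample derivation and validation reported for Lawrence and Fleishman (2004) in Table 3-7, and it might require nonlinear terms or domain-level predictors.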

Using Ordinal Data for HRQL Valuation

Ranking of health states is often used as a preliminary step in preference elicitation exercises involving TTOs or SGs. Recent studies have explored using aggregated ranking data to predict health state valuations that closely match interval-level values produced by TTO methods (Salomon, 2003; Salomon and Murray, 2004). These findings, along with the consistency of the ordinal rankings of health states that different generic instruments produce (as just discussed), suggest that ordinal preferences may have broader applications in health state valuation than are currently exploited.

¹⁰	The estimated values were adjusted for sociodemographic factors that are distributed differently among persons with various risk factors and conditions.
¹¹	Information on the project is available at http://www.healthmeasurement.org/NHMS.html.

Best Practices in Stated Preference Surveys and Benefits Transfer

Much of the debate among experts on the relative merits of stated preference willingness-to-pay measures and QALY measures revolves around the protocols for and methodological rigor of surveys that elicit monetary “prices” or HRQL index values. The recommendations of an expert committee on contingent valuation convened by the National Oceanic and Atmospheric Administration in 1992 articulated such protocols for willingness-to-pay studies (Arrow et al., 1993).¹² The PCEHM (Gold et al., 1996b) serves a similar role in defining best practices in CEA, although this guidance is much less specific with respect to validity tests and methodologies for establishing the credibility of various preference elicitation techniques.

Researchers familiar with both willingness-to-pay and QALY measurement have called for cross-fertilization and even a synthesis of valuation practices across these fields (Johnson et al., 1997, 2000; Smith et al., 2003; Krupnick, 2004). For example, they have proposed that the choices underlying QALY valuations be interpreted using standard preference functions, making QALY results consistent with monetized health benefit measures. As another example, willingness-to-pay studies suggest that individual responses to risk and choices involving health depend on baseline conditions for other components of well-being (e.g., age, health status, and income), and that these factors should be taken into account in QALY measurement (Smith et al., 2003).

SUMMARY AND CONCLUSIONS

This chapter has reviewed a variety of health-related outcome measures useful in CEA, focusing in particular on HALY measures and generic indexes for estimating QALYs. The Committee has formulated criteria for the selection of HRQL instruments and characterized alternative strategies for obtaining health state values for use in QALY-based CEA of regulatory interventions. In great measure, our recommendations conform to the guidelines and underlying rationales of the PCEHM, whose 1996 report constitutes the reference standard of best practices in CEA for clinical and public health interventions.

¹²	See Mitchell and Carson (1989), Payne et al. (1999), Smith et al. (2002), Freeman (2003), and Krupnick (2004) for additional discussions of contingent valuation methodology.

TABLE 3-7 Correlations and Cross-Walks of HRQL Measures

Source	Sampling Frame	Sample Size/Type/Year	Survey Instrument(s)	Condition-Specific Index Values (y/n; if so, list conditions)	Correlations with Other Indexes
Gold et al. (1998)	U.S. civilian community-based population 0–85+	N~720,000/representative, random/1987–1992	NHIS: EVGGFP, ADL	130 illnesses and conditions	Yes: QWB (Beaver Dam) R² = 0.78; HUI (NHEFS) R² = 0.86 for conditions
Rizzo et al. (1998); Rizzo and Sindelar (1999)	U.S. civilian community-based population age 18+	N = 19,525/NMES, nationally representative (weighted) randomized sample/1987	Linking NMES responses to HUI-1 and EQ-5D questions	7 conditions: diabetes, atherosclerosis, cancer, myocardial infarct, heart disease, hypertension, stroke	Yes: EQ-5D and HUI-1 imputations had correlations ranging between 67% and 74%
Nichol et al. (2001)	Enrollees insured by Southern CA Kaiser Permanente	N = 6,921/longitudinal study; random and geographic subsamples stratified by Rx use/1992–1995	SF-36; HUI-2; chronic disease score	No	SF-36 and HUI-2: 50% of variation in HUI-2 predicted by SF-36 scores
Franks et al. (2003)	NY community health center patients age 18+	N = 240/Convenience sample, predominantly Hispanic and black/NA	SF-12; EQ-5D; HUI-3	No	HUI-3 and EQ-5D: 0.69; predicted HUI/HUI: 0.71; predicted EQ w/EQ: 0.77
Franks et al. (2004)	U.S. civilian community-based population age 18+	N~13,000 complete responses to both EQ-5D and SF-12 questions/MEPS household sample/2000	SF-12; EQ-5D	No	Regression of EQ-5D scores onto mental and physical component summary scores of SF-12; physical component R² = 0.67; mental component R² = 0.47
Lawrence and Fleishman (2004)	See Franks et al. (2004)	See Franks et al. (2004); sample split in half for derivation and validation	SF-12; EQ-5D	EQ-5D values reported for: asthma, diabetes, emphysema, high blood pressure, heart attack, stroke	Mean EQ-5D scores predicted from mean physical and mental component summary scores R² = 0.61
Hawthorne et al. (2001)	Australian community population and hospital inpatients and outpatients age 16+	Community: N = 396 Inpatients: N = 266 Outpatients: N = 334/NA	AQoL; SF-6D (36); WHOQOL-Bref; EQ-5D; HUI-3; Finnish 15D	No	Spearman correlations of AQoL with EQ-5D: 0.73; HUI-3: 0.74; 15D: 0.80; SF-6D: 0.74
NOTES: ADL = activities of daily living; AQoL = Assessment of Quality of Life instrument; EVGGFP = five-item global health status measure: excellent, very good, good, fair, poor; NHIS = National Health Interview Survey; NMES = National Medical Expenditure Survey; WHOQOL-Bref = World Health Organization Quality of Life abbreviated assessment instrument.

In two areas, however, our conclusions differ somewhat from those of the PCEHM, although the differences are more a matter of emphasis than of disagreement. These differences at least in part reflect the Committee’s focus on effectiveness measurement for regulatory analysis, and the analytic traditions and goals of regulatory decision making. First, we do not place the same emphasis on the theoretical grounding of QALY measurement in utility theory as did the PCEHM. Rather, we have taken an explicitly practical and instrumental approach to the measurement of health-related effects of regulatory interventions.

Second, the Committee in principle favors direct elicitation of preferences for the health states of interest over the use of generic indexes, whenever well-designed and executed preference elicitation studies for the appropriate health endpoints and the affected populations exist or are feasible. In practice, we recognize that such original research will often not be possible to support regulatory analysis. The use of generic indexes, possibly with expert characterization of the health states of interest, and the transfer of health state values from existing research databases are the more likely, and also acceptable, approaches.

Before turning to the ethical implications of QALY-based CEA and the larger context of regulatory policy determination, we reiterate the major conclusions and insights discussed throughout this chapter.

Single-dimension measures such as deaths averted and life years gained are informative measures of effectiveness in regulatory analysis.

For practical reasons, the QALY is currently the best among the family of HALY measures to use in regulatory CEAs. The QALY is in widespread use, it is flexible in application, and the construct has the advantage of simplicity and comparatively modest informational demands.

No single elicitation technique or common generic index for QALYs is superior in all respects to the alternatives. Given the current state of the art in HRQL measurement, however, the EQ-5D has several important advantages over other generic indexes. The EQ-5D:

Has been valued using a nationally representative U.S. sample.
Uses a choice-based elicitation method (TTO).
Is simple and inexpensive to administer.
Can be used without charge (i.e., it is not a proprietary instrument).

Several strategies for obtaining health state values for regulatory CEA are available. In the absence of new studies valuing the health impacts of interest, QALY estimates based on well-developed, generally accepted, and widely used generic HRQL indexes are desirable. These values may be derived from a number of sources, including population surveys, transfer of index values from prior studies, or expert characterization of health endpoints with generic indexes.

The measurement of HRQL in children poses special challenges in characterizing, reporting, and valuing health states, and is particularly in need of further research and development of approaches and instruments.

Nationally representative data that support HRQL measurement are essential for QALY-based CEA for regulations. To date, efforts to incorporate HRQL measures into national health surveys have been ad hoc and unsystematic.

HRQL measures and methods can be improved with further research. In particular, establishing the relationships among and conversion factors for estimates derived from the most commonly used generic HRQL instruments would make integration and synthesis of the results from different studies possible and thus expand the tools and data available for regulatory analysis. In addition, it would improve the reliability of cost-effectiveness comparisons among different analyses and regulations.

Standards of good research practice such as those that have been developed for stated preference valuation surveys for BCA offer a model for developing best practice standards for HRQL valuation instruments, surveys, and studies.