Valuing Health for Regulatory Cost-Effectiveness Analysis

3 Measures and Strategies for Obtaining Health Benefit Values for Regulatory Analysis

As described in Chapter 2, federal agencies apply a variety of approaches to estimate and value the health-related benefits of regulatory interventions. Agencies are currently developing measures of health impacts for use in cost-effectiveness analysis (CEA) along with monetized estimates for use in benefit–cost analysis (BCA). These effectiveness measures include both single-dimension measures such as deaths or cases of illness averted and integrated measures of morbidity and mortality, that is, the health-adjusted life-year (HALY) measures that are a central focus of this report.

In this chapter the Committee describes different effectiveness metrics for health-related CEA and sources for estimates of health-related quality of life (HRQL) based on these metrics. We first introduce criteria for selecting among effectiveness measures for use in regulatory analysis, and then discuss various approaches in light of these criteria. We cover much of the same ground as “Identifying and Valuing Outcomes,” Chapter 4 of the report of the U.S. Panel on Cost-Effectiveness in Health and Medicine (PCEHM) (Gold et al., 1996b). The emphasis and detail of this report, however, are tailored for an audience of regulatory analysts and decision makers. We reiterate some of the material in the PCEHM report here so that this volume will be a largely self-contained reference. In many instances the Committee follows and endorses the PCEHM’s interpretations and recommendations; in a few respects, our judgments differ, as summarized at the end of the chapter.

This chapter begins with a discussion of criteria for selecting among different HALY measures and for determining which approach to applying
these measures is most appropriate for regulatory analysis. We then describe and evaluate each approach in more detail. The subsequent sections of this chapter first briefly review the single-dimension measures common in statistical reporting systems and epidemiological studies, including case reporting of illness or injury, preventable deaths, and life years lost. This section also considers the contribution of mortality and longevity changes, relative to changes in HRQL, to overall estimates of effectiveness. Next we examine alternative HALY metrics, discuss their construction and theoretical roots, and review methods for determining the relative values of specific health states. These metrics, survey instruments, and methods for eliciting preferences or values for particular health states are evaluated in terms of their practicality, reliability, and theoretical and empirical validity. In the following section we consider sources of health state values for regulatory analysis and review four commonly used generic HRQL survey instruments. The fifth section identifies data collection and research priorities as well as promising developments for improving the measurement of health effects for regulatory analysis. Last, we briefly summarize the Committee’s findings and conclusions based on the material presented in the chapter.

CRITERIA FOR SELECTING HALY METRICS FOR REGULATORY CEA

As introduced in the preceding chapters, regulatory analysts face a series of choices in determining how to structure the effectiveness measure in their analyses. First, they may choose between a single-dimension or integrated measure. Although single-dimension measures, such as lives saved, life years extended, or cases of illness or injury avoided, provide important information of interest to decision makers, analyses of major regulations generally include more than one health effect of concern. 
Thus our focus is on developing criteria for selecting among the integrated measures that are the main focus of the report.

The first choice that analysts face in selecting an integrated measure is whether to rely on the most commonly used approach, the quality-adjusted life year (QALY), or one of the other HALY approaches. HALY approaches, which rest on how length of life is combined with a value or preference for a given state of health, are discussed in detail later in this chapter. They vary primarily in the extent to which they are widely accepted, available, and used. Because the requirements for regulatory CEA are already in effect and analysts need tools that are ready for use, the Committee’s criteria for selecting among these HALY measures are largely practical ones. (The development and pursuit of a longer term research agenda are discussed separately at the end of this chapter.)

At this broadest conceptual level, the relevant performance characteristics of a HALY effectiveness measure for regulatory CEA conform to some straightforward criteria. First, the HALY metric should have a “track record,” that is, it should be in relatively widespread use and methods for estimating index values, as well as estimates themselves, should be available in the literature. Second, the metric should be easy to understand and interpret. To some extent, the comprehensibility of a metric is a function of the extent to which it has been used, and thus depends on the first criterion. Third, the metric should be relatively inexpensive to use, both in terms of the availability of methods and values for immediate application and in terms of the development and collection of new values.

Of course, in addition to these practical considerations, measures must also provide valid and reliable estimates of the relative value of different health states. Assessing reliability and validity is, however, largely a function of the extent of the research base; the measures that do not meet the first criterion above are less likely to have been subject to extensive tests of validity and reliability. As discussed in more detail in the following sections of this chapter, the Committee believes that the QALY best meets these criteria.

Once an analyst makes the decision to use the QALY metric, the next set of choices involves determining how to apply this measure in the context of a particular regulatory analysis. As already discussed, analysts face the choice in BCA and CEA alike of conducting new research on benefit values or transferring estimates from existing studies. In CEA analysts have a third option: they can use generic indexes. The use of these indexes can be based on existing studies or new research; i.e., the analyst may transfer estimates from an existing study that used a generic index, or may use the index to generate new valuation estimates. 
As illustrated by the Committee’s case studies, these indexes have the advantage of allowing the analyst to value new health states without the substantial investment of time and resources required for new primary valuation research. Each of these approaches is discussed in detail in the later sections of this chapter. Because several generic indexes are well established and easy to use, the Committee expects that they will often be applied in regulatory analysis in the near term. As already discussed, regulatory analysts lack the time or resources to engage in the development of instruments for health status valuation in the context of individual regulatory analysis. Thus we focused our criteria for implementing the QALY measure on the choice among available generic instruments. Several authorities have offered criteria for assessing the construction and performance of HRQL measures, primarily with respect to their use in
CEAs of health care services and in clinical outcomes studies. Box 3-1 presents standard performance criteria for preference-based HRQL survey instruments. While each of these features of an HRQL instrument may be desirable, some are particularly important and take a specific form in the context of informing regulatory decision making. As discussed later in this chapter, in applying these criteria the Committee found that no one HRQL index is obviously superior to the others in all respects for all applications. Thus, to designate any single instrument as a standard for all regulatory analyses would be arbitrary. Judging the appropriateness of a given instrument for a particular regulatory application depends not only on the features of the HRQL instrument, but also the characteristics of the affected population, the intervention, and the health research that underlies the risk assessment.

BOX 3-1 Standard Performance Criteria for HRQL Instruments

The PCEHM proposed that valuation approaches should have a theoretical foundation and be empirically derived. Economists and decision theorists tend to favor choice-based valuation such as standard gamble and time trade-off methods because they are more closely connected to utility theory. Some psychologists have also used techniques such as rating scales and magnitude estimation.

An ideal measurement method would satisfy a long list of criteria. While any given list is probably incomplete, some criteria deserve particular attention. For example, the ultimate standard of validity is construct validity, the extent to which an instrument accurately measures or identifies the thing it is intended to measure. Because HRQL is an unobservable construct with alternative theoretical foundations, there is some ambiguity and tension as to how to demonstrate an instrument’s validity. Three subsidiary or partial aspects of validity that are more readily demonstrated are content validity (adequate or appropriate scope to the measure); criterion validity (the degree of correspondence of the instrument to an agreed-on measure of the construct); and predictive validity (ability to predict future behaviors and outcomes).

An instrument’s valuation survey sample should be adequate in size and response rate, and the population from which the sample was drawn should be representative of the population of interest in the CEA. In the case of regulatory CEA, this would be the population affected by costs and/or benefits of the regulatory intervention.

A measure should be reliable, that is, exhibit consistency in repeated measurements by the same individual over time or across different groups drawn from the same population.

A measure should be widely applicable to a range of health states and conditions. It should be sensitive, that is, responsive to change, and not exhibit floor or ceiling effects in the range of anticipated effects.

An HRQL instrument should be flexible and universal, as demonstrated by applications to and adaptations for cultural and language subpopulations and alternative administration formats.

An HRQL measure should be well documented, transparent, and interpretable.

An instrument should be feasible to administer, not burdensome for respondents, and acceptable to users and the public. This may be judged by administration format, completion times, and rates of missing responses. Preference elicitation surveys should have satisfactory completion rates; if respondents consistently decline to make choices within an elicitation exercise, the measure or method may not be appropriate or adequately informative.

SOURCES: Gold et al. (1996b); Lohr et al. (1996); IOM (1998); Brazier et al. (1999b).

The Committee emphasizes the following criteria for choice of an HRQL instrument in a regulatory analysis.

First, an HRQL instrument must be applicable to the range of health-related effects being evaluated. Generic HRQL instruments are designed for application to a wide range of health states that can result from a variety of health-related risks or interventions. Still, as described below, each generic instrument has distinctive features absent from the others. For example, the Quality of Well-Being Scale (QWB) includes symptoms and problems in its valuation formula, along with functional attributes; the Health Utilities Index (HUI) instruments specify sensory and cognitive functions, which make them relatively sensitive instruments for conditions with these manifestations; and the SF-6D allows the use of widely collected SF-36 and SF-12 data sets.

Second, the instrument should be sensitive enough to distinguish among health endpoints. This criterion addresses the “fit” between the HRQL instrument, the health condition(s) of interest, and the risk assessment data used to estimate and characterize the health impacts. 
For example, a highly differentiated HRQL instrument may not be readily “mappable” onto epidemiological data about respiratory symptoms related to air quality if the latter dataset is based on very general symptom-based categories. Conversely, if the regulatory health impacts of interest are very specific, such as functional limitations resulting from long-term effects of traumatic injury, and the domains of an HRQL instrument do not reflect those effects, that instrument might not be sufficiently sensitive. In the Committee’s case study of child seat restraint anchoring systems, in which head injuries were a prominent risk, some but not all indexes included a cognitive function domain. In this case study, however, the similarity of estimates of QALY effects (as assigned by experts) across different instruments does not demonstrate that the more specific attributes are critical to the sensitivity of the instrument (see Appendix A, Tables A-11 and A-12).

Third, a generic instrument should reflect the values or preferences for health of the population(s) of interest. In most cases, for major regulations, those who will bear the costs and/or receive the benefits can be represented
by the U.S. population as a whole. Hence it is the preferences of this population that will matter most for valuation. Of the generic instruments reviewed, only the QWB and the EuroQol Group’s EQ-5D have preferences derived from the U.S. population. Whereas the U.S. EQ-5D valuation survey is recent and based on a nationally representative sample, the QWB valuation survey is about 30 years old and was conducted in a single community (San Diego, CA). The HUI-3 valuation survey was conducted with a representative sample from Hamilton, Ontario, Canada, and the SF-6D values are derived from a U.K. general population survey.

Fourth, as in the case of the HALY measure, the HRQL instrument also must be acceptable to and understandable by survey respondents, policy makers, and the general public. One indication of a measure’s acceptability is the extent to which valuation survey respondents comprehend, and are willing to engage in, the preference elicitation exercise. In a broader sense, the ethical commitments and implications of the HRQL instrument and the health state values it generates must be viewed as legitimate by the ultimate users of the analytic results. Transparency, in the sense of relying on data that are publicly available (not proprietary), may also contribute to a measure’s acceptability.

Finally, as in the case of the HALY measure, the HRQL instrument should be as inexpensive to use as is compatible with the other objectives. This criterion applies to considerations such as mode of administration (e.g., mail surveys are less costly than personal interviews) and also to the proprietary status of the instrument and related analytic tools.

SINGLE-DIMENSION MEASURES OF HEALTH-RELATED OUTCOMES

Cases of illness or injuries, deaths, hospitalizations, and days of work or school lost are commonly reported outcomes based on routine health information collection activities. 
These measures are familiar, easily comprehended, generally stable, and can be obtained or calculated from standard statistical sources. Tables 2-2, 2-4, and 2-5 in the previous chapter provide examples of specific single-dimension outcome measures used in regulatory analyses. The drawback of relying on these types of measures alone, without benefit of more comprehensive measures, is that they are not readily aggregated. Mortality-based indicators have long dominated population-based health status measurement. They are also prominent in risk assessments and economic analyses for health and safety regulations. Life expectancy and age-specific death rates are familiar and straightforward health outcomes measures. Early analyses counted preventable or premature deaths
averted.1 With the advent of CEA in health care settings, analysts turned to counting years of life saved, thus reflecting differences in remaining life expectancies.

Much of the information needed to calculate integrated measures of morbidity and mortality relates to the determination of the relative values attached to different health states, yet changes in survival tend to swamp the impact of changes in HRQL in HALY calculations for health care programs. In a review of 63 studies that included 173 cost-effectiveness-ratio pairs that reported both cost per life year ($/LY) and cost per quality-adjusted life year ($/QALY), Chapman and colleagues (2004) found that quality-adjusting life years resulted in a median difference between LY and QALY ratios for the 173 ratio pairs of just $1,300. (The median ratios were $24,600/LY and $20,400/QALY.) In a separate review of 110 cancer prevention, early detection, and treatment interventions, Tengs (2004) also compared $/LY and $/QALY ratios. Consistent with the findings of Chapman et al., she reported a very high rank-order correlation between LY and QALY ratios. Both studies concluded that the difference in quality-adjusting life years would have affected decisions about cost-effectiveness in just a small proportion of cases (8 and 5 percent in the Chapman and Tengs studies, respectively, at a $50,000 decision threshold in each case).

The results of these two review studies suggest that accounting for mortality impacts may be more important than adjusting for the HRQL impacts associated with diseases for which the intervention saves many lives. In these cases, calculation of life years gained may capture the majority of the impact of the intervention on health. 
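The comparison of $/LY and $/QALY ratios described above can be illustrated with a minimal Python sketch. The function names and all input figures are hypothetical and chosen only for illustration; they are not taken from Chapman et al. or Tengs. Only the $50,000 threshold comes from the text.

```python
def cost_effectiveness_ratios(net_cost, life_years_gained, qalys_gained):
    """Return (cost per life year, cost per QALY) for one intervention."""
    return net_cost / life_years_gained, net_cost / qalys_gained


def decision_changes(cost_per_ly, cost_per_qaly, threshold=50_000):
    """True when quality adjustment moves the ratio across the threshold."""
    return (cost_per_ly <= threshold) != (cost_per_qaly <= threshold)


# Hypothetical intervention: all three inputs are illustrative assumptions.
per_ly, per_qaly = cost_effectiveness_ratios(
    net_cost=1_200_000, life_years_gained=30.0, qalys_gained=20.0
)
flips = decision_changes(per_ly, per_qaly)
```

With these hypothetical inputs the ratio moves from $40,000 per life year to $60,000 per QALY and crosses the threshold. In the reviews cited above such crossings were uncommon, because life years gained captured most of the health impact.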
However, this will not be the case for programs or regulations that improve health and functioning but do not significantly change life expectancy, such as one might expect with mitigation of environmental exposures to lead or mercury. In the juice processing case study (summarized in Appendix A), for example, chronic illness impacts accounted for the majority of QALY gains.

1 Throughout this report, we use the term “preventable” rather than “premature” deaths. These terms refer to decreases in the risk of death attributable to a regulation, in other words, expected gains in life expectancy.

HEALTH-ADJUSTED LIFE YEARS

HALY measures were designed to address the limitations of single-dimension measures. HALYs capture information about both length of life and the states of health experienced during those years. The virtue of such an index of health, that it combines information about diverse health-related conditions as well as mortality, also poses challenges. A HALY is a relatively abstract concept, and some users of health statistics may find it harder to understand than more concrete and simpler health indicators, such as a change over time in the incidence of lung cancer or life expectancy in a population. Hence reporting the constituents of HALY measures and presenting cost-effectiveness ratios using specific outcomes such as preventable deaths remain important.

HALYs not only meld descriptive information about health states and longevity, they also incorporate judgments about the relative value of different states of health, taking into account their impact on functioning and subjective experience. Such judgments about HRQL may be individual, aggregated and averaged for a population, or reached collectively by individuals participating in an interactive or consensus process.

HALY measures are constructed in three steps. First, a description of a health state or disease condition is needed. Second, that state or condition must be given a value or weight, relative to other states and conditions. By convention, HRQL scales are anchored by values of 0 and 1, where 0 corresponds with death and 1 with the state of full, optimal, or “perfect” health. (States of health considered worse than death can be accommodated by negative values.) Third, the values for different health states or conditions must be combined with estimates of the duration in each health state over the predicted remaining life span.

Figure 3-1 represents an illustrative health-adjusted life expectancy (for either an individual or a population, on average) as the shaded area on a two-dimensional graph where the vertical axis represents HRQL and the horizontal axis represents duration of life. When interpreted as an individual life, the figure suggests how one moves through different states of health, implying different levels of HRQL, over the course of a lifetime. 
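The three construction steps can be sketched in a few lines of Python: describe each health state, attach an HRQL weight on the 0-to-1 scale, and sum weight times duration, which corresponds to the shaded area in the figure. This is a minimal sketch under stated assumptions: the class name, the health state labels, the weights, and the durations are all hypothetical, and the sketch applies no discounting or age adjustment.

```python
from dataclasses import dataclass


@dataclass
class HealthStateSpell:
    label: str          # step 1: description of the health state
    hrql_weight: float  # step 2: 0 = dead, 1 = full health, negative = worse than death
    years: float        # step 3: expected duration in this state


def health_adjusted_life_years(profile):
    """Sum of weight x duration across the spells in a health profile."""
    return sum(spell.hrql_weight * spell.years for spell in profile)


# Hypothetical profiles with and without an intervention (illustrative values only).
baseline = [
    HealthStateSpell("full health", 1.0, 40.0),
    HealthStateSpell("chronic condition", 0.75, 20.0),
]
with_intervention = [
    HealthStateSpell("full health", 1.0, 50.0),
    HealthStateSpell("chronic condition", 0.75, 12.0),
]

halys_gained = (
    health_adjusted_life_years(with_intervention)
    - health_adjusted_life_years(baseline)
)
```

In this hypothetical case the intervention adds 2 years of life (62 versus 60) but 4 health-adjusted years (59 versus 55), because it also shifts 8 years from the chronic condition to full health.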
Several approaches to estimating HALYs are discussed later in this section and many are illustrated in the Committee’s case studies. The most familiar and widely used measure is the QALY, and that is the metric given fullest consideration here. Before discussing the QALY and alternative metrics, we describe some general features of HALY measurement, using the QALY as the case in point.

Describing Health States

HRQL measurement relies on concepts such as “health status,” “functional status,” “well-being,” and “quality of life.” Although these terms, along with “health-related quality of life,” are often applied interchangeably, in fact they encompass narrower or broader arrays of domains, with “health status” denoting a more restrictive concept and “quality of life” a more extensive one. Table 3-1 presents concepts and domains that fall within these broader rubrics.
FIGURE 3-1 Health-Adjusted Life Years

At a minimum, the measurement of HRQL incorporates both the description of health status (which may include observable and unobservable symptoms, functional capabilities, and health perceptions) and the importance or value that people, individually and/or collectively, attach to these aspects of health. Health states may be described (and valued) either as related to or representing specific disease conditions or in generic terms. Valuations of generically described health states, using multiattribute health state classification systems, are reviewed in some detail later in this chapter.

HALY metrics such as QALYs have been constructed with tools from both psychometrics (the theory and techniques of measuring psychological phenomena such as attitudes) and utility theory (defined in Chapter 1). They are developed most often from some combination of psychological survey and decision-theoretical techniques. All generically described health states used in HRQL indexes depend on psychometric scaling and concepts to some degree. Such generic indexes thus share common features with health profiling instruments, such as the SF-36. Like other health status profiling instruments, the SF-36 was not designed to produce a preference-based index value.2

2 The SF-36 is described later in the chapter, when its derivative preference-based index, the SF-6D, is discussed.
TABLE 3-1 Concepts and Domains Used in Defining Self-Reported Health Status, Quality of Life, and Health-Related Quality of Life

Concepts | Domains | Attributes
Symptoms | Reports of physical and psychological symptoms or sensations not directly observable, such as energy and fatigue, nausea, and irritability | Frequency, severity, bothersomeness
Functional status | | Frequency, difficulty, severity, ability, with help
  Physical | Functional limitations and activity restrictions, such as self-care, walking, mobility, sleep, sexual |
  Psychological | Positive or negative affect and cognitive, such as anger, alertness, self-esteem, sense of well-being, distress |
  Social | Limitations in work or school, participation in community |
Health perceptions | | Frequency, severity/intensity, satisfaction
  Global | General ratings of health and quality of life, such as satisfaction or overall well-being |
  Worries and concerns | About health, finances, the future |
  Spiritual | Meaning and purpose of life or relationship to the universe |
Disadvantage/opportunity | Perceptions of stigma or reports of discrimination because of health condition | Frequency, impact
Resiliency | Reports of ability to cope or withstand stress and illness | Frequency, satisfaction, ability
Environmental | Evaluations of personal safety, adequacy of housing, respect, freedom, and so on | Satisfaction, importance

SOURCE: Reprinted from Patrick and Chiang (2000, Table 1).
Valuing Health States and Preference Elicitation Methods

The scaling of values associated with particular health states reflects the relative strength of preference for one state as compared with another. Health states must be located on an interval scale (and not simply ranked) in order to be incorporated in a HALY measure. This section reviews four methods for eliciting preferences for health states: standard gamble (SG), time trade-off (TTO), category rating (CR) or visual analogue scale (VAS), and person trade-off (PTO).

These preference elicitation techniques pose different questions and emphasize different facets of the relative value of various health states. Most analysts who use these valuation techniques recommend that results from two or more approaches not be combined within a single analysis, and that their interpretation and the discussion of results consider the elicitation method (Lenert and Kaplan, 2000).

Each of the four methods has particular strengths. Economists generally prefer metrics or instruments that use SG or TTO. These elicitation techniques produce relative preference weights using methods consistent with neoclassical economic utility theory, which requires choices reflecting an opportunity cost, that is, the sacrifice of one valuable good for another. Preference or value elicitation methods grounded in utility theory correspond more closely than do psychometric approaches to the model of consumer choice.

Rating scale approaches such as CR or VAS are considered the least burdensome for respondents, although some studies have reported that respondents found the task more challenging than TTO or SG. CR or VAS are understood to reflect respondents’ internal representations of health states in a comparative sense, and may be anchored or influenced by the actual health of the respondent (Krabbe et al., 1997). 
PTO valuation methods have been designed to introduce other-directed interests and considerations into societal resource allocation and priority-setting contexts. In contrast with other techniques, the PTO approach does not purport to represent primarily self-interested or consumer preferences for health states. PTO has not been as widely applied as the other techniques. Unless new surveys are conducted to elicit values for specific health states, the elicitation technique is part and parcel of the choice of a generic, multiattribute HRQL index. Thus, although the following discussion addresses elicitation methods in isolation from other features of valuation surveys, in practice these methods are not readily mixed and matched with
[...]

in the published literature, that is, the extent to which a particular study addresses the same health state as defined by the risk assessment that underlies the regulatory analysis. The applicability of a particular study’s health state values depends most importantly on similarities in the clinical descriptions of the health states, such as the severity of disease and the timing and duration of any treatment, as well as characteristics of the study population, including age, sex, and co-morbidities.

The third step is to assess the appropriateness of the method used to elicit the health state values. Most importantly, analysts should consider the following features: type of population surveyed, elicitation technique, and sample size. As already noted, the PCEHM recommends using index values derived from a community valuation survey in CEAs intended to inform broad societal resource allocation decisions. Deriving health state index values from a sample that represents the population subject to the costs and benefits of the regulation to the maximum extent possible will enhance the credibility of the estimates. In Chapter 4 we consider the implications of the valuation perspective in greater depth. Within the category of elicitation techniques, values from a generic instrument (such as the EQ-5D or the HUI) or those elicited directly with TTO or SG are preferred. Less desirable are values elicited by a rating scale (RS) technique or values from clinicians, other experts, or author judgment. Larger sample sizes are better than smaller ones, and more recent studies are preferred to older studies, if other characteristics of the studies are comparable. This format for reviewing potentially applicable health state index values is also useful for deriving possible ranges of such values for sensitivity analysis. 
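The screening preferences in the third step can be sketched as a sort key over candidate studies. This is only one possible reading: the text states the preferences but does not rank them against each other, so the lexicographic ordering below (elicitation method, then community sample, then sample size, then recency) is an assumption, as are the class name, the method labels, and the four hypothetical studies.

```python
from dataclasses import dataclass

# Methods the text describes as preferred; the string labels are hypothetical.
PREFERRED_METHODS = {"generic_instrument", "TTO", "SG"}


@dataclass
class CandidateStudy:
    name: str
    method: str             # "generic_instrument", "TTO", "SG", "RS", or "expert_judgment"
    community_sample: bool   # values drawn from a community valuation survey
    sample_size: int
    year: int


def screening_key(study):
    """Tuple compared left to right; larger tuples rank higher (assumed ordering)."""
    return (
        study.method in PREFERRED_METHODS,
        study.community_sample,
        study.sample_size,
        study.year,
    )


def rank_candidates(studies):
    """Return the studies from most to least preferred under the assumed ordering."""
    return sorted(studies, key=screening_key, reverse=True)


# Hypothetical candidates, not drawn from the case study.
candidates = [
    CandidateStudy("Study A", "RS", True, 900, 2001),
    CandidateStudy("Study B", "TTO", True, 300, 1997),
    CandidateStudy("Study C", "generic_instrument", True, 300, 2003),
    CandidateStudy("Study D", "expert_judgment", False, 12, 2004),
]
ranked_names = [study.name for study in rank_candidates(candidates)]
```

A ranking of this kind covers only the third step. The match between the study's health state and the regulatory endpoint still has to be judged separately, and the lower-ranked studies remain useful for bounding a sensitivity analysis.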
The Committee’s review of published studies for applicable health state values for the air quality case study revealed both the advantages and drawbacks of using index values from prior studies for regulatory analysis. On the positive side, it confirmed that the published literature can be a fruitful source of health state values for at least some regulatory health endpoints, and that using index values from the published literature is a relatively simple and inexpensive approach. On the other hand, the case study team found that the health state descriptions in published studies often did not match the description of health endpoints as described in the underlying health research used in regulatory risk assessments, and may not correspond on dimensions such as disease severity, patient age, or baseline risk factors. In addition, quality varies considerably across outcomes studies and CEAs in the literature, and published studies are not always clear about their methods. Because published studies employ different populations and elicitation methods, the individually “best” estimates for particular health endpoints within a regulatory analysis may not be derived from consistent methods. For example, the published index value estimates for cardiovascular and respiratory conditions used in the EPA case study were based on different generic instruments, the EQ-5D with U.K. population values and the HUI-3 with values from a single Canadian community, respectively. Because no tenable alternatives for HRQL values for the different conditions were available, in the case study we violated the prima facie rule of using values derived with consistent elicitation methods.

Perhaps most important, different studies of similar endpoints reported significantly different estimates. As discussed in Box 3-7 and below, the uncertainty in both the estimation of health impacts and the estimation of preferences for the health states associated with those impacts underscores the importance of reporting key limitations and discussing their implications, as well as conducting quantitative analyses of uncertainty.

BOX 3-7 An Example Using Health State Index Values from Published Studies

The Committee’s case study based on the EPA nonroad diesel engine rule (EPA, 2004a,b) provided an opportunity to investigate the use of published health state index values to develop estimates of the HRQL impacts of air quality improvements. The nonfatal health endpoints (disease conditions) assessed in the case study were “chronic bronchitis” and “myocardial infarction” (MI), that is, the course of cardiac disease following a nonfatal heart attack, based on the risk assessment studies used by EPA (Abbey et al., 1995; Peters et al., 2001).

As commissioned by the Committee, Carmen Brauer and Peter Neumann of the Harvard Center for Risk Analysis searched the CEA Registry’s catalogue of preference weights, updated through 2001, to identify estimates related to these regulatory health endpoints. They found 127 health states and preference weights in the respiratory and cardiovascular disease categories published since 1994. The Committee case study team reviewed the original studies that appeared to be most promising as a source of health state values in the case study. Two studies were selected as the basis of the chronic bronchitis and post-MI health endpoints.

For chronic bronchitis, estimates came from a Canadian study of alternative treatments for patients with acute exacerbations (Torrance et al., 1999). Over a 1-year period, the researchers asked the patients to complete assessments (including the HUI-3 questionnaire) after each acute exacerbation as well as once every 3 months. During the study period, the health state index values for these patients averaged 0.79 or 0.76 (depending on the treatment), calculated with the standard community-based valuation formula for the HUI. The mean value for both groups combined was approximately 0.78, when weighted by the number of participants in each group. This estimate was used for all cases of chronic bronchitis.
A study by Oostenbrink et al. (2001) provided estimates for the course of cardiac disease following nonfatal MI. This Dutch study followed patients after infrainguinal bypass surgery to compare the effects of different drug treatments. The researchers administered the EQ-5D survey instrument to study participants and used the standard U.K. population-based TTO valuation survey results to value the health states (Dolan, 1997). For those patients in the overall study sample who later experienced an MI, their subsequent index values averaged 0.58. The case study team used this estimate for all of the post-MI health states included in our assessment.

Other studies provide widely varying results. Although the values from these other studies appeared less suitable for transfer than the ones selected by the case study team, they show how different sources yield a wide range of HRQL estimates. Brauer and Neumann (2005) report index values for chronic bronchitis that range from 0.37 to 0.75, depending on the study approach, the disease severity, and the age of the patient. Estimates for post-MI health states also varied, in part because of the different populations studied, the different approaches to HRQL measurement used, and the different severities of illness considered. For example, one study reports an index value of 0.33 for the hospitalization period, angina studies report a range of 0.67 to 0.95, studies of congestive heart failure yield values ranging from 0.46 to 0.70, and (paradoxically) a study of angina and congestive heart failure combined yields values ranging from 0.82 to 0.85 (higher than the value for heart failure alone from other studies). Although the team considered using different estimates for cases with and without congestive heart failure or angina, we were unable to find an internally consistent set of weights that addressed all of the combinations of these conditions of interest.
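The arithmetic behind Box 3-7 can be illustrated with a short sketch. It is not the case study team's actual calculation. The group sizes, the 10-year duration, and the full-health baseline of 1.0 are hypothetical assumptions chosen only for illustration; the index values (0.79, 0.76, 0.37, 0.75) and the 3 percent discount rate are taken from the text. The sketch pools two treatment-group means by group size and then shows how the choice of index value changes the discounted QALY loss per case.

```python
def weighted_mean(values, weights):
    """Mean of values weighted by group size."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def discounted_qaly_loss(index_value, years, rate=0.03, baseline=1.0):
    """Discounted QALYs lost per case, assuming a constant index value.

    The loss in year t (counting from 0) is discounted by 1 / (1 + rate) ** t.
    The baseline of 1.0 (full health) is an illustrative assumption.
    """
    annual_loss = baseline - index_value
    return sum(annual_loss / (1 + rate) ** t for t in range(years))


# Treatment-group means from Box 3-7; the group sizes (200, 100) are hypothetical.
pooled = weighted_mean([0.79, 0.76], [200, 100])

# Index values for chronic bronchitis: the pooled case study value and the
# lowest and highest values reported by other studies (0.37 and 0.75).
scenarios = {
    "case study": pooled,
    "lowest published": 0.37,
    "highest published": 0.75,
}

# Hypothetical 10-year duration of illness for each case.
losses = {name: discounted_qaly_loss(v, years=10) for name, v in scenarios.items()}

for name, loss in losses.items():
    print(f"{name}: {loss:.2f} discounted QALYs lost per case")
```

Because the per-case loss scales directly with the gap between the baseline and the index value, moving from the case study value to the lowest published value multiplies the estimated loss several times over, which is why the choice of source study matters.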
Uncertainty in Health Status and Preference Measurement

Uncertainty pervades all aspects of risk assessment and economic analysis of regulatory interventions to reduce health and safety risks. In its 2002 report, Estimating the Public Health Benefits of Proposed Air Pollution Regulations, a consensus committee of the Board on Environmental Studies and Toxicology of the National Research Council called for greater attention to the sources and analysis of uncertainty in developing and promulgating regulatory interventions. In particular, the committee recommended more extensive use of probabilistic uncertainty analysis. OMB has also long encouraged the use of probabilistic analysis. Circular A-4 mandated that agencies conduct probabilistic uncertainty analyses as part of the economic analyses of regulations with a cost or benefit estimate exceeding $1 billion annually (OMB, 2003a). OMB also requires analysis of uncertainty for rules with less substantial impacts, but probabilistic methods need not be used.

In this section we consider sources of uncertainty and its treatment in the measurement of health effectiveness only. As outlined in the report of the PCEHM, the cost-effectiveness ratio is the end of a process of estimation, synthesis, and modeling. Uncertainty in cost-effectiveness analysis can stem from estimation of the numerical values of factors that are inputs of the analysis or from the analytic model or modeling process (Manning et al., 1996).

One major source of uncertainty in HALY estimates for regulatory CEA is the estimation of the health impacts of the proposed intervention: the number of cases of each fatal and nonfatal health effect averted, the severity of disease or disability incurred, and so on. Even taking the quantified estimates of cases averted as givens, however, uncertainty remains in the characterization and measurement of HRQL effects of those conditions. At least four aspects of HRQL measurement contribute to the uncertainty of the ultimate values assigned to the estimated health-related impact of a regulation:
Variability in preferences across individuals, which contributes to uncertainty in estimating population means;
Variability in the estimation of preferences for health states depending on the elicitation technique;
Differences in the specificity and scope of attributes included by the generic HRQL instruments; and
The statistical models that assign relative health state values for each of the generic instruments.

The case study results, in particular the case study of foodborne illness in which the same groups of experts assessed the regulatory health endpoints with four generic indexes, demonstrate that the instrument used to value health effects does indeed affect the results.
The estimates of QALY losses averted with the juice processing rule ranged from 1,300 for the QWB and SF-6D, to 1,500 for the EQ-5D, to 1,900 for the HUI-3, using a 3 percent discount rate. This yielded cost-effectiveness ratios ranging from $13,000 to $18,000 per QALY. (See Tables A-5 through A-7.) Whether this range of estimates is significant enough to affect the regulatory decision is unclear, because this particular rulemaking did not include quantified information about other regulatory options. In the case study of nonroad diesel emissions, which estimated QALY losses averted using the EQ-5D, but according to different approaches (expert assignment as compared with a catalogue of index values from a population survey), the variability in estimates was even less. The results ranged from 109,000 QALYs (based on the catalogue values) to 120,000 QALYs (based on expert assignment), using a 3 percent discount rate, which produced a small difference in the cost-effectiveness ratios (Tables A-16 and A-17).

The expert assignment of regulatory health endpoints using generic indexes, as described in the preceding section, also introduces additional uncertainty into the analysis. In debriefing interviews following the assignment exercise, experts raised concerns about several aspects of the task. First, characterizing a condition (the regulatory health endpoint) with a single multiattribute index response is difficult and imprecise, as the quality of life and functional impacts of chronic conditions change over time. Second, the disease descriptions were not always well distinguished from each other, or readily described by a generic index’s attributes. Last, some experts expressed skepticism about the ability of clinicians to characterize the impact of a condition on patients’ functioning and experience, despite having professional familiarity with the condition.

RESEARCH AND DEVELOPMENT OF METRICS AND VALUATION METHODOLOGIES

From the many fruitful avenues of research in the measurement and valuation of health-related quality of life, we focus on three issues with particular relevance to regulatory CEA: correlating and estimating conversion factors among generic indexes so that values based on different instruments can be compared; using information about ordinal rankings of health states to develop HRQL value scales with interval properties; and applying insights and best practices from willingness-to-pay survey research to HRQL valuation.

Correlations and Conversions Among HRQL Measures

CEA results based on different HRQL instruments are not readily comparable because the various instruments include different domains and rely on different value elicitation techniques. Furthermore, no one instrument has achieved preeminence in the field.
These circumstances have stimulated interest in research that correlates and develops conversions or cross-walks among the various instruments so that estimates and analyses based on different measures might be compared and combined. Using data from more than 11,000 respondents to the 2000 MEPS, Franks and colleagues (2006) have calculated the relative decrements in HRQL for 47 risk factors and health conditions based on several preference-based measures, including the U.S.-valued EQ-5D, the SF-6D (SF-12 version), and a statistically modeled HUI-3. Correlations between the estimates for these risk factors and health conditions using the three metrics were in all cases greater than 0.90 (see footnote 10). The authors concluded that, although the particular HRQL instruments would yield different cost-effectiveness results in absolute terms, the different measures are unlikely to produce different orderings of incremental cost-effectiveness ratios because of their consistent rank ordering. Table 3-7 presents the summary results of studies that have examined correlations between and among HRQL instruments.

A set of ongoing studies sponsored by the National Institute on Aging promises to contribute substantially to our understanding of the relationships among different HRQL instruments (see footnote 11). First, a nationally representative telephone survey of U.S. adults over the age of 35 co-administers the EQ-5D, the HUI Mark 2/3, the SF-36 version 2, and the QWB instruments. This survey will be another source of national norms for each index and will provide algorithms to convert values derived from one instrument to each of the others. Second, to evaluate the responsiveness of each measure to different conditions and to check the cross-walk algorithms and the effects of the mode of survey administration, a related study will survey two groups of patients periodically over 6 months, one a group of patients undergoing cataract surgery and the other a group of patients with congestive heart failure. In its entirety, this research effort (planned to be completed in 2008) should provide much better and more comparable information about the performance of different HRQL instruments than is now available.

Using Ordinal Data for HRQL Valuation

Ranking of health states is often used as a preliminary step in preference elicitation exercises involving TTOs or SGs.
Recent studies have explored using aggregated ranking data to predict health state valuations that closely match interval-level values produced by TTO methods (Salomon, 2003; Salomon and Murray, 2004). These findings, along with the consistency of the ordinal rankings of health states that different generic instruments produce (as just discussed), suggest that ordinal preferences may have broader applications in health state valuation than are currently exploited.

10 The estimated values were adjusted for sociodemographic factors that are distributed differently among persons with various risk factors and conditions.
11 Information on the project is available at http://www.healthmeasurement.org/NHMS.html.
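The distinction drawn above between absolute index values and their rank ordering can be made concrete with a small sketch. The paired index values below are hypothetical and are not taken from any study cited in this chapter; the function names are likewise illustrative. The sketch computes a Spearman rank correlation between two instruments and fits a simple linear cross-walk by ordinary least squares, which is only one of several ways such conversions could be estimated.

```python
from statistics import mean


def ranks(xs):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def fit_crosswalk(xs, ys):
    """Least-squares line y = intercept + slope * x; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


# Hypothetical mean index values for five conditions under two instruments.
instrument_a = [0.90, 0.82, 0.75, 0.61, 0.48]
instrument_b = [0.93, 0.80, 0.78, 0.70, 0.55]

rho = spearman(instrument_a, instrument_b)
intercept, slope = fit_crosswalk(instrument_a, instrument_b)
print(f"Spearman rho = {rho:.2f}; cross-walk: b = {intercept:.2f} + {slope:.2f} * a")
```

In this made-up example the two instruments assign different values to every condition yet order the conditions identically, which is the pattern that would leave the ordering of cost-effectiveness ratios unchanged even when their absolute levels differ.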
Best Practices in Stated Preference Surveys and Benefits Transfer

Much of the debate among experts on the relative merits of stated preference willingness-to-pay measures and QALY measures revolves around the protocols for and methodological rigor of surveys that elicit monetary “prices” or HRQL index values. The recommendations of an expert committee on contingent valuation convened by the National Oceanic and Atmospheric Administration in 1992 articulated such protocols for willingness-to-pay studies (Arrow et al., 1993) (see footnote 12). The PCEHM (Gold et al., 1996b) serves a similar role in defining best practices in CEA, although this guidance is much less specific with respect to validity tests and methodologies for establishing the credibility of various preference elicitation techniques.

Researchers familiar with both willingness-to-pay and QALY measurement have called for cross-fertilization and even a synthesis of valuation practices across these fields (Johnson et al., 1997, 2000; Smith et al., 2003; Krupnick, 2004). For example, they have proposed that the choices underlying QALY valuations be interpreted using standard preference functions, making QALY results consistent with monetized health benefit measures. As another example, willingness-to-pay studies suggest that individual responses to risk and choices involving health depend on baseline conditions for other components of well-being (e.g., age, health status, and income), and that these factors should be taken into account in QALY measurement (Smith et al., 2003).

SUMMARY AND CONCLUSIONS

This chapter has reviewed a variety of health-related outcomes measures useful in CEA, focusing in particular on HALY measures and generic indexes for estimating QALYs.
In particular, the Committee has formulated criteria for the selection of HRQL instruments and characterized alternative strategies for obtaining health state values for use in QALY-based CEA of regulatory interventions. In great measure, our recommendations conform to the guidelines and underlying rationales of the PCEHM, whose 1996 report constitutes the reference standard of best practices in CEA for clinical and public health interventions.

In two areas, however, our conclusions differ somewhat from those of the PCEHM, although the differences are more a matter of emphasis than of disagreement. These differences at least in part reflect the Committee’s focus on effectiveness measurement for regulatory analysis, and the analytic traditions and goals of regulatory decision making. First, we do not place the same emphasis on the theoretical grounding of QALY measurement in utility theory as did the PCEHM. Rather, we have taken an explicitly practical and instrumental approach to the measurement of health-related effects of regulatory interventions. Second, the Committee in principle favors direct elicitation of preferences for the health states of interest over the use of generic indexes, whenever well-designed and executed preference elicitation studies for the appropriate health endpoints and the affected populations exist or are feasible. In practice, we recognize that such original research will often not be possible to support regulatory analysis. The use of generic indexes, possibly with expert characterization of the health states of interest, and the transfer of health state values from existing research databases are the more likely, and also acceptable, approaches.

12 See Mitchell and Carson (1989), Payne et al. (1999), Smith et al. (2002), Freeman (2003), and Krupnick (2004) for additional discussions of contingent valuation methodology.

TABLE 3-7 Correlations and Cross-Walks of HRQL Measures

Columns: Source; Sampling Frame; Sample Size/Type/Year; Survey Instrument(s); Condition-Specific Index Values (y/n; if so, list conditions); Correlations with Other Indexes.

Source: Gold et al. (1998)
Sampling Frame: U.S. civilian community-based population 0–85+
Sample Size/Type/Year: N~720,000/representative, random/1987–1992
Survey Instrument(s): NHIS: EVGGFP, ADL
Condition-Specific Index Values: 130 illnesses and conditions
Correlations with Other Indexes: Yes: QWB (Beaver Dam) R² = 0.78; HUI (NHEFS) R² = 0.86 for conditions

Source: Rizzo et al. (1998); Rizzo and Sindelar (1999)
Sampling Frame: U.S. civilian community-based population age 18+
Sample Size/Type/Year: N = 19,525/NMES, nationally representative (weighted) randomized sample/1987
Survey Instrument(s): Linking NMES responses to HUI-1 and EQ-5D questions
Condition-Specific Index Values: 7 conditions: diabetes, atherosclerosis, cancer, myocardial infarct, heart disease, hypertension, stroke
Correlations with Other Indexes: Yes: EQ-5D and HUI-1 imputations had correlations ranging between 67% and 74%

Source: Nichol et al. (2001)
Sampling Frame: Enrollees insured by Southern CA Kaiser Permanente
Sample Size/Type/Year: N = 6,921/longitudinal study; random and geographic subsamples stratified by Rx use/1992–1995
Survey Instrument(s): SF-36; HUI-2; chronic disease score
Condition-Specific Index Values: No
Correlations with Other Indexes: SF-36 and HUI-2: 50% of variation in HUI-2 predicted by SF-36 scores

Source: Franks et al. (2003)
Sampling Frame: NY community health center patients age 18+
Sample Size/Type/Year: N = 240/Convenience sample, predominantly Hispanic and black/NA
Survey Instrument(s): SF-12; EQ-5D; HUI-3
Condition-Specific Index Values: No
Correlations with Other Indexes: HUI-3 and EQ-5D: 0.69; predicted HUI/HUI: 0.71; predicted EQ w/EQ: 0.77

Source: Franks et al. (2004)
Sampling Frame: U.S. civilian community-based population age 18+
Sample Size/Type/Year: N~13,000 complete responses to both EQ-5D and SF-12 questions/MEPS household sample/2000
Survey Instrument(s): SF-12; EQ-5D
Condition-Specific Index Values: No
Correlations with Other Indexes: Regression of EQ-5D scores onto mental and physical component summary scores of SF-12; physical component R² = 0.67; mental component R² = 0.47

Source: Lawrence and Fleishman (2004)
Sampling Frame: See Franks et al. (2004)
Sample Size/Type/Year: See Franks et al. (2004); sample split in half for derivation and validation
Survey Instrument(s): SF-12; EQ-5D
Condition-Specific Index Values: EQ-5D values reported for: asthma, diabetes, emphysema, high blood pressure, heart attack, stroke
Correlations with Other Indexes: Mean EQ-5D scores predicted from mean physical and mental component summary scores R² = 0.61

Source: Hawthorne et al. (2001)
Sampling Frame: Australian community population and hospital inpatients and outpatients age 16+
Sample Size/Type/Year: Community: N = 396; Inpatients: N = 266; Outpatients: N = 334/NA
Survey Instrument(s): AQoL; SF-6D (36); WHOQOL-Bref; EQ-5D; HUI-3; Finnish 15D
Condition-Specific Index Values: No
Correlations with Other Indexes: Spearman correlations of AQoL with EQ-5D: 0.73; HUI-3: 0.74; 15D: 0.80; SF-6D: 0.74

NOTES: ADL = activities of daily living; AQoL = Assessment of Quality of Life instrument; EVGGFP = five-item global health status measure: excellent, very good, good, fair, poor; NHIS = National Health Interview Survey; NMES = National Medical Expenditure Survey; WHOQOL-Bref = World Health Organization Quality of Life abbreviated assessment instrument

Before turning to the ethical implications of QALY-based CEA and the larger context of regulatory policy determination, we reiterate the major conclusions and insights discussed throughout this chapter.

Single-dimension measures such as deaths averted and life years gained are informative measures of effectiveness in regulatory analysis.

For practical reasons, the QALY is currently the best among the family of HALY measures to use in regulatory CEAs. The QALY is in widespread use, it is flexible in application, and the construct has the advantage of simplicity and comparatively modest informational demands.

No single elicitation technique or common generic index for QALYs is superior in all respects to the alternatives. Given the current state of the art in HRQL measurement, however, the EQ-5D has several important advantages over other generic indexes. The EQ-5D:
Has been valued using a nationally representative U.S. sample.
Uses a choice-based elicitation method (TTO).
Is simple and inexpensive to administer.
Can be used without charge (i.e., it is not a proprietary instrument).

Several strategies for obtaining health state values for regulatory CEA are available. In the absence of new studies valuing the health impacts of interest, QALY estimates based on well-developed, generally accepted, and widely used generic HRQL indexes are desirable. These values may be derived from a number of sources, including population surveys, transfer of index values from prior studies, or by using experts to characterize health endpoints with generic indexes.
The measurement of HRQL in children poses special challenges in characterizing, reporting, and valuing health states, and is particularly in need of further research and development of approaches and instruments.

Nationally representative data that support HRQL measurement are essential for QALY-based CEA for regulations. To date, efforts to incorporate HRQL measures into national health surveys have been ad hoc and unsystematic.

HRQL measures and methods can be improved with further research. In particular, establishing the relationships among and conversion factors for estimates derived from the most commonly used generic HRQL instruments would make integration and synthesis of the results from different studies possible and thus expand the tools and data available for regulatory analysis. In addition, it would improve the reliability of cost-effectiveness comparisons among different analyses and regulations.

Standards of good research practice such as those that have been developed for stated preference valuation surveys for BCA offer a model for developing best practice standards for HRQL valuation instruments, surveys, and studies.