health. Measures such as the SF-36 are widely used in observational and experimental studies to characterize changes over time in the health of groups or individuals.

A glossary provides some additional definitions for several general terms and specific measures used in this report. These definitions are not, however, universally accepted, and both explicit disagreement and unrecognized differences in word use characterize the field. Work to clarify terminology is one element covered by the committee’s recommendation about standard setting.

CONCLUSIONS

The committee’s conclusions and recommendations are intended to provide general rather than detailed perspectives and directions for the future development and application of summary measures of population health. The IOM originally proposed a more intensive, three-year investigation of technical and policy issues and may still seek to undertake this work and prepare an in-depth report on some of the issues raised here, for example, common terminology and definitions.

Mortality measures, although important, provide incomplete and insensitive information for decisionmaking

Many major decisions about individual and public health are today informed primarily by summary data on population mortality. The family of mortality-based summary measures includes measures that aggregate data over causes, for example, age-adjusted death rates and age-specific life expectancy. Measures may also be broken down to describe the mortality experience of different population subgroups (e.g., men and women or different ethnic or racial groups) or the number of deaths or years of life lost due to different causes (e.g., heart disease or suicide), given suitable data for attributing cause of death.

Both ordinary people and policymakers are deeply interested in extending life. At the same time, they recognize other important health goals including preventing disability, improving physical and mental functioning, and relieving pain and the distress caused by other physical and emotional symptoms. Failure to include morbidity data in summary measures of health status distorts the resulting profiles of disease burden and the information available for assessing needs for preventive, curative, palliative, and rehabilitative services.

The limits of mortality data are perhaps most evident for diseases and injuries that impose serious, continuing burdens of disability and suffering on people they do not kill. Some of these diseases, for example, arthritis, are not major sources of mortality, although others, such as heart disease, are significant contributors to both mortality and morbidity.

To illustrate, Figure 2a and Figure 2b depict the hypothetical life paths of two people, one who dies from a condition that kills suddenly and the other who dies at the same age from a condition that causes progressive disability and distress for many years before death. A measure based only on mortality would not distinguish between the health burdens created by the two conditions because life duration is identical. Such a measure would also fail to differentiate between a health intervention for the second condition that extended life and one that both extended life and reduced the burden of disability before death.
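The contrast between the two life paths can be made concrete with a small sketch. All numbers below are invented for illustration and do not come from the report; each path is simply a list of (years, health weight) segments, with weights on a scale from 0 (death) to 1 (optimal health).

```python
# Hypothetical life paths; every number here is illustrative only.
# Each path is a list of (years, health_weight) segments.
path_a = [(70, 1.0)]              # excellent health, sudden death at age 70
path_b = [(55, 1.0), (15, 0.6)]   # 15 years of progressive disability, death at 70

# Two hypothetical interventions for the second condition.
path_b_longer = [(55, 1.0), (17, 0.6)]          # extends life only
path_b_longer_better = [(55, 1.0), (17, 0.8)]   # extends life and reduces disability


def life_years(path):
    """Mortality-only view: total years lived, ignoring health status."""
    return sum(years for years, _ in path)


def health_adjusted_years(path):
    """Integrative view: years lived, each weighted by health status."""
    return sum(years * weight for years, weight in path)


for name, path in [
    ("A", path_a),
    ("B", path_b),
    ("B, life extended", path_b_longer),
    ("B, life extended and disability reduced", path_b_longer_better),
]:
    print(f"{name}: {life_years(path)} years lived, "
          f"{health_adjusted_years(path):.1f} health-adjusted years")
```

Under these assumed weights, the mortality-only count is the same for paths A and B and for the two interventions, while the health-adjusted count separates them.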



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



FIGURE 2 Hypothetical life paths of (A) one person who dies suddenly after living in excellent health and (B) another who dies at the same age after living with progressive disability for many years. SOURCE: Adapted from Panel on Cost-Effectiveness in Health and Medicine (1996).

That different measures produce different pictures of health status is supported by a recent analysis of the global burden of disease that separately rank-ordered causes of death and causes of ill-health using 107 disease and injury categories and data from a number of sources (Murray and Lopez, 1996). The analysis showed considerable disparity, including 14 conditions that were ranked in the top half for burden of ill-health but in the bottom half for deaths.

Summary measures of population health that integrate mortality and morbidity information are increasingly relevant to both public health and medical decisionmakers.

As policymakers and analysts have understood the limitations of mortality data alone as a basis for decisionmaking, they have become interested in other ways of viewing overall population health. In response, many individuals and organizations have worked to develop new measures that reflect both life duration and morbidity or health-related life quality (see, e.g., Moriyama, 1968; Fanshel and Bush, 1970; Sullivan, 1971, and more generally, the background paper by Fryback). The following sections of this report provide a very basic overview of how such integrative summary measures have been developed, examples of their uses, and simple categorizations of these uses.

Developing Summary Measures of Population Health

As noted earlier, summary measures of population health are constructed by first attaching a single number (where 0 represents death and 1 represents optimal health) to a complex of social and personal attributes that represent health status or health-related quality of life. (Negative numbers representing states regarded as worse than death are not excluded from this general conceptualization.) This number is then linked to life expectancy to form a single measure of population health that integrates morbidity and mortality information.
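As a rough illustration of how a 0-to-1 health weight can be linked to life expectancy, the sketch below weights the expected years lived in each age interval by an average health weight for that interval and sums the results. The age bands, years, and weights are hypothetical values chosen for the example, and the function name is an assumption of this sketch rather than anything defined in the report.

```python
# Hypothetical inputs: expected years lived in each age interval (as would come
# from a life table) and the average health weight in that interval
# (0 = death, 1 = optimal health). None of these values are from the report.
expected_years = {"0-14": 14.8, "15-44": 29.0, "45-64": 18.2, "65+": 14.0}
avg_health_weight = {"0-14": 0.95, "15-44": 0.92, "45-64": 0.85, "65+": 0.70}


def health_adjusted_life_expectancy(years_by_age, weight_by_age):
    """Sum, over age intervals, of expected years lived times average health weight."""
    return sum(years_by_age[age] * weight_by_age[age] for age in years_by_age)


life_expectancy = sum(expected_years.values())
hale = health_adjusted_life_expectancy(expected_years, avg_health_weight)

print(f"life expectancy: {life_expectancy:.1f} years")
print(f"health-adjusted life expectancy: {hale:.1f} years")
```

If every weight were 1, the adjusted figure would equal ordinary life expectancy; any weight below 1 pulls it down in proportion to the years lived at that weight.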
Health-adjusted life expectancy is arrived at by summing the products of expected years of life at each age multiplied by a numerical weight representing average health status at that age. The individual products (life expectancy at a particular age multiplied by average health status at the age) are health-adjusted life years (HALYs). Although measures of all types seem to become known by their acronyms or abbreviations, this report generally avoids acronyms and abbreviations. Most of the discussion of summary measures of population health focuses on one category of measure, the quality-adjusted life year (QALY), and its usual building blocks, measures of health-related quality of life. Even more particularly, the focus is on QALYs built from measures of health-related quality of life that are based on utilities or preferences for health states that meet or approximate conditions of welfare economics. A more detailed discussion of the theoretical basis of preference-based measures of health status is well beyond the scope of this short report (see Panel on Cost-Effectiveness in Health and Medicine, hereafter PCEHM, 1996, for an introduction to the basic concepts). The attributes of health (sometimes called domains or dimensions) that may be captured in a measure of health-related quality of life include physical function, mental and emotional well-being, social and role function, general health perceptions, symptoms, and vitality (see, e.g.,

Lohr, 1989; Stewart et al., 1989; Patrick and Erickson, 1993; Williams, 1995; Kindig, 1997). Table 1 illustrates how several instruments for measuring health-related quality of life (as described in the Glossary) vary in the health dimensions or domains they include. To the extent that the labels for subscales do not reflect their substance, however, the table may not adequately depict differences among instruments. For example, newer versions of the Quality of Well-Being scale include questions about sensation and sensory organs but do not group and label these questions as a subscale (Robert Kaplan, Ph.D., University of California San Diego, personal communication, March 23, 1998).

The health domains selected for different measures reflect both different conceptualizations of health and different purposes for which measures were developed (e.g., clinical research or resource planning). (Fryback notes that the number of health states that can, in principle, be distinguished by different measures ranges from fewer than a dozen to more than 1,000.) As measures of health-related quality of life have become widely used, these differences complicate comparisons of conditions or populations. The choice of what aspects of health to measure or ignore (and how to define these aspects operationally) also raises ethical questions, implying perhaps that domains not included are unimportant.
TABLE 1 Principal Concepts and Domains of Health-Related Quality of Life Contained in General Preference-Weighted Instruments for Assessing Quality-Adjusted Life Years

Instrument EQ-5D Health Utility Index Quality of Well-Being Scale Health perceptions HUI:1 HUI:2 HUI:3 Social function Social relations X X   X Usual social role X X   Intimacy or sexual function Communication or speech     X X   Psychological function Cognitive function     X X   Emotional function X X X X   Mood or feelings Physical function Mobility X X X X X Physical activity   X   X X Self-care   X X     Impairment Sensory function or loss   X X     Symptoms or impairments X X X X X

SOURCE: Adapted from Patrick and Erickson (1993) and PCEHM (1996).

The measures of health-related quality of life listed in Table 1, namely the Quality of Well-Being (QWB) scale, the three versions of the Health Utilities Index (HUI), and the EQ-5D, are preference-weighted health status measures because they use some procedure for rating the relative desirability or undesirability of living with or without various functional limitations (e.g.,

an inability to dress oneself), symptoms (e.g., pain or urinary frequency), or other health states. Again, the approaches differ. The weighting or rating approaches used have included expert panels, national or community surveys that ask people to rate health states or use utility derivation instruments (e.g., standard gamble or time trade-off methods), and deliberative discussion or other means to determine “reasoned” preferences. Choices among options typically reflect a mix of conceptual, technical, and practical (e.g., cost and convenience) considerations. In addition to critiques of specific methods, the variability in weighting techniques has, like the variability in health domains tapped, been criticized as a barrier to standardizing cost-effectiveness and other analyses across illnesses and conditions (PCEHM, 1996). In addition, as discussed in later sections of this report, much of the controversy about summary measures involves the ethical implications of different approaches to rating or valuing health states.

Examples of Applications or Tests of Summary Measures

Within the United States, the Center for Chronic Disease Prevention and Health Promotion at the Centers for Disease Control and Prevention (CDC) has been testing the DALY methodology for measuring the burden of disease to assess prevention options and priorities for the nation. The nation’s disease prevention and health promotion strategy, Healthy People 2000, has been monitored by the National Center for Health Statistics (NCHS) at the CDC. The NCHS has used the years of healthy life (YHL) measure to chart progress in increasing the span of healthy life for Americans and in reducing disparities in health status among subgroups of the population. NCHS is currently developing its approach for Healthy People 2010 (Sondik, 1997).
The Health Care Financing Administration (HCFA) is using the SF-36, a measure described earlier that profiles health status, in the Health of Seniors Survey, which will track longitudinally the health of enrollees in managed care settings (Fried, 1997).

Although cost-effectiveness analyses often still rely on mortality measures, such analyses are increasingly using QALYs or other summary measures that cover morbidity as well as mortality. For example, the U.S. Agency for Health Care Policy and Research sponsors or directs numerous cost-effectiveness analyses that incorporate such measures of health into their estimates of effectiveness. Thus, instead of estimating the cost of an additional year of life achieved by different strategies (e.g., screening for cervical cancer every year versus screening every three years), the analyses estimate the cost of an additional quality- or disability-adjusted life year.

Composite measures of population health are also being used in other countries to supplement traditional mortality measures and to illuminate the impact of disability. A Canadian task force, for example, has recently recommended the creation of the health equivalent of the Gross Domestic Product to act as an overall indicator of trends in the nation’s health (Wolfson, 1997). In the Netherlands, analysts have devised variants on the traditional survival curve (a line) that not only track the proportion of a population dying at each age but also depict levels of slight or severe disability as shaded areas within the curve (Gunning-Schepers, 1997) (see Figure 3). Such graphs, similar versions of which have been used in Canada, help make evident the extent to which slight versus severe disability increases with age (in general or for a particular condition) for a population or population subgroup.
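One way to picture how such “health curves” sit inside a survival curve is to split the proportion surviving at each age into bands by disability level. The sketch below is only an illustration under invented numbers; the ages, survival proportions, and disability shares are hypothetical and are not the Dutch data.

```python
# Hypothetical cohort data (not the Dutch figures): the proportion still alive
# at each age, and the share of those survivors with slight or severe disability.
ages = [0, 20, 40, 60, 80]
surviving = [1.00, 0.98, 0.95, 0.85, 0.45]
share_slight = [0.02, 0.05, 0.10, 0.20, 0.30]
share_severe = [0.01, 0.01, 0.03, 0.08, 0.25]


def health_curves(surviving, share_slight, share_severe):
    """Split the survival curve into (no disability, slight, severe) bands per age.

    The three bands at each age add up to the proportion surviving, so they
    can be drawn as stacked areas under the ordinary survival curve.
    """
    bands = []
    for alive, slight, severe in zip(surviving, share_slight, share_severe):
        if slight + severe > 1:
            raise ValueError("disability shares cannot exceed 1")
        bands.append((alive * (1 - slight - severe), alive * slight, alive * severe))
    return bands


for age, (none, slight, severe) in zip(ages, health_curves(surviving, share_slight, share_severe)):
    print(f"age {age:2d}: no disability {none:.3f}, slight {slight:.3f}, severe {severe:.3f}")
```

Plotting the three bands as stacked areas against age would give a picture of the general kind described above, with the outer edge tracing the traditional survival curve.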

FIGURE 3 Survival curve and “health curves” by level of ill health, Dutch men, 1994. SOURCE: Dutch Public Health Status and Forecast Report, 1997.

In the international arena, comparisons of health status across countries are useful to focus attention on countries with critical needs for international assistance, to set priorities for investments in combating particular conditions and diseases, and to gain an understanding of differentials in the burden of disease among nations. The World Health Organization and the World Bank are among the organizations sponsoring such comparisons. A recent closed meeting of member countries of the Organization for Economic Cooperation and Development identified their requirements for health outcomes measures to monitor trends in population health, to strengthen the science base for evaluating the effectiveness of health interventions (e.g., screening for various illnesses), and to assist in guiding resource allocation decisions among competing health needs within member countries (Poullier, 1997).

Descriptive and Evaluative Uses of Summary Measures

Much of the discussion during the December 1997 workshop and the IOM committee deliberations focused on alternative uses of summary measures and the implications of these alternatives for the approaches used to construct various measures. Although the committee and workshop participants recognized that the categories are not independent, they found it helpful to distinguish between descriptive and decisionmaking uses of summary measures of population health. Political and cultural values and biases, of course, play a major role in determining who benefits from public policies, but descriptive information is an important resource for those seeking to assess and modify established spending patterns. The reliable, valid description of

health status is an important achievement in and of itself. For example, such information resources as the CDC’s Morbidity and Mortality Weekly Reports represent more than a century’s worth of effort to develop accurate state and local data on death, disease, and injury. Efforts to devise summary measures that provide an overall view of individual or population health are more recent but still date back several decades. Such measures include both the preference-based measures described above and other ways of profiling health such as the SF-36.

Descriptive data on health status are often an important first step in identifying health problems as part of a process for improving community health, deploying resources, and overseeing the performance of those responsible in some way for individual or population health (IOM, 1997). For these purposes, comparisons are needed on both cross-sectional (single point in time) and longitudinal bases. Then, within populations, summary comparisons broken down, when data permit, by age, sex, ethnic, income, and other categories and by disease condition contribute to further analyses that seek to identify and understand differences in health status among different segments of the population.

A particularly important type of descriptive use of summary measures involves comparisons that identify inequalities in health and well-being, suggest testable hypotheses about the sources of these inequalities, and lead to strategies for remedying them. These strategies may involve public health programs, personal health services, interventions beyond the health care system (e.g., income subsidies or antipollution programs), or some combination of these. To assess the relative impact of such strategies, it is desirable that data collection and measurement approaches be comparable across public health and personal health care systems and, if possible, beyond health systems.
Comparable in this context means that the summary measures are constructed in the same nominal units (e.g., QALYs) and derived from data collected using a consistent set of conceptual and operational definitions.

The existence of summary measures of population health and the availability of techniques such as cost-effectiveness analysis do not, of course, guarantee that they will be used, or used appropriately, in policymaking. For example, whether or not the Health Care Financing Administration should formally include cost-effectiveness as a criterion for making Medicare coverage decisions has generated controversy for nearly two decades. Similarly, when Oregon sought to set priorities for coverage of different health services, an initial ranking of health services developed using cost-effectiveness analysis was substantially modified by policymakers to better reflect “public values” and to rely on assessments of net benefit rather than cost-effectiveness in developing rankings (Eddy, 1991; Hadorn, 1991; Klevit et al., 1991; Patrick and Erickson, 1993). In contrast, hospital or health plan managers attempting to control costs may more readily consider cost-effectiveness in making less visible decisions (e.g., the choice of which drug to use first in managing an infection or treating depression) (Luce and Brown, 1995).

Such controversies about the use of summary measures need to be, first, understood (e.g., problems with the technical calculation or presentation of the measures should be distinguished from discomfort with the ethical implications of the measures) and, second, evaluated for possible methodological, educational, or other responses. The committee assumes neither that all barriers to use are surmountable nor that perfect measures are achievable. If the value judgments underlying a measure conflict with those of policymakers and the society that they serve, then it is reasonable that policymakers would look to other bases for making decisions.
It is, however, important that these other, often informal bases be subject to similar critical analysis. The focus of the committee’s work was on identifying directions that could strengthen the credibility and

utility of summary measures of population health rather than specifying directions for developing alternatives. Given both the technical and the ethical issues that need further exploration, there was considerable but not complete agreement among workshop participants that summary measures of population health were, at this time, best suited for use in descriptive comparisons of populations. There was not clear agreement on how large a role summary measures, improvements notwithstanding, should ultimately play in resource allocation decisions or comparisons of the performance of health care organizations. The committee believes that the kinds of developmental and analytic work recommended in this report will aid future discussions on the appropriate use of summary measures to guide resource decisions and performance comparisons.

The similarities and differences among summary measures of population health need further examination as part of a strategy for assessing how well particular measures and measurement strategies may serve different local, national, and international purposes.

Recommendations about roles for summary measures of population health depend, in part, on clearer understanding of (1) the uses of such measures, (2) the ways different partial and summary measures behave in depicting health, and (3) the extent to which users of the measures appreciate the characteristics or limitations of the measures and identify what additional information they may need. However, despite controversy over various issues, the ways in which different measurement strategies may shape the information and analyses provided to decisionmakers in different contexts have not been systematically described, evaluated, or compared to other decisionmaking tools and approaches. Some differences are clearly more consequential than others.
For example, the differing strategies employed by measures of health status described earlier mean that some (those incorporating preference weights) can be combined with a measure of life expectancy to create a single integrative measure of population health. Topics that require attention include what aspects or domains of health to measure, to what extent the component domains of a summary measure can be used in conjunction with the overall measure to provide a fuller understanding of population health, and whether and how to assign weights or values to different health states and life durations. As discussed further below, these topics have important ethical as well as technical dimensions that deserve fuller exploration and consideration. Another issue for examination is whether to treat the relationship between the valuation of health states and the duration of the health state as dependent or independent—that is, whether the time spent in a particular state might affect the perception or rating of this state. Assuming independence is methodologically convenient (Fryback, 1997) but may not be consistent with evidence suggesting that people with certain disabilities adjust and become more positive about their health state over time (Sackett and Torrance, 1978). Further directions for investigation are described in following sections of this report. The next conclusion focuses on those that would contribute to a process of building consensus on standard measures and appropriate uses for such measures.

Although methodological innovation in population metrics has strengthened the analytical base for health decisions, the lack of accepted standard measures often creates confusion and caution among potential users.

Over the last 30-plus years, researchers have created an array of population metrics that have attracted the attention and, in some cases, the acceptance of policymakers who see these measures as a source of critical information for decisionmaking. At the same time, however, this innovation and diversity create some problems. For example, when the use of different measures produces noncomparable or even conflicting findings, the result may be confusion, distrust of quantitative analysis, and missed opportunities to build cumulative knowledge for decisionmaking. Wariness among users or potential users of summary measures may be a particular peril when decisionmakers realize that they have only a limited grasp of the technical differences among measures and their ethical and policy implications.

The committee does not expect that any single measure will necessarily (1) fit all purposes for summary measures of population health, (2) be without technical or other limitations, and (3) be appropriate as the sole measure for use in any decision. Nonetheless, it would be useful to consider steps that would increase consensus, both nationally and internationally, on which measures to use for which purposes.
Some of the components of a consensus-building process include identifying parties to be involved such as users and consumers of health data and services as well as developers of tools and policymakers; clarifying and attempting to get agreement on standard terminology (i.e., nomenclature and definitions); creating mechanisms for process management, consultation, and clarification and for the resolution of disagreements at different stages in the process; defining target applications of measures; establishing important characteristics of measures for specific applications, including specification of minimum acceptable features insofar as possible; systematically evaluating and comparing existing measures along these dimensions; and setting priorities for refining measures to minimize technical and ethical problems. A process of building consensus on standards and measures does not assume that summary measures are necessarily perfectible for all purposes or that policymakers will necessarily accept the value judgments on which they may be based. It does assume that the process will help clarify the imperfections of different measures, encourage resolution of correctable deficiencies in measures, and set guidelines for the appropriate application of such measures for different purposes (notwithstanding their imperfections). A standard-setting process is not incompatible with continued methodological innovation. Indeed, such a process should take full advantage of the experience of the developers and users of current methods, with an eye toward identifying the strongest features of each measure as a basis for refining existing measures or creating new measures. This suggests the desirability of simultaneous testing of alternative methods to assess their strengths and limitations and to compare the results they generate for policymakers.

The creation of a process for building consensus on population metrics would be consistent with a recent IOM report on community-based performance monitoring. It recommended the development and voluntary use of a set of standard measures for profiling the health of communities as part of a process of community health improvement (IOM, 1997). Canada is already pursuing national consensus on health data needs and measures including both highly detailed, longitudinal data on health status and health care utilization at the individual level and a family of compatible summary measures of population health (Wolfson, 1997). For this family of measures, a key idea is to construct an overall summary measure in such a way that it can be readily disaggregated to show distributions of health status for population subgroups. Moreover, work is under way on a statistical and analytical framework that provides connections with underlying disease processes (e.g., heart disease), risk factors (e.g., smoking), and social factors (e.g., family income). One objective of this framework is to allow the burdens of disease and the “attributable fractions” of risk factors to be expressed in terms of their incremental contribution to overall population health. Beyond the issues of “how,” “what,” and “when” to value, workshop participants noted— from a pragmatic perspective—the utility of collecting population health information in ways that allow easy disaggregation on the basis of social, demographic, and economic risk factors such as age, occupation, or income. Linking information about population health (including both summary statistics and more detailed data about specific health problems and conditions) to information about risk factors can provide important epidemiologic insights for planning public health strategies and shaping health services delivery systems. 
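A minimal sketch of what “readily disaggregated” could look like in practice: if the overall figure is a population-weighted average of subgroup values, the subgroup detail can always be recovered alongside the summary. The subgroup labels, sizes, and mean health weights below are hypothetical, and this is not a description of the Canadian framework itself.

```python
# Hypothetical subgroups: (population size, mean health weight on a 0-to-1 scale).
subgroups = {
    "lowest income": (200_000, 0.80),
    "middle income": (500_000, 0.85),
    "highest income": (300_000, 0.90),
}


def overall_measure(groups):
    """Population-weighted average of the subgroup means."""
    total = sum(size for size, _ in groups.values())
    return sum(size * mean for size, mean in groups.values()) / total


def gaps_from_overall(groups):
    """Difference between each subgroup mean and the overall figure."""
    overall = overall_measure(groups)
    return {name: mean - overall for name, (_, mean) in groups.items()}


print(f"overall: {overall_measure(subgroups):.3f}")
for name, gap in gaps_from_overall(subgroups).items():
    print(f"{name}: {gap:+.3f} relative to overall")
```

Because the overall value is built from the subgroup values, the same data support both the single summary number and the comparison across social, demographic, or economic groups.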
Starting points already exist for a process of building consensus on standards for summary measures of population health and agreement on adoption of particular measures for specific purposes. For example, the National Commission on Vital and Health Statistics serves as a forum for a wide variety of interested parties to collaborate in developing common data standards for different information systems and users, and the Organization for Economic Cooperation and Development has taken similar steps.

Building on well-established principles of measurement and analysis and a number of related exercises undertaken by the Institute of Medicine (1992) and the Medical Outcomes Trust (1995), the committee identified several desirable characteristics or attributes for summary measures of population health. These attributes include the following:

Reliability or reproducibility: a measure is reliable if repeated use under identical circumstances by the same or different users produces the same results.
Validity: a measure is valid if it measures the properties, qualities, or characteristics it is intended to measure.
Sensitivity or responsiveness: a measure is sensitive/responsive if it can detect differences or changes in population characteristics that are of interest to users of the measure.
Acceptability: a measure is acceptable if its intended users (and the constituencies upon which they depend) find the results of its application (e.g., a summary statistic) understandable, credible, and useful for their purposes.
Feasibility or burdensomeness: a measure is feasible if users can collect the necessary data and perform the required analyses without imposing excessive administrative, economic, or other burdens on those whose participation or cooperation is needed.

OCR for page 5
Universality or flexibility—a measure is universal/flexible if it is adaptable to the variability of problems, populations, settings, or purposes that face potential users. Documentation—a measure is documented when the methods, criteria, assumptions, and data employed in deriving or calculating the measure are clearly identified and publicly available. For different purposes, more specific criteria may be needed. For example, the recent Panel on Cost-Effectiveness in Health and Medicine (1996, pp. 120–121) argued for the development of a standard catalog of weights for use in cost-effectiveness analysis. It suggested an approach with these characteristics: “(1) derivation from a theory-based method on which empirical data have been collected; (2) availability of weights from a representative community-based sample of the U.S. population; (3) low burden of administration in clinical and population-based settings; and (4) ability to furnish weights for health states, as well as for illnesses and conditions.” The panel went on, however, to acknowledge that none of the systems presented conforms to all of these characteristics and that any specific system may not include information relevant for a particular analysis. Any process of standard setting and standardization will have ethical and political as well as methodological or technical dimensions. The committee’s next conclusion emphasizes the value component of measurement. All measures of population health involve choices and value judgments in both their construction and their application. Summary measures of population health have descriptive as well as evaluative uses at both the individual and the population levels. As indicated earlier, the most controversial uses will be those involving resource allocation decisions at the population level. 
To see why summary measures are controversial, one must examine the way in which value choices enter into both the construction and the application of different measures of population health. Although the use of measures that adjust years of life by the quality of that life presents special ethical issues, even the use of well-accepted mortality measures involves value judgments. In particular, when analysts or policymakers rely entirely on mortality measures in considering alternative health policies, they neglect the disease burden, both physical and mental, of nonfatal disabling conditions. Saving or extending lives is not the only morally important function of health care. Thus, reliance on mortality alone can involve an ethical error of omission.

Summary measures intended to correct this error of omission by combining mortality and morbidity can, however, lead to ethical errors of commission because they require a number of controversial value judgments. As described in the background papers by Daniels and Brock, basic choices include what aspects of health to measure and how to assign weights to different health states.

One central set of weighting choices involves whose evaluations of health states are to be used in these measures. On the one hand, health professionals may have, or may be perceived to have, systematic biases related to their training, social status, and work experiences. On the other hand, members of the general community may have biases related to their experience or lack of experience with illness, injury, or disability. For example, people with a long-standing disability have had time to adapt to their condition and as a result may value these states more favorably than those who have not experienced them. Whose evaluations of disabilities should be used? Different ethnic and other population subgroups may also evaluate health states differently. Careful comparison of different summary measures is needed to determine just how much variation in summary measures is caused by different choices about weighting health states. This will help in developing advice about whether and how to use particular measures.

Workshop participants suggested that for clarity, discussions of the ethical status of summary measures of population health have to distinguish between the preferences (weights assigned to different health states as determined in sample surveys) that are used to construct a measure and the cultural, moral, and other values that guide policymakers in making decisions. Preference weights, of course, involve value judgments, and the methods for incorporating preferences into summary measures are likewise value-laden. These values need to be made explicit.

In addition to choices about assigning weights to different health states, other value judgments also characterize different summary measures. Though most QALY-type measures value a year of healthy life at each age equally, the DALY, as currently constructed, has weighted the value of health at different ages differently, for example, applying lower weights for the very young and the very old. This feature is not an essential element of the general DALY approach, but it is a highly controversial choice that may affect the public acceptability of the measure. Similarly, as Brock (1997) notes, value judgments influence choices about which life expectancies to use as a baseline in constructing DALYs because these expectancies differ by gender and other population subgroups, as well as across nations.
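The contrast between equal and unequal age weights can be made concrete with a small sketch. The report gives no formula, so the sketch assumes the age-weight function commonly cited for the original DALY, w(x) = C·x·exp(−βx) with C = 0.1658 and β = 0.04; those constants, the function names, and the coarse mid-year summation are assumptions for illustration, and time discounting is deliberately left out.

```python
import math


def uniform_weight(age):
    """QALY-style assumption: a healthy year counts the same at every age."""
    return 1.0


def daly_style_age_weight(age, c=0.1658, beta=0.04):
    """Assumed DALY-style age weight, c * age * exp(-beta * age).

    The constants are the ones usually quoted for the original DALY;
    they are an assumption here, not taken from the report.
    """
    return c * age * math.exp(-beta * age)


def weighted_years(start_age, years, weight_fn):
    """Sum the weights of `years` healthy years lived from start_age,
    evaluating each year at its midpoint (a coarse approximation)."""
    return sum(weight_fn(start_age + k + 0.5) for k in range(years))


# Ten healthy years valued under each scheme, for a young adult and an older adult.
for start in (20, 75):
    equal = weighted_years(start, 10, uniform_weight)
    age_weighted = weighted_years(start, 10, daly_style_age_weight)
    print(start, equal, round(age_weighted, 2))
```

Under the uniform scheme the two ten-year spans count the same; under the assumed age-weight function the span beginning at age 20 counts for considerably more than the span beginning at age 75, which is the kind of embedded value judgment the text says should be made explicit.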
In any case, the value judgments embedded in the construction of specific summary measures of population health must be made evident if these measures are to be responsibly used or revised for such purposes as resource allocation.

Additional ethical controversies about both the design and the application of summary measures arise when they are employed in cost-effectiveness analyses that are intended to guide resource allocation at the population level. These controversies involve a particularly difficult set of distributive issues that are intrinsic to decisions about resource allocation regardless of the data and measures used to inform the decisions. When should resources be allocated to produce “best outcomes,” and when should resources be divided to give people fair chances at some benefit? How much priority should the sickest or worst-off patients have? When should the prospect of modest benefits to many people outweigh the delivery of significant benefits to fewer people?

The straightforward use of cost-effectiveness analysis favors specific, yet contested, answers to these questions (Harris, 1987). That is, it would give no priority to the sickest patients, would permit any aggregation that maximized health benefit per dollar spent, and would always support the best outcomes. The contested ethical assumptions behind this approach to cost-effectiveness analysis are that a unit of a summary measure, be it a QALY or a DALY, has the same moral value wherever it is distributed, that a benefit to one always compensates for a loss to another, and that it is always morally desirable to maximize benefit in the aggregate or at the margin. Thus, for example, a loss of one quality-adjusted life year for a single person can be offset by a gain of a twentieth of a quality-adjusted life year for 20 different people.
These assumptions, as Rawls (1971) argues, ignore the “separateness” of persons (i.e., that the losers and the gainers are different people with different experiences not reflected in theoretical assumptions).
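The arithmetic behind the aggregation assumption can be sketched as follows. The unweighted sum reproduces the example in the text, in which one person's loss of a full quality-adjusted life year is exactly offset by 20 gains of one-twentieth each. The priority-weighted variant is purely hypothetical: the weight of 2 on the losing person is an invented number, included only to show how the conclusion changes once units are no longer treated as morally interchangeable.

```python
from fractions import Fraction


def aggregate_change(changes):
    """Unweighted sum: every unit counts the same whoever receives it."""
    return sum(changes, Fraction(0))


def priority_weighted_change(changes, weights):
    """Hypothetical alternative: each person's change is scaled by a
    priority weight before summing (weights are invented for illustration)."""
    return sum((w * c for w, c in zip(weights, changes)), Fraction(0))


# One person loses a full quality-adjusted life year; 20 others gain 1/20 each.
changes = [Fraction(-1)] + [Fraction(1, 20)] * 20

net = aggregate_change(changes)

# Invented weights: the person bearing the loss counts double.
weights = [Fraction(2)] + [Fraction(1)] * 20
weighted_net = priority_weighted_change(changes, weights)

print("unweighted net change:", net)
print("priority-weighted net change:", weighted_net)
```

Exact fractions are used so that the offset comes out to precisely zero rather than a floating-point approximation.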