Selecting Measures for the National Health Care Quality Data Set
The committee proposes two basic guidelines for defining the content of the National Health Care Quality Data Set that will be used to produce the National Health Care Quality Report (also referred to as the Quality Report). The first is the framework (Chapter 2), which indicates the aspects or domains of health care quality that should be measured. The second involves criteria for the selection of measures, which are discussed in this chapter. The criteria indicate the desirable characteristics of individual measures and of the measure set as a whole, and can be used to help assess candidate measures and potential measure sets. This chapter presents the criteria proposed by the committee, along with related aspects of the definition of measures and of measurement in general. Although the measures may and should change over time, the framework and the criteria for measure selection, as set forth here, should remain relatively constant.
RECOMMENDATION 2: The Agency for Healthcare Research and Quality should apply a uniform set of criteria describing desirable attributes to assess potential individual measures and measure sets for the content areas defined by the framework. For individual measures, the committee proposes ten criteria grouped into the following three sets: (1) the overall importance of the aspects of quality being measured, (2) the scientific soundness of the measures, and (3) the feasibility of the measures. For the measure set as a whole, the committee proposes three additional criteria: balance, comprehensiveness, and robustness.
Among the ten specific criteria for selecting individual measures for the National Health Care Quality Data Set, three refer to the importance of the subject of measurement: (1) its impact on health, (2) its meaningfulness, and (3) its susceptibility to being influenced by the health care system. Three other criteria pertain to the scientific soundness of the measure: (4) its validity, (5) its reliability, and (6) the explicitness of the evidence base for the measure. The last four criteria relate to the feasibility of using the measure: (7) the availability of measure prototypes, (8) the availability of required data across the system, (9) the appropriateness of the cost or burden of measurement, and (10) the capacity of the data and the measure to support subgroup analyses comparing populations and states.
The measure set as a whole should also fulfill three separate criteria. It should be balanced so that it includes both positive and negative aspects of the quality of care; it should be comprehensive so that it represents the majority of care; and it should be robust to minor changes in the measures or in the sample so that it reflects only significant changes in the underlying quality of care.
These are ideal criteria and should not be interpreted as strict requirements for potential measures. In the short term, not all criteria for feasibility or scientific soundness will be fulfilled. Evaluating measures against the proposed criteria can then be used to pinpoint areas for improvement in measure development.
RECOMMENDATION 3: The Agency for Healthcare Research and Quality should have an ongoing independent committee or advisory body to help assess and guide improvements over time in the National Health Care Quality Report.
Measure selection is a complex process that includes several steps ranging from identifying a set of candidate measures to updating these measures. Activities relating to the definition of the measure set, the reporting of measures, and the eventual interpretation of the findings in the National Health Care Quality Report all require expert input. The committee recommends that the Agency for Healthcare Research and Quality obtain the collaboration of an independent advisory body with broad-based representation in this process. The advisory body should include major interested parties from both the private and the public sectors with technical expertise in measure development and reporting. It should also include representatives from labor unions and organizations of purchasers, consumers, providers, insurers, state policy makers, and academia. It should
include both national- and state-level representatives. The body could be analogous to the National Committee on Vital and Health Statistics (NCVHS) or the National Quality Forum (NQF) (Kizer, 2000; National Quality Forum, 2000; Stead, 1998) and should fulfill technical, representative, and interpretive functions.
RECOMMENDATION 4: The Agency for Healthcare Research and Quality should set the long-term goal of using a comprehensive approach to the assessment and measurement of quality of care as a basis for the National Health Care Quality Data Set.
A comprehensive system is one in which the majority of care for a given population is assessed using a large number of measures representing the many components of health care quality and consumer perspectives on health care needs in an integrated manner and spanning a variety of health care settings and conditions. This approach should result in a more complete and accurate picture of the state of quality in the nation than is now available. To this end, the Agency for Healthcare Research and Quality should evaluate current efforts to develop comprehensive quality measurement systems (for example, in the area of effectiveness) and examine how they may be used and expanded.
The committee agreed that a comprehensive approach to measurement is ideal for the National Health Care Quality Data Set on which the Quality Report will be based. A limited set of measures would be insufficient to capture the four components of quality and the diverse consumer perspectives on health care needs. Therefore, conceptually, the National Health Care Quality Data Set should be as comprehensive as possible, but by necessity, reporting will be selective (see Chapter 5 for further discussion of the Quality Report).
Experience with comprehensive systems of health care quality measurement is limited. The RAND QA Tools system (McGlynn, 2000), although still under development, shows promise as a means to assess effectiveness. QA Tools could be expanded to examine limited aspects of “patient centeredness,” timeliness, and safety, but complementary data collection and reporting systems may have to be developed and expanded in order to cover all four components of quality in a comprehensive manner.
RECOMMENDATION 5: When possible and appropriate, and to enhance robustness, facilitate detection of trends, and simplify presentation of the measures in the National Health Care Quality Report, the Agency for Healthcare Research and Quality (AHRQ) should consider combining related individual measures into summary measures of specific aspects of quality. AHRQ should also make available to the public information on the individual measures included in any summary measure, as well as the procedures used to construct them.
Summary measures combining several individual measures have an important, but selective, role in the National Health Care Quality Report. Carefully crafted and thoroughly evaluated summary measures should be used when sensible and where they will facilitate understanding; they should not be overemphasized. Despite the challenges in the design and validation of summary measures in the area of health care quality, they may be useful in summarizing broad trends in quality of care, such as for each of the components of quality or their subcategories. Summary measures are clearer when presented along with a corresponding reference point or benchmark (for example, past performance, desirable performance).
The highest level of aggregation should be the major categories of the framework. In general, it is better to combine only measures for the same quality component or consumer health care need (or one of its subcategories). For example, diverse measures of safety could be aggregated across health care needs. Using a matrix to classify the measures, in which components of health care quality are represented as columns and consumer perspectives on health care needs as rows, a summary measure could aggregate measures within a cell or measures across cells in the same column or row of the matrix (see Figure 2.2). In either case, it will be necessary to determine if all component measures should or should not be weighted equally. An overall summary measure of health care quality across the components of quality (that is, safety, effectiveness, patient centeredness, and timeliness) is problematic and not feasible at this time. Regardless of the number and nature of summary measures that are defined for the Quality Report, information on individual measures that make up the summary measure should also be made available. This will allow interested parties to examine how the summary measures were constructed.
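The aggregation logic described above can be sketched in a few lines. This is an illustrative sketch only, not an AHRQ specification; the measure names, scores, and weights are hypothetical, and each component score is assumed to be normalized to a 0 to 1 scale on which higher is better.

```python
# Illustrative sketch (not an AHRQ specification): combining individual
# measures into a summary measure for one cell of the framework matrix.
# All measure names, scores, and weights below are hypothetical.

def summary_measure(scores, weights=None):
    """Weighted average of component measure scores, each on a 0-1 scale.

    With no weights given, all components count equally; whether equal
    weighting is appropriate is exactly the decision the text says
    must be made and documented for each summary measure.
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total

# Hypothetical safety measures, each normalized so that higher is better.
safety_scores = {
    "medication_error_rate_inverted": 0.92,
    "surgical_complication_rate_inverted": 0.88,
    "sentinel_event_rate_inverted": 0.97,
}

equal_weight_summary = summary_measure(safety_scores)
custom_weight_summary = summary_measure(
    safety_scores,
    {"medication_error_rate_inverted": 2.0,
     "surgical_complication_rate_inverted": 1.0,
     "sentinel_event_rate_inverted": 1.0},
)
```

Publishing the component scores and the weight table alongside the summary, as the recommendation requires, lets interested parties reconstruct how the summary was built.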
RECOMMENDATION 6: The National Health Care Quality Data Set should reflect a balance of outcome-validated process measures and condition- or procedure-specific outcome measures. Given the weak links between most structures and outcomes of care and the interests of consumers and providers in processes or practice-related aspects as well as outcome measures, structural measures should be avoided.
The National Health Care Quality Report and Data Set should rely on a balanced set of process and outcome measures and should avoid structural measures. A combination of process and outcome measures will satisfy the needs of policy makers, clinicians, and consumers. The committee recognizes that clinical processes change. Any measures for the National Health Care
Quality Report and Data Set should not stifle innovation by institutionalizing specific processes or structures that can soon become outdated.
EXAMINING POTENTIAL MEASURE SELECTION CRITERIA
Many potential measures of health care quality have been developed. The number and nature of the measures vary according to the area of quality being considered. Sound criteria are needed for the selection of individual measures, as well as for the definition of the measure set for the National Health Care Quality Data Set. As a starting point in its efforts to define the criteria to select measures for the data set, the committee reviewed criteria defined by other groups involved in similar efforts (see Appendix E) (Advisory Commission on Consumer Protection and Quality in the Health Care Industry, 1998; Department of Health, 1999; Donabedian, 1982; Foundation for Accountability, 1999a; Institute of Medicine, 1999; National Committee for Quality Assurance, 2000; National Research Council, 1999). The committee also examined the preliminary criteria proposed by the Department of Health and Human Services (DHHS) working group on the National Health Care Quality Report, led by the Agency for Healthcare Research and Quality. The most common selection criteria among those examined were relevance, meaningfulness, scientific or clinical evidence, reliability, feasibility, validity, and health importance (Table 3.1). The relevance of the aspect being measured was the only criterion present in all of the sets of criteria examined. After relevance, the most commonly cited criteria were meaningfulness (considered an aspect of relevance by some) and the availability of scientific or clinical evidence for the measure.
The number of criteria for measure selection ranged from 6 for the Child and Adolescent Health Measurement Initiative of the Foundation for Accountability (FACCT) (Foundation for Accountability, 1999a) to 19 for the HEDIS (Health Plan Employer Data and Information Set) measures developed by the National Committee for Quality Assurance (NCQA) (National Committee for Quality Assurance, 2000). As might be expected, sets with fewer criteria usually had more inclusive definitions, whereas those with more criteria were more specific.
CRITERIA FOR SELECTING INDIVIDUAL MEASURES FOR THE NATIONAL HEALTH CARE QUALITY DATA SET
Major Aspects to Consider
For purposes of the National Health Care Quality Data Set, the committee proposed two levels of measure criteria classification: (1) a higher level of major categories to group similar criteria and (2) a lower level of specific, individual criteria.
TABLE 3.1 Most Common Criteria for Measure Selection in the Frameworks Examined

Relevance
The measure should address features of the health care system applicable to health professionals, policy makers, and consumers.

Meaningfulness or interpretability
The measure should be understandable to at least one of the audiences. It should help inform them about important issues or concerns.

Scientific or clinical evidence
The measure should be based on evidence documenting the links between the interventions, clinical processes, and/or outcomes it addresses.

Reliability or reproducibility
The measure should produce the same results when repeated in the same population and setting.

Feasibility
The measure should be specified precisely. Collection of data for the measure should be inexpensive and logistically feasible.

Validity
The measure should make sense (face validity); correlate well with other measures of the same aspects of care (construct validity); and capture meaningful aspects of care (content validity).

Health importance
The measure should include the prevalence of the health condition to which it applies and the seriousness of the health outcomes affected.

a Criteria are listed in order of frequency, with the one mentioned most often listed first.
b The same label for a criterion can have different meanings depending on the framework because the criteria are not standardized. The definitions, rather than the labels, were used to construct this table.
c This term was used as a category covering several criteria in some of the frameworks and as a single criterion in others.
SOURCES: This table is based on the analysis of measure selection criteria from frameworks used to study health care quality and health status (see Appendix E). Parts of this table were adapted from NCQA’s list of desirable attributes for HEDIS measures (National Committee for Quality Assurance, 2000).
The individual measure criteria are grouped into three major categories— importance, scientific soundness, and feasibility ( Box 3.1). Each of these categories refers to different aspects of the process for measure selection. Importance, the first category, groups criteria having to do with selecting the areas or subjects of measurement. Together with the framework, the importance criteria can be used to define the content areas of the report, in other words, what will be measured. Scientific soundness, the second category of criteria, refers to the characteristics of the measures themselves. It groups criteria
describing the properties of the measure and the available evidence on the soundness of the measure being considered. The criteria for scientific soundness are used to define how to measure or, more precisely, which specific measures are best suited to evaluating the areas under consideration. Feasibility, the third and last category, refers to the ease of actually using the measures being considered. In other words, once what to measure and alternative ways of measuring it have been determined, the likelihood of success in actually using the proposed measures must be examined.
Specific Aspects to Consider When Selecting Measures
Having defined the three major categories of criteria—importance, scientific soundness, and feasibility—that should be taken into account when examining possible measures for the National Health Care Quality Data Set, the committee then determined the specific criteria under each of these categories. In doing so, the committee aimed to capture the essential attributes of the final measures with as parsimonious a set of criteria as possible. After extensive discussion, the committee agreed on 10 criteria across the three major categories, as defined below.1
Criteria Regarding Importance
This category of criteria refers to whether the area under consideration should be measured at all or whether it is important in a clinical care sense, important to the general population, or important to improve the quality of health care delivery. The subject of measurement can refer to a health condition or to an organizational aspect of the health care system that influences quality of care. Importance criteria answer the following questions:
What is the impact on health associated with this problem?
Are policy makers and consumers concerned about this area?
Can the health care system meaningfully address this aspect or problem?
Each of the criteria for importance is defined more precisely below.
1 Definitions of criteria are based on the committee's understanding of them but draw on previous work, particularly the criteria for HEDIS measures defined by the National Committee for Quality Assurance (2000).
BOX 3.1 Desirable Characteristics of Measures for the National Health Care Quality Report
To be selected as a measure for the Quality Report, the measure or area it represents should rate highly in terms of the following:
1. Importance of what is being measured
2. Scientific soundness of the measure
3. Feasibility of using the measure
1. Impact on health. The measure should address important health priorities, such as issues related to care or specific conditions. It should represent problems that significantly affect morbidity, disability, functional status, mortality, or overall health. Preferably, the measure will address areas in which there is a clear gap between the actual and potential levels of health that can be influenced by improvements in the quality of care. Areas in which there is a large degree of unexplained variation in health status, death rates, or disease rates can also be the subject of measurement when there is reason to believe that quality of care influences the variation (Wennberg and Gittelsohn, 1973). Many of the health priorities for the next decade spelled out in Healthy People 2010 could apply here (U.S. Department of Health and Human Services, 2000b). However, while Healthy People 2010 focuses on specific measures of health status (for example, the incidence of heart attack in a specific population), the Quality Report will focus on measures of the quality of care for specific conditions (for example, the administration of aspirin after a heart attack and related outcomes).
2. Meaningfulness to policy makers and consumers. The measure should be easily understood by policy makers and individual consumers and should refer to something that matters to them or should matter to them. A meaningful measure represents an aspect that is relevant to the intended audience and can be communicated easily to that audience (the latter is also called interpretability). Consumers should be able to understand the significance of differences in quality of care that the measure conveys. Policy makers should be able to interpret easily the meaning of changes that the measure tracks, across population groups or from one period to the next—for example, changes in therapy for heart attack that lead to reduced death rates. If a measure is meaningful to key stakeholders, it is usually easier to obtain their support.
3. Susceptibility to influence by the health care system. The measure should reflect an aspect of care that can be influenced by the health care system as it exists or as it is envisioned. That is, policy makers can take specific actions (generally at the structural or process level) to improve health care in that area and, ultimately, health status. Injuries caused by automobile accidents, for example, are the leading cause of death among young adults, but most remedies (for example, changing car design or reducing the speed limit) lie outside the influence of the health care sector (National Center for Health Statistics, 2000:Table 33). The time period for the measure should also capture events that have an impact on clinical outcomes and reflect the time horizon over which the quality of the health care system can be measured. For example, while better nutritional counseling may lead to less osteoporosis and fewer bone fractures, the lead time is too long for it to be used as a measure of quality of care.
Criteria Regarding Scientific Soundness
The second category of criteria is the scientific soundness of the measure. These criteria refer to properties of the measure that often have to be assessed formally by researchers. They largely determine the credibility of the measure, particularly among health care practitioners. These criteria answer the following questions:
Does the measure actually measure what it is intended to measure?
Does the measure provide stable results across various populations and circumstances?
Is there scientific evidence available to support the measure?
Each of the criteria for scientific soundness is defined as follows:
1. Validity. The measure should make sense logically and clinically (face validity); it should correlate well with other measures of the same aspects of the quality of care (construct validity) and should capture meaningful aspects of the quality of care (content validity) (Carmines and Zeller, 1991; Nunnally, 1978). In general, measures should be linked to significant processes or outcomes of care as demonstrated by scientific studies. For example, the provision of selected screening tests in a timely manner is a process measure of quality that has construct validity when the screening is linked to earlier detection of disease and a better prognosis or outcome. Outcome measures should be examined for validity in a similar manner.
2. Reliability. The measure should produce consistent results when repeated in the same populations and settings, even when assessed by different people or at different times. Measure variability should result from changes in the subject of measurement rather than from artifacts of measurement (for example, a change in the definition of the measure or, for rare events, restricted sample size or small numbers of cases) (Carmines and Zeller, 1991; Nunnally, 1978). This aspect is particularly important for periodic data collection. Most measures will have to be repeated every year, and any changes in the measure should reflect a true change in quality.
3. Explicitness of the evidence base. There should be a clearly documented scientific foundation for the measure as demonstrated in the literature. An explicit evidence base could also mean that there is some other specific, formal process by which the measure has been accepted as a valid marker for quality, such as review by an expert panel (Brook, 1994). This criterion should not be interpreted as a strict requirement for evidence from randomized clinical trials only. Scientific evidence for the measure can also include observational studies, since experimental and observational approaches are often complementary (Black, 1996).
Criteria Regarding Feasibility
The third category of criteria refers to the feasibility of implementing the selected measures: that is, once it has been decided what to measure and how to measure it, one must examine whether it can actually be measured. The criteria for feasibility answer the following set of questions:
Is the measure in use?
Can the information needed for the measure be collected in the scale and time frame required?
How much will it cost to collect the data needed for the measure?
Can the measure be used to compare different groups of the population?
The criteria for feasibility are defined in more detail below:
1. Existence of measure prototypes. The availability of a prototype means that the measure has already been tested and applied, so it can be used by others and incorporated into a national data set. Evidence should be available that data for the measure have been collected in a variety of settings. In other words, the measure should currently be operational. In addition, it should be precisely defined, and its specifications should have been field tested. Documentation for the measure should include clear and understandable statements of the requirements for data collection and the definition and computation of the value of the measure (Joint Commission on Accreditation of Health Care Organizations, 1999).
2. Availability of required data across the system. Data required for the measure should generally be available for the nation as a whole and available during the period allowed for data collection. Information for the measure should be anticipated from an established data source on a regular basis. Selecting measures for which several sources of data are available can increase the validity of the information when this facilitates the use of multiple measures of the same concept. This process can provide a more complete and valid picture of quality (Kvale, 1995).
3. Cost or burden of measurement. Collecting the information for the measure should not impose an excessive burden on the health care system or on national data collection systems. The cost of data collection and reporting should be justified by potential improvements in quality and outcomes that could result from the act of measurement. This criterion also means that measures that will be examined at the population subgroup or state level should not require sample sizes so large as to be virtually impractical.
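The sample-size concern behind this criterion can be made concrete with the standard normal-approximation formula for estimating a proportion, n = z²p(1−p)/e². The rates and margins of error below are hypothetical illustrations, not report specifications.

```python
# Illustrative sketch of the sample-size concern noted above: the number of
# records needed to estimate a quality-of-care rate within a given margin
# of error, using the standard normal approximation for a proportion.
import math

def sample_size_for_proportion(p, margin, z=1.96):
    """Records needed to estimate rate p within +/- margin (95% CI default)."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

# A common process-of-care rate (~50%) estimated to +/- 3 percentage points
# needs about 1,068 records per state or subgroup being compared.
common = sample_size_for_proportion(0.50, 0.03)

# A rare event (~1%) estimated to +/- 0.5 points needs about 1,522 records,
# and far more for any comparable *relative* precision, which is why rare
# events can make subgroup- or state-level measurement impractical.
rare = sample_size_for_proportion(0.01, 0.005)
```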
4. Capacity of data and measure to support subgroup analyses. Since equity and medical conditions will be examined in the Quality Report, measures should be available for relevant groups of the population (for example, by race and ethnicity, level of income, insurance status) and by condition (for example, diabetes, breast cancer, asthma) when applicable and feasible. Although the report is national in scope, it should be possible to use the measures to drill down to the state level.2 To make meaningful comparisons across groups, the measure should not be appreciably affected by variables beyond the control of the health care system. For example, time from the occurrence of a heart attack to admission to the emergency department is crucial to survival, but it can be influenced more heavily by transportation issues than by quality of health care. Extraneous factors should be known and measurable so that the necessary data can be collected and their effects assessed. Well-described methods for taking these factors into account should be used. These include stratification to examine quality of care separately for each group of interest and case-mix or risk adjustment methods using validated statistical models to control for these factors (Greenfield, in press; Greenfield et al., 1993; McGlynn and Asch, 1998; Romano, 2000; Wang et al., 2000; Zaslavsky et al., 2000). When measures are affected by outside factors and the information required to account for these formally is not available, data should still be reported for the unadjusted measures. Researchers can subsequently perform the statistical analyses required to adjust the data.

2 This aspect is discussed in more detail in Chapter 4 as it relates to the criteria for selecting data sources.
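As a minimal sketch of the stratification approach just described (all counts are hypothetical, and this is not a method prescribed by the committee), quality rates can be computed separately for each subgroup and set beside the crude overall rate, which can mask subgroup differences:

```python
# Illustrative sketch of stratification: quality-of-care rates computed
# separately for each population subgroup, alongside the crude overall
# rate. All counts are hypothetical.

def stratified_rates(events, eligible):
    """Per-stratum rates: events[g] successes out of eligible[g] patients."""
    return {g: events[g] / eligible[g] for g in eligible}

def crude_rate(events, eligible):
    """Overall rate ignoring strata; can mask subgroup differences."""
    return sum(events.values()) / sum(eligible.values())

# Hypothetical counts of patients receiving a recommended process of care,
# stratified by insurance status.
events   = {"insured": 810, "uninsured": 120}
eligible = {"insured": 900, "uninsured": 200}

by_group = stratified_rates(events, eligible)  # insured 0.90, uninsured 0.60
overall  = crude_rate(events, eligible)        # 930/1100, about 0.85
```

The crude rate sits near the larger group's value and hides the 30-point gap that the stratified view makes visible, which is the point of reporting unadjusted subgroup data even when formal risk adjustment is not yet possible.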
Evaluating Individual Measures According to the Criteria
Each of the criteria described above is a desirable attribute of the measures selected for the report, but not an absolute requirement. That is, a measure does not have to fulfill all 10 criteria to be part of the National Health Care Quality Data Set, but measures in the final set should be those that satisfy the greatest number of criteria. In addition, there is a hierarchy among criteria categories. Measures should be evaluated first for importance and scientific soundness and then for feasibility. Measures that address important areas and are scientifically sound, but are not feasible in the immediate future, deserve potential inclusion in the data set and further consideration. However, measures that are scientifically sound and feasible, but do not address an important problem area, would not qualify for the report regardless of the degree of feasibility or scientific soundness.
The level of scientific evidence available will vary by measure and by subject. Several rating scales have been proposed to evaluate the quality of the studies or level of evidence (Clark and Oxman, 2000; Lohr and Carey, 1999). In some instances, the level of evidence for the measures will not be as high as desired, while in others, new measures will have to be developed. Rather than ignore such areas of quality of care, measures should be included if they fit the framework and meet other criteria while additional evidence is being developed to assess them.
The committee's vision for the Quality Report has not been limited by the fact that it might not be possible at present to fulfill all of the desired criteria. The feasibility criteria do not mean that only easily available measures should be used. To define needed measures, it may be desirable to relax the feasibility requirement and go beyond existing measures for evaluating the quality of care. Previous experience with the definition and production of Healthy People 2000 has shown that feasibility can be developed over time. In 1990, approximately one-third of the original Healthy People 2000 objectives were not measurable and were labeled “developmental objectives.” By the end of the decade, data were available for most of these objectives. Including them in the agenda led to data generation (U.S. Department of Health and Human Services, 2000b).
EVALUATION CRITERIA FOR THE NATIONAL HEALTH CARE QUALITY MEASURE SET
The committee discussed several aspects that should characterize the complete set of measures for the National Health Care Quality Data Set. Three criteria are basic for the measurement set: balance, comprehensiveness, and robustness.
First, the set of measures should be characterized by balance. Collectively, the measures should be useful for examining areas in which the quality of health care delivery is usually satisfactory or outstanding and areas in which it is often deficient. The set of measures should be representative of the entire range of experiences with care and not be limited to just the negative or positive aspects of the current health care quality landscape. There should be balance across components of quality and health care needs. The number of measures in each category (for example, safety versus effectiveness) does not have to be the same, but it should be sufficient to adequately measure the area of interest, rather than just one narrow aspect. For example, several measures of safety focusing only on surgical operations might be balanced by measures of medication errors and certain sentinel events. The number of measures will also be determined partly by a combination of the degree of differentiation of the area of measurement, the generality of the measures available, and the number of measures available.
Comprehensiveness is a second criterion for the measure set related to balance. The measurement set should present a complete and thorough picture of the quality of care being delivered in the United States; it should reflect the spectrum of care for the population. The measures should be representative of different elements of the health care system, as well as of how they interact (Sofaer, 1995). There should be measures representing safety, effectiveness, patient centeredness, and timeliness for the various health care needs—staying healthy, getting better, living with illness or disability, and coping with the end of life. This means that there should be a reasonable distribution of measures across categories, rather than only in areas where data are presently available. Measures should also cover different health care settings, from hospital to home, drawing on measures of quality of care in both ambulatory and inpatient settings. In addition, the measure set should address the problems of diverse groups of the population over the entire life span.
To examine potential inequities in the quality of care, it will also be important that the measure set include information on health conditions or clinical areas more prevalent among certain vulnerable populations, such as lower-income groups. For example, to examine equity, the report should include measures of the quality of care for people with tuberculosis (more prevalent among the poor) (Bock et al., 1998), as well as those receiving infertility treatment (more prevalent among higher-income people) (Stephen and Chandra, 2000).
It is important to distinguish between the number of measures that will be included in the National Health Care Quality Data Set and the number of measures in the National Health Care Quality Report. Given that one can rarely generalize from the quality of care for one condition to the quality of care for another (Brook et al., 1996), the number of measures in the data set will have to be quite large in order to fulfill the criteria. In contrast, as discussed in Chapter 5, the number of measures in any specific year's Quality Report will be more limited and much smaller than the number of measures included in the data set to facilitate understanding of the report.
Robustness is the third criterion for the measure set. A robust measure set is stable over time and reflects only true changes in the quality of care. Information drawn from the measure set should not be extremely sensitive to minor changes in the organization, financing, or delivery of health care services because these factors are not, in themselves, changes in the quality of care. The measure set should retain its value as processes of care evolve or the implementation of particular measures changes. For example, if data on the quality of medical care are being drawn from prescribing patterns, it would be unwise to focus on medications that shift from prescription to over-the-counter status, since such a shift would affect interpretation of the results. Rather, changes in the measures should reflect meaningful changes in the overall quality of health care delivered.
Measures included in the set should represent a cross section of care so that if one aspect of the quality of care changes, it does not unduly affect the total picture of quality captured by the measure set. In this sense, robustness is related to the comprehensiveness of the measure set because the larger and more varied the set of measures, the more likely it is to be both comprehensive and robust. In general, it is better to have multiple measures for each category and subcategory of the framework, even if some measures might be used for more than one category. The measure set should not be unduly affected by minor changes in component measures. For example, the appraisal of safety of care in a given year should not change just because a new measure was added to the data set. Robustness also means that the measure set can be used over time, with changes in it reflecting true changes in the quality of care. To that end, it would be appropriate to update the measure set to conform to evolving practice guidelines and standards of quality of care, as discussed below.
MEASURE SELECTION PROCESS
Steps in the Process of Measure Selection
Measure selection lies at the heart of the National Health Care Quality Report and the data set from which it will draw. To decide what to measure, the Agency for Healthcare Research and Quality should identify areas for measurement based on the framework and their importance. To decide how to measure quality, AHRQ should evaluate competing measures and determine the measure set for the report. These actions, in turn, involve several steps (Box 3.2).
More specifically, to identify areas for measurement, AHRQ should examine the framework to identify the categories of measurement for the Quality Report and areas that should be included in the data set. The components of quality offer ready-made measurement categories: safety, effectiveness, patient centeredness, and timeliness. AHRQ should next evaluate specific areas for measurement within the framework, applying the criteria for importance presented earlier in this chapter. In other words, areas for measurement should have an impact on health status, be meaningful to policy makers and consumers, and be under the influence of the health care system. If national goals and standards for the quality of health care delivery based on the framework have been defined, AHRQ should include measures to evaluate progress in meeting them.
To identify a set of candidate measures, AHRQ should first define a pool of available measures. For example, to select measures for timeliness, AHRQ should identify possible data sources and existing databases such as HEDIS, the Consumer Assessment of Health Plans Survey (CAHPS), and measures proposed by FACCT, the Picker Institute, and others. AHRQ can examine the examples proposed by the IOM committee for general guidance on measures that may be more appropriate than others (Boxes 2.1 through 2.5, Chapter 2) (Agency for Healthcare Research and Quality, 2000; Foundation for Accountability, 1999c; National Committee for Quality Assurance, 2000a; Picker Institute, 2001). The agency should also consult provider and patient groups, and other interested parties, for feedback on candidate measures. After casting this broad net, AHRQ should identify those areas in which measures are lacking and should be developed.
To actually select individual measures for the data set, AHRQ should evaluate competing measures by examining evidence of their scientific soundness—or their validity, reliability, and explicit evidence base. In the long term, it should also assess their feasibility or the existence of measure prototypes; the cost or burden of measurement; the availability of requisite data across the system; and the capacity of the data and the measure to allow comparisons by populations or subpopulations.
BOX 3.2 Steps in the Process of Defining the National Health Care Quality Measure Set
1. Identify the areas for measurement.
2. Identify candidate measures for the National Health Care Quality Data Set.
3. Evaluate competing measures.
4. Determine the measure set for the National Health Care Quality Data Set.
5. Test and evaluate the measure set.
Thus, to determine the final set of measures, AHRQ should evaluate individual potential measures and the resulting measure set according to the criteria outlined above. AHRQ should also arrange to have the measures reviewed by the public. In doing so, the agency could learn of concerns it had not anticipated or gather other information that might be valuable. In addition, the agency may learn of possible objections to the report that it should be prepared to effectively address. The measure set will have to be revisited periodically in response to changes in the availability of data, the evaluation of the measure set, the development of new measures, and potential changes in the framework. Other aspects related to measure selection are addressed below.
Role of an Advisory Body
Although responsibility for producing the National Health Care Quality Data Set rests with AHRQ (Healthcare Research and Quality Act, 1999), the committee believes that the agency should establish a mechanism to solicit input from major stakeholders and technical experts engaged in health care quality improvement, measurement, and reporting. These should include representatives from the public and private sectors at both the national and the state levels, such as public and private purchasers, labor unions, consumer groups, providers, insurers, federal and state health policy makers, national accrediting organizations, and academia. The advisory body would provide a common venue for public- and private-sector health care quality measurement, quality improvement, and oversight organizations to coordinate and collaborate on the most important national quality measurement and reporting activity. In parallel, AHRQ should continue to sponsor a specific group within the agency, or within DHHS, that will ultimately be responsible for the design and production of the Quality Report.
The activities of the external advisory body would range from providing advice on measure selection to report production and eventual updates. This body could also play a role in setting national goals and standards for health care quality so as to facilitate quality measurement and reporting. It could also provide insights into the interpretation of any findings and the formulation of potential policy solutions. Another activity of the advisory body could be to promote research on new measures, the definition of summary measures, and other areas needed to update and improve the Quality Report.
Establishing an advisory body can be accomplished through a variety of alternative mechanisms. One avenue would be to build on the collaborative working relationship already in place between AHRQ and other organizations that serve this purpose, such as the National Quality Forum (Foster et al., 1999; Miller and Leatherman, 1999; National Quality Forum, 2000). Another would be for AHRQ to establish a body analogous to the National Committee on Vital and Health Statistics (NCVHS), which provides advice to DHHS in areas related to health data (U.S. Department of Health and Human Services, 2000a).
Reviewing and Updating the Measure Set
Once the initial set of measures has been defined, there will be a need for periodic review and updating. This could be one of the functions of the advisory body recommended by the committee. At defined intervals, it will be necessary to examine whether any changes are needed in what is measured or in the way it is measured. Changes in measures should be considered if new evidence becomes available on aspects being measured, if a new priority has to be reflected, if new and relevant data become available, or if specific measures are improved. For example, the Consumer Price Index (CPI) produced by the Bureau of Labor Statistics is based on a market basket of goods and services that is updated periodically according to changes in consumption patterns (Bureau of Labor Statistics, 2000).
This periodic review of measures may also be useful to obtain support from key stakeholders. After an initial period of development, testing, and improvement, updating should tend to be conservative given the extensive process undertaken to define the initial measure set. In addition, only by keeping most measures the same from year to year will it be possible to analyze potential changes over time in the aspects being measured. The frequency of review and updating is also likely to decrease over time as the process of measure selection, the data set, and the report become established.
MEASURING HEALTH CARE QUALITY COMPREHENSIVELY
A selective approach to measuring quality relies on a limited number of measures thought to be representative of the general state of quality. The alternative is a comprehensive approach based on a large number of measures to assess the quality of the majority of care across both dimensions of the framework. Such an approach is necessary to examine the quality of care for most populations and problems ranging from the delivery of care to children with complex conditions to mental health care for the vulnerable elderly. The main advantage of a selective rather than a comprehensive approach to measurement is its economy. A smaller number of key indicators can be understood easily by a broader audience. The selective approach also tends to be more appealing to policy makers, who have limited time and resources.
Advocates of a comprehensive approach point to the great variability in the practice of medicine and argue that a limited set of measures cannot accurately reflect the wide differences in health care across conditions. A more comprehensive approach is also seen as less likely to be biased because the measures included would be more representative of the totality of care. The amount of information from the hundreds of indicators included can be made more manageable by combining them into a limited number of summary measures for reporting purposes, as discussed below.
Although no comprehensive group of measures is available that would cover all quality components and consumer health care needs at this time, the committee recommends that this be the ultimate goal of the National Health Care Quality Data Set. Measurement systems for particular components of quality are now being developed that exemplify the potential of a comprehensive approach. One of these efforts is RAND's QA Tools system developed to measure the effectiveness of health care for populations across the most common conditions and clinical areas. Data are drawn from patient records on more than 1,000 indicators, which are combined into summary measures of compliance with guidelines for recommended care (see Appendix B). Evaluating the suitability of RAND's QA Tools is beyond the scope of the committee's work. However, AHRQ should examine this and other promising initiatives more closely to determine if they could be used as the basis for a more comprehensive measure set and reporting system on the quality of health care in the United States.
To be truly comprehensive, the Quality Report should also address the quality of health care delivery not just in the traditional settings of hospitals and clinicians' offices but also in a growing number of other settings—patients' homes, hospices, nursing homes, and community health centers—where an increasing share of care is being provided. Strategies for improving measurement and data collection systems for many of these settings will have to be defined by AHRQ. The relative importance of each of these settings for improving health care delivery will also have to be examined.
TYPES OF MEASURES
Role of Summary Measures
Deciding whether or not the National Health Care Quality Report should feature summary measures as well as discrete individual measures is one of the most important measurement-related issues. This is a common dilemma that confronts those presenting data on complex subjects such as health care quality. In most cases, the objective will be to strike a balance between the two. It will also be necessary to determine whether all individual measures that make up a summary measure are equally important and should therefore be weighted equally. The committee recommends that, once individual measures for the data set have been defined, AHRQ consider the possibility of defining and testing alternative summary measures for each of the components of health care quality (and other major categories and subcategories in the framework) when appropriate. It should, however, avoid using an overall summary measure of quality of care.
Admittedly, an overall summary measure would provide a single number representing the quality of health care delivery in the United States. Summary measures are generally easier for the public to grasp. However, summary measures can mask important differences and relationships among the individual measures included in them and may make it difficult to identify which parts of the health care system contributed most to quality (Mulligan et al., 2000). For example, the quality score for a year when safety was extremely deficient, but other quality components were above expectations, would be about equal to the score for a year when all four quality components were average.
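The masking effect described in this example can be sketched in a small, entirely hypothetical illustration (the component names follow the framework, but the 0–100 scores, the use of 50 as "average," and the equal weighting are all invented for illustration):

```python
# Hypothetical illustration: how an equally weighted overall index can mask
# a deficit that component-level summaries would reveal. All numbers invented.

def overall_index(components):
    """Equally weighted mean across the four quality components."""
    return sum(components.values()) / len(components)

year_deficient_safety = {
    "safety": 20,               # extremely deficient
    "effectiveness": 60,        # above expectations
    "patient_centeredness": 60,
    "timeliness": 60,
}
year_all_average = {c: 50 for c in year_deficient_safety}

# The two very different years produce the same overall number (50.0)...
assert overall_index(year_deficient_safety) == overall_index(year_all_average)

# ...while looking at components directly exposes the safety problem.
print(min(year_deficient_safety, key=year_deficient_safety.get))  # prints "safety"
```

A report relying only on the overall index would score both years identically, which is why component-level summaries are preferable to a single quality index.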
Adding the measures across the four components of quality for a single overall summary measure of quality or for a quality index is clearly problematic. Summary measures are useful only when measures using the same metric (that is, the same unit of measurement and the same denominator) or measures within a single category or subcategory can be combined in meaningful ways. Such summary measures would allow for an examination of high-level trends regarding each of the components of quality or their subcategories, for example, the safety of surgery or the quality of care for diabetes. Summary measures are also more understandable when they are presented with a benchmark or standard as a reference point.
Even when summary measures are not possible, the Quality Report should include sets of individual measures based on data from similarly defined populations. Including measures that share a common denominator (for example, rates of services per 1,000 persons per year) can facilitate comparisons across different aspects of the quality of care.
Individual measures provide more detailed information for policy action than summary measures, but relying exclusively on individual measures for the Quality Report would make it unmanageable. Many would be needed to convey important aspects of a complex topic such as health care quality. The sheer number of measures overwhelms even policy specialists, leading them to take cognitive shortcuts, such as emphasizing the importance of just one factor (Hibbard, 1998).
Therefore, well-tested summary measures, along with well-tested individual measures, are needed for the Quality Report. England's set of high-level indicators of the performance of the National Health Service (NHS) includes mostly individual measures along with a few summary measures. For example, it combines individual measures of cervical and breast cancer screening coverage into a summary indicator for early detection of cancer (Department of Health, 2000).
Transparency is essential when summary measures are used. Otherwise, the use of summary measures can potentially detract from the impact and credibility of the Quality Report (Kingdon, 1995). To reduce the possibility of misinterpretation and dispel any apparent arbitrariness, the way in which summary measures have been constructed should be explained clearly, and the information and data for the individual measures that make up a summary measure should be made available as well.
Measures of the Structure, Processes, and Outcomes of Health Care
Donabedian's framework for quality assessment (1966) retains viability as a major way to classify quality indicators because it parallels clinical and organizational perspectives on care. Structure, process, and outcome measures each provide a different piece of the quality picture. However, given the present limitations of each of these types of measures and the generally weak links among them, the committee recommends that the National Health Care Quality Data Set include a balance of outcome-validated process measures and disease-specific or procedure-specific outcome measures. It should include structural measures only rarely.
Providers tend to focus on measures of processes of care because processes are actionable. They are the ones most closely linked to health care delivery and reflect the actual practice of health care as it takes place. This is most evident in measures of compliance with practice guidelines; for example, effective care for diabetics includes measuring blood sugar levels at specific intervals, which is a process measure (Greenfield, in press). However, many different process measures are necessary to comprehensively assess quality and guide specific improvements (Palmer, 1997).
One of the main problems with process measures is that they are not always directly linked to a significant health-related outcome that is of interest to the policy community (Welch et al., 2000). This is due to several factors. Process measures are usually studied independently of the total context of care. Even when processes have a direct and important impact on outcomes (for example, immunizations, beta-blockers for a heart attack), their effect on routine practice may not be as strong as that recorded in clinical trials. For example, studies have shown that mortality following a heart attack is similar for patients under the care of generalists and patients under the care of cardiologists, despite the fact that the latter perform more of the processes that have been shown to reduce mortality in randomized clinical trials (Ayanian et al., 1997). In the field, factors such as competing diseases and patient characteristics can dilute the effect of specific processes on health (Greenfield et al., 1993).
Many process measures refer to diagnostic tests, such as performing a Pap smear. Although important, correct diagnoses do not necessarily lead to proper treatment. Over time, some process measures can stifle innovation, either because they focus attention on processes that quickly become obsolete or because they become the maximum required rather than the minimum acceptable standard. For example, a guideline for diabetes care that calls for an annual eye exam by an ophthalmologist could retard the use of digital computerized interpretation of fundal pictures because ophthalmologists are written into the process. These caveats should be considered when analyzing potential quality measures based on processes of care.
Outcome measures are used to examine the levels of health and disability in the population that are associated with the quality of health care delivery. Consumers often relate more to outcome than to process measures, although they are interested in both. Outcome measures are very important, but data are limited and they are often expensive to produce. Outcome measures should be used as quality measures when they refer to specific conditions or procedures so that the link between processes and outcomes can be established more clearly. In addition, they should be used when good statistical adjustment models exist, when stratification or other methods can be used to address population differences, when the time intervals between treatment and outcome are not too long, and when the events are not too rare. Unlike process and structural measures, outcome measures can foster innovation.
Structural measures reflect the organizational, technological, and human resources infrastructure of the system necessary for high-quality care (Donabedian, 1966). For example, the use of computerized order entry systems by hospitals is a structural aspect that may foster safety in prescription practices and reduce errors that could result in injury or death (Bates et al., 1998). However, the committee recommends measuring outcomes (in this case, adverse drug events and deaths due to prescribing errors) or processes (medication error rates) rather than structure because of the weak links between structure and processes and between structure and outcomes of care (Evans et al., 1998). Using structural measures of technology could stifle innovation by “locking in” any structures sanctioned for use in the National Health Care Quality Report.
Recent research has attempted to relate the volume of certain procedures— another structural measure—to outcomes. In a review of the literature on the subject, the authors concluded that "there can be little doubt that for a wide variety of medical conditions and surgical procedures, patients treated at higher volume hospitals or by higher volume physicians experience on average lower mortality rates than those treated by low-volume hospitals and physicians" (Halm et al., 2000:31). However, others have pointed out that, at least at this time, volume cannot be used as an independent measure of quality of care because it appears to be a proxy for more direct indicators of quality, such as the use of appropriate procedures; these more direct indicators require further study to define and should be used instead (Institute of Medicine, 2000).
Given the limitations of structural measures, the committee recommends that when available, outcome-validated process measures and disease- or procedure-specific outcome measures be used rather than structural measures. Ultimately, the number and type of measures will depend partly on the aspect of quality of care being examined and the information available. Regardless, process measures (and any structural ones) will have to be revised as medical technology advances and practice guidelines are updated.
BOX 3.3 Toward an Ideal Measure Set
This chapter has set forth the committee's ten criteria for the selection of measures for the National Health Care Quality Data Set. Some of the criteria apply to the area of measurement and others to the measures themselves. The chapter also describes selection criteria for the measure set and the steps in the process of measure selection. These are criteria for the ideal measures and measure set. The committee is aware that many existing measures will not fulfill them, but this is what AHRQ should strive to achieve. As the examples proposed by the committee in Chapter 2 indicate, new measures of both processes and outcomes of care will have to be developed, and others will have to be improved, in order to assess quality in all of the dimensions proposed in the framework. For example, new measures will be necessary to assess the quality of care for children with special needs.
The chapter closes with a discussion of other measurement-related aspects that should be considered by AHRQ for the Quality Report. These include the role of an advisory body, revisions to the measure set, measuring quality comprehensively, the role of summary measures, and the use of measures of the structure, processes, and outcomes of care. These are all complex issues, and most do not have clear-cut answers.
The advisory body will have a very important role in measure selection and revision and in any eventual reassessment of the framework. The composition and adequate functioning of the advisory body will be determining factors in the success of the Quality Report. In the short term, a comprehensive approach to measurement as proposed by the committee will not be feasible, but study and further development of existing efforts in the area of effectiveness can make it a reality. AHRQ will have to weigh the advantages and disadvantages of using summary measures in the Quality Report. When these are used, they should refer to the same subject, be based on the same metric, and increase understanding by the public. The committee recommends against the use of a single quality index or summary measure. At the risk of oversimplification, Box 3.3 lists factors that should be considered in moving toward an ideal measure set. This chapter opens with a set of recommendations that should facilitate reaching that ideal.
REFERENCES
1998. Quality First: Better Health Care for All Americans. Washington, D.C.: U.S. Government Printing Office.
1997. Treatment and outcomes of acute myocardial infarction among patients of cardiologists and generalist physicians. Archives of Internal Medicine 157(22): 2570–2576.
1998. Effect of a computerized physician order entry and a team intervention on prevention of serious medication errors. Journal of the American Medical Association 280(15): 1311–1316.
1996. Why we need observational studies to evaluate the effectiveness of health care. British Medical Journal 312: 1215–1218.
1998. Few opportunities found for tuberculosis prevention among the urban poor. International Journal of Tuberculosis and Lung Disease 2(2): 124–129.
1994. Clinical Practice Guideline Development: Methodology Perspectives. The RAND/UCLA Appropriateness Method. Rockville, Md.: U.S. Department of Health and Human Services, Agency for Health Care Policy and Research.
1996. Quality of health care. Part 2: Measuring quality of care. New England Journal of Medicine 335(13): 966–969.
2000. Consumer Price Indexes [on-line]. Available at: http://stats.bls.gov/cpihome.htm [Dec. 5, 2000].
1991. Reliability and Validity Assessment. Newbury Park, Calif.: Sage Publications.
2000. Cochrane Reviewers' Handbook 4.1 [updated June 2000]. In: Review Manager (RevMan) [computer program]. Ver. 4.1. Oxford, England: The Cochrane Collaboration. Available at: http://www.cochrane.dk/cochrane/handbook/handbook.htm.
1999. Quality and Performance in the NHS: High Level Performance Indicators. London: NHS Executive. Available at: www.doh.gov.uk/nhshlpi.htm.
2000. NHS Performance Indicators. Leeds, England: NHS Executive. Available at: http://www.doh.gov.uk/nhsperformanceindicators.
1966. Evaluating the quality of medical care. Milbank Memorial Fund Quarterly 44: 166–203.
1982. The Criteria and Standards of Quality. Ann Arbor, Mich.: Health Administration Press.
1998. A computer-assisted management program for antibiotics and other antiinfective agents. New England Journal of Medicine 338(4): 232–238.
1999. Improving the Nation's Health Care Quality: The President's Quality Commission Report, the Quality Forum and the Quality Interagency Task Force. U.S. Department of Health and Human Services. Unpublished.
1999a. Key Questions and Decision Making Criteria, Child and Adolescent Health Measurement Initiative. Living with Illness Task Force Meeting. Portland, Ore.
1999b. Sharing the Quality Message with Consumers. Portland, Ore.: FACCT.
1999c. FACCT/ONE [on-line]. Available at: http://www.facct.org/measures/Develop/FACCTONE.htm [Mar. 13, 2001].
1993. The importance of co-existent disease in the occurrence of postoperative complications and one-year recovery in patients undergoing total hip replacement: Comorbidity and outcomes after hip replacement. Medical Care 31(2): 141–154.
In press. Evaluating practice differences in the delivery of diabetes care: Effects of specialty, patient characteristics and individual practice variation. Annals of Internal Medicine.
2000. Commissioned paper for the Institute of Medicine Committee on the Quality of Health Care in America and the National Cancer Policy Board. Unpublished.
1999. Vol. 113, Sec. 1653. Statutes at Large.
1998. Use of outcome data by purchasers and consumers: New strategies and new dilemmas. International Journal for Quality in Health Care 10(6): 503–508.
1999. Leading Health Indicators for Healthy People 2010. Eds. Carole A. Chrvala and Roger J. Bulger. Washington, D.C.: National Academy Press.
2000. Interpreting the Volume–Outcome Relationship in the Context of Health Care Quality: Workshop Summary. Ed. Maria Hewitt. Washington, D.C.: National Academy Press.
1999. Facts about ORYX: The Next Evolution in Accreditation [on-line]. Available at: http://www.jcaho.org/perfmeas/nextevol.html [Dec. 7, 2000].
1995. Agendas, Alternatives, and Public Policies. New York: Harper Collins.
2000. The National Quality Forum enters the game. International Journal for Quality in Health Care 12(2): 85–87.
1995. The social construction of validity. Qualitative Inquiry 1(1): 19–40.
1999. Assessing "best evidence": Issues in grading the quality of studies for systematic reviews. Joint Commission Journal on Quality Improvement 25(9): 470–479.
2000. QA Tools and the National Quality Report. Presentation to the Institute of Medicine Committee on the National Quality Report on Health Care Delivery, Oakland, Calif., August 18.
1998. Developing a clinical performance measure. American Journal of Preventive Medicine 14(3S): 14–21.
1999. The National Quality Forum: A 'me-too' or a breakthrough in quality measurement and reporting. Health Affairs 18(6): 233–237.
2000. Measuring the performance of health systems. British Medical Journal 321: 191–192.
2000. The Nation's Report Card: National Assessment of Educational Progress [on-line]. Available at: http://nces.ed.gov/nationsreportcard/site/home.asp [Dec. 8, 2000].
2000. Health, United States, 2000 with Adolescent Health Chartbook. Hyattsville, Md.: U.S. Government Printing Office.
2000. HEDIS 2001, Vol. 1. Washington, D.C.: NCQA.
2000a. Quality Compass 2000. Washington, D.C.: NCQA.
2000. National Quality Forum Mission [on-line]. Available at: http://www.qualityforum.org/mission/home.htm [Jul. 10, 2000].
1999. Health Performance Measurement in the Public Sector: Principles and Policies for Implementing an Information Network. Eds. Edward B. Perrin et al. Washington, D.C.: National Academy Press.
1978. Psychometric Theory. 2nd ed. New York: McGraw-Hill.
1997. Quality of care. Journal of the American Medical Association 277(23): 1896–1897.
2001. Research Services [on-line]. Available at: http://www.picker.org/Research/Default.htm [Mar. 13, 2001].
2000. Should health plan quality measures be adjusted for case mix? Medical Care 38(10): 977–980.
2001. The quality of health care in the United States: A review of articles since 1987. In: Crossing the Quality Chasm: A New Health System for the 21st Century, Appendix A. Washington, D.C.: National Academy Press.
1995. Performance Indicators: A Commentary from the Perspective of an Expanded View of Health. Washington, D.C.: Center for the Advancement of Health. Available at: http://www.cfah.org/website2/16.htm.
1998. Issues Related to Patient Medical Record Information: Testimony to National Committee on Vital and Health Statistics, Subcommittee on Standards and Security [on-line]. Available at: http://www.mc.vanderbilt.edu/infocntr/stead/med-rec.html [Nov. 9, 2000].
2000. Use of infertility services in the United States: 1995. Family Planning Perspectives 32(3): 132–137.
2000a. NCVHS Charter 1999 [on-line]. Available at: http://ncvhs.hhs.gov/99charter.htm [Dec. 6, 2000].
2000b. Healthy People 2010. Washington, D.C.: U.S. Government Printing Office.
2000. Strategies for improving comorbidity measures based on Medicare and Medicaid claims data. Journal of Clinical Epidemiology 53(6): 571–578.
2000. Are increasing 5-year survival rates evidence of success against cancer? Journal of the American Medical Association 283(22): 2975–2978.
1973. Small area variations in health care delivery. Science 182(117): 1102–1108., , and .
2000. Impact of sociodemographic case mix on the HEDIS measures of health plan quality. Medical Care 38(10): 981–992., , , , , , , , , and .