Medicare: A Strategy for Quality Assurance - Volume I

10
Critical Attributes of Quality-of-Care Criteria and Standards

The rapid growth in health care expenditures has prompted third-party payers, both governmental and private, to institute programs that try to control costs by restraining the use of health care services. These programs range from direct efforts to identify and discourage specific unnecessary services (e.g., prior review of proposed care) to financial incentives for providers and consumers to reduce services (e.g., capitated payments to health care providers and cost-sharing by patients). These steps, if successful, can not only control costs but also improve the quality of care by reducing exposure to iatrogenic illness and injury. However, these programs could also over-reach to discourage the provision or use of needed services.

Good criteria for assessing quality of care and for distinguishing appropriate from inappropriate care can operate at the intersection between cost and quality concerns in two major ways. On the one hand, they can strengthen the clinical basis for prior review activities aimed at detecting and avoiding unnecessary care. On the other hand, they can help to identify or to prevent the underuse of care that might be an undesirable side effect of review programs, financial incentives, and other methods of controlling costs. It was against this background that Congress mandated the Medicare quality assurance study and specified as one task the “development of prototype criteria and standards” for defining and measuring quality of care.

Developing quality-of-care criteria is not a simple task, and the results are not uniformly helpful. Criteria sets vary considerably in their method of development and their substance, depending on the objectives, focus, skills, and experience of their creators. Even when criteria sets have a basic approach and specific application in common (as described in the next sections of this chapter), their formulations may differ substantially in scope,
explicitness, flexibility, and scientific support. Not surprisingly, criteria sets vary in their utility and acceptability.

A prerequisite for developing useful and acceptable quality-of-care criteria is a consensus on the characteristics of sound criteria sets and acceptable methods for constructing them. The immediate goal in this chapter is to propose a basis for such a consensus. The actual development and implementation of sound guidelines will require a commitment of considerable time, resources, and expertise over a period of years.

The Institute of Medicine (IOM) study committee believed that the best way for it to move toward a framework for developing sound criteria was to convene a panel of respected experts in guideline formulation from various organizations active in this field. The main purpose of the panel was to reach agreement on the desirable attributes of quality-of-care criteria. These attributes would be standards against which old or newly developed criteria could be compared and evaluated. The panel’s focus was thus on the formulation of “criteria for judging criteria” rather than on the endorsement of specific sets of criteria. Appendix A describes the composition and activities of the panel.

The remainder of this chapter discusses the conceptual issues presented to the panel. These included three types of quality-of-care criteria sets; the range of attributes and characteristics that might be considered desirable or necessary for such criteria sets to have; the uses to which such criteria sets can be put in a quality assurance context (such as education or quality review); and key attributes for such criteria sets. Producing criteria sets that meet the standards proposed by the panel calls for a complex and sophisticated development strategy, or perhaps several strategies depending on the type of criteria set in question. Later sections of this chapter briefly discuss methods for developing criteria with particular emphasis on stages in the development process, priority-setting, and affordability.

TYPES OF CRITERIA SETS

Different kinds of criteria sets have evolved to meet different needs. The expert panel identified three broad types: appropriateness guidelines, patient care evaluation and management criteria, and case-finding screens. Each of these is discussed below.

Appropriateness Guidelines

Appropriateness guidelines describe accepted indications for using particular medical interventions and technologies, ranging from surgical procedures to diagnostic studies. Some guidelines specify under what circumstances a particular service is appropriate (indicated). For instance, one indication for colonoscopy might be lower gastrointestinal tract bleeding. Guidelines may also describe when an intervention is not indicated. One example might be performing a carotid endarterectomy on an asymptomatic patient when the carotid angiography shows stenosis of less than 50 percent (Merrick et al., 1986). Finally, guidelines may identify equivocal indications or areas of uncertainty where consideration might be given to complex or hard-to-enumerate patient factors or where different clinicians simply disagree. For example, the indications for an exercise test to detect coronary artery disease may be “equivocal” for asymptomatic male patients over age 40 in special occupations involving public transportation or safety, including pilots, railroad engineers, and police officers (American College of Cardiology, 1986a).

Appropriateness is an integral part of quality health care (Brook, 1988b; Greenfield, 1988). In this context appropriateness generally means that the service in question has demonstrated clinical benefit for a particular indication and that the likelihood of benefits outweighs the likelihood of harm. Good quality care does not include surgery or other services that are technically flawless but not indicated or necessary. Some experts have begun to explore whether economic costs ought to be factored into definitions of appropriateness, but there is no clear agreement on this point (Paterson, 1988; see also the discussion in Chapter 1 of this report).

Many organizations formulate appropriateness guidelines. The National Institutes of Health (NIH) consensus conference represents one forum for guideline development (Kanouse et al., 1987; Kosecoff et al., 1987; PPRC, 1988a). Particularly active in recent years have been technology assessment committees of several medical specialty societies.
The best known effort may be that of the Clinical Efficacy Assessment Project (CEAP) of the American College of Physicians (ACP) (Sox, 1987; Steinberg, 1988). In addition, endoscopy guidelines have been developed by the American Society for Gastrointestinal Endoscopy (ASGE, 1986) and guidelines for various cardiovascular procedures by the American College of Cardiology (ACC) and American Heart Association (AHA) Task Force (ACC, 1986a, 1986b).1

Third-party payers such as the Blue Cross and Blue Shield Association (BCBSA) and Medicare also develop standards identifying appropriate indications for various medical procedures and technologies. Their primary purpose is to serve as bases for making coverage or utilization review decisions that are intended to control costs by reducing payments for inappropriate or unnecessary services. Examples of such criteria include the preadmission review criteria developed or adopted by the Medicare Peer Review Organizations (PROs) for specified procedures and the guidelines for appropriate use of selected medical technologies developed by the BCBSA Medical Necessity Program (IOM, 1988; Schaffarzick, 1988). The BCBSA has also supported the CEAP work and has worked with different specialty and research groups to develop guidelines that are disseminated to hospitals and physicians for educational and quality assurance purposes (not payment decisions).

Independent research organizations also have been involved in developing guidelines for the appropriate use of various medical technologies. One example is represented by The RAND Corporation’s appropriateness criteria for coronary angiography, coronary artery bypass surgery, carotid endarterectomy, cholecystectomy, and diagnostic upper gastrointestinal endoscopy and colonoscopy2 (Chassin et al., 1986a, 1986b; Kahn et al., 1986a, 1986b; Merrick et al., 1986; Park et al., 1986; Solomon et al., 1986; Chassin et al., 1987; Chassin, 1988; Winslow et al., 1988a, 1988b).

Patient Care Evaluation and Patient Management Criteria Sets

A second type of criteria set has evolved to help assess or guide the management of particular outpatient or inpatient medical problems rather than use of a specific service or technology.3 These criteria sets often involve medical conditions that are characterized by ill-defined symptom complexes or that require multiple discrete clinical decisions over time. For example, they may define the range of appropriate services and care for problems such as hypertension, right lower quadrant pain, or post-operative fever, or they may specify various screening and preventive services.

A major challenge for evaluation and management criteria is variability in patients’ clinical status, sociodemographic characteristics, and treatment preferences. For example, the appropriate diagnostic work-up of right lower quadrant pain may differ for a young male with fever, a young woman with an intrauterine device, or a middle-aged woman with a history of irritable bowel syndrome.
Similarly, the appropriate management strategy for an elderly person with Type II (adult onset) diabetes who has difficulty checking fingersticks (home tests for blood sugar levels) may differ from that for a more adept, medically sophisticated younger diabetic.

Traditionally, quality assurance criteria developed for evaluating patient care have dealt with this complexity and variability by identifying the minimum process elements for managing a particular condition.4 Beyond this minimum, the criteria allow for substantial clinical judgment about patient management activities. This strategy reflects, in part, a dearth of clinical research that would permit greater specificity and, in part, a lack of resources that would allow the developers of the criteria to be more precise.

Recently, computerized software systems have helped extend the use of patient management criteria by making it faster and simpler to match specific diagnostic or treatment steps to a variety of medical conditions. For example, the management of hypertension for a particular patient might be evaluated through on-line scoring of the patient’s medical record for compliance with criteria calling for documentation of a funduscopic examination of the eye, urinalysis, potassium measurement, dietary instruction, and medication for patients with diastolic blood pressure consistently over 100 mm Hg.

A related approach is represented by detailed algorithms, decision trees, or criteria maps that more comprehensively specify the steps for managing a problem (Greenfield et al., 1975, 1977, 1981; Stulbarg et al., 1985). Patient variability is addressed by constructing a “network,” diagram, or flow chart that helps the practitioner choose which of several alternate pathways provides the best fit between treatment options and patient characteristics. These algorithms represent optimal rather than minimal standards. Compared to the latter, they tend to be more difficult to develop and validate, and consensus may be harder to achieve. Complex algorithms may be difficult for practicing physicians to understand or accept. Even when physicians do understand the algorithms, they may not find them practical in normal clinical or (especially) crisis situations or for routine quality-of-care evaluations.

Case-Finding Screens

Case-finding screens identify potential quality-of-care problems that warrant further evaluation. These screens are objective, easily used, and often related to outcomes such as surgical complications. They trigger more in-depth analysis and peer review to confirm the presence of the problem and to detect remediable defects in processes of care at a particular institution or by a particular provider.
Their relative ease of application makes them appealing for monitoring the effects of changes in provider organizational features, process of care, or payment methods.

One variety of case-finding screen is represented by hospital generic screens, sometimes called “occurrence screens.” These screens have traditionally focused on single, adverse, “sentinel” events, such as an unplanned return to an operating room (OTA, 1988). The PROs have used generic screens for several years (see Chapter 6). Their set includes occurrences or specific “flags” such as nosocomial infections, unexpected death, or a return to the intensive care unit (ICU) within 24 hours of discharge from the ICU.

Hospital-wide process or outcome criteria are intended to be broadly applicable across clinical departments and specialties rather than specific to, say, a clinical department, the emergency room, or pediatric care. They have been adopted by the American Hospital Association as part of its “Integrated Quality Assurance” (IQA) program (Longo et al., 1989), which in turn is based on a complex IQA model developed by the Hospital Association of New York State.

Another variety of case-finding screen is represented by specialty-specific clinical indicators such as those being developed by the Joint Commission on Accreditation of Healthcare Organizations (Joint Commission) (Lehmann, 1989; Marder, 1989; Winchester, 1989). Like generic screens, these indicators can consist of sentinel events that trigger more in-depth review. Unlike generic screens, however, they are specific to a particular specialty, type of procedure, or clinical system for delivering care. One such sentinel event in obstetrics, for instance, is the delivery by planned cesarean section of an infant weighing less than 2500 grams or one with hyaline membrane disease. The indicator may be either an adverse outcome that is linked to a process under the practitioner’s or institution’s control or a process that has been clearly associated with an adverse outcome (Lehmann, 1989).

More complicated, less easily applied versions of screens also exist. With “threshold” criteria, the trigger is not a specific event, but a rate of events above or below a defined level; for example, more than a 10-percent rate of appendectomies where the appendix is normal. Other screens involve failure to follow up abnormal results of laboratory tests or diagnostic studies (for example, positive blood cultures, suspicious shadows on radiographic films, or abnormal Papanicolaou [Pap] smears).

Hospital admissions for conditions that could indicate poor ambulatory care are a newer focus. The 13 sentinel conditions discussed in Chapter 6 (for example, diabetic complications and malignant neoplasm of the genitourinary organ) in the Third Scope of Work for PROs constitute another example.
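The “threshold” criterion described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name exceeds_threshold, the hospital labels, and the counts are hypothetical, and only the 10-percent level for normal-appendix appendectomies comes from the text. The sketch handles the "above a defined level" case; a "below" threshold would reverse the comparison.

```python
def exceeds_threshold(event_count: int, case_count: int, threshold: float = 0.10) -> bool:
    """Return True when the observed event rate is above the threshold.

    The trigger is a rate, not a single sentinel event.
    """
    if case_count <= 0:
        raise ValueError("case_count must be positive")
    return event_count / case_count > threshold


# Hypothetical data: (appendectomies with a normal appendix, all appendectomies).
appendectomies = {
    "Hospital A": (4, 50),   # 8 percent
    "Hospital B": (9, 60),   # 15 percent
}

# Institutions whose rate exceeds the threshold are flagged for further review.
flagged = [
    name
    for name, (normal, total) in appendectomies.items()
    if exceeds_threshold(normal, total)
]
print(flagged)
```

A flag of this kind would only select an institution for closer review; it does not by itself establish that care was deficient.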
Since the release of hospital mortality rates by the Health Care Financing Administration (HCFA) beginning in 1986, researchers and others have focused on using aggregate mortality rates or aggregate rates of other adverse occurrences to screen institutions or patient populations (with adjustments for severity of case mix) and flag possible institutional quality-of-care problems. This approach has generated a considerable literature in a comparatively short time (Dubois et al., 1987a, 1987b; Dubois, 1989; Daley et al., 1988; Jencks et al., 1988; Kahn et al., 1988; OTA, 1988; Chassin et al., 1989; Ente and Lloyd, 1989; Fink et al., 1989; Hannan et al., 1989) and is reviewed more thoroughly in Chapter 6, Volume II.

Relationships Among Criteria Sets

The above classifications do not imply that these groupings are mutually exclusive or in conflict. Criteria sets can be difficult to categorize, and it is probably not productive to draw distinctions too finely. The labels are less important than the purposes for which criteria sets are used, individually or together.

Case-finding screens, for instance, can be used in conjunction with either appropriateness guidelines or patient evaluation and management criteria. Screens are an initial, easily applied mechanism to locate cases for more detailed review. Items included in such screens could be selected from the more easily identified or discrete elements of a set of appropriateness or patient management guidelines. Cases failing the screen would then receive in-depth review against the more detailed guidelines, thus linking these different types of criteria sets in a review continuum.

For example, one element of an evaluation-management criteria set for hypertension, such as documentation in the medical record of a funduscopic examination of the eye, might serve as a case-finding screen that nonphysician reviewers could apply. Regardless of the element or elements used as screens, the in-depth review might draw on the complete set of evaluation criteria. Traditionally, in-depth assessment has consisted of subjective “implicit” review by peer physician reviewers, but evaluation guidelines might well serve as an objective aid to their efforts.

Traditional screens, whether based on sentinel adverse occurrences or elements of the process of care, have focused more on misuse of medical technology in the sense of poor technical quality than on problems of overuse or underuse of technology. Likewise, outcome data used to screen for statistical outliers are directed primarily at poor technical quality of services rather than at overuse or underuse of care. Screens adapted from appropriateness guidelines might complement existing screens by focusing on services performed for a clearly inappropriate indication.
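The review continuum described in this section, in which a simple screen selects cases and the full criteria set supports in-depth review, can be sketched in Python. This is a hedged illustration, not a system described in the chapter: the element names, case identifiers, and functions are hypothetical, and the medication element is simplified (in the text it applies only to patients with diastolic blood pressure consistently over 100 mm Hg).

```python
from typing import Dict, List, Set

# Process elements taken from the chapter's hypertension example.
# Simplification (assumption): medication is treated as applying to every case.
HYPERTENSION_CRITERIA = (
    "funduscopic_exam",
    "urinalysis",
    "potassium_measurement",
    "dietary_instruction",
    "medication",
)

# One discrete, easily checked element serves as the case-finding screen.
SCREEN_ELEMENT = "funduscopic_exam"


def passes_screen(documented: Set[str]) -> bool:
    """Stage 1: a check a nonphysician reviewer could apply."""
    return SCREEN_ELEMENT in documented


def missing_elements(documented: Set[str]) -> List[str]:
    """Stage 2 aid: list criteria not documented in the record."""
    return [item for item in HYPERTENSION_CRITERIA if item not in documented]


def review(records: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Send only cases failing the screen on to in-depth review."""
    return {
        case_id: missing_elements(documented)
        for case_id, documented in records.items()
        if not passes_screen(documented)
    }


# Hypothetical medical-record abstracts.
records = {
    "case-1": {"funduscopic_exam", "urinalysis"},
    "case-2": {"urinalysis", "potassium_measurement"},
}
referred = review(records)
print(referred)
```

In this sketch, the list of missing elements is meant as an aid to the physician reviewer, not a substitute for peer judgment; case-1 passes the screen and is not examined further even though other elements are undocumented, which illustrates the sensitivity trade-off of a single-element screen.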
EFFECT OF THE TYPE AND USE OF CRITERIA ON SPECIFICATION OF DESIRABLE ATTRIBUTES

All these criteria sets can be used in different contexts for different purposes. Three major purposes are to educate practitioners, to educate and empower consumers, and to establish minimum standards of care for use in quality-of-care review. Such reviews may be prospective, concurrent, or retrospective.

Third-party payers and others hope that certain types of criteria, especially appropriateness guidelines, can be used to reduce the costs of medical care. This study, however, differentiates quality of care from cost containment. To the extent that using such criteria reduces overuse of care, costs may be lower. The application of criteria also may identify underuse of services, and this could increase expenditures, at least in the short term. This chapter and this report focus on the quality-of-care applications of criteria sets, not their uses for cost control.

The desirable characteristics of a criteria set may vary somewhat according to its use. For example, guidelines used to educate health professionals almost surely need to be different from those used to review care. Complex, comprehensive algorithmic criteria useful for educational purposes might be difficult to apply in the emergency care of acutely ill patients (when speed and parsimony are important) or in retrospective review (where brevity is desirable).5 Criteria for retrospective review may need to be different in some respects from those used for concurrent or prospective review. The same may be true for criteria for internal versus external review.

Greenfield (1989) discusses the differences between prospective and retrospective algorithms. Prospective algorithms are directive because care has not yet been rendered. They must be logically complete, include rare diagnoses and unlikely events, have a narrower range of options, and be independent of medical records. Retrospective algorithms, by contrast, review care already delivered. They tend to be used as screens for further review and thus do not need to be logically complete. They have a more extensive range of options to allow for variation in clinical practice, and they depend on information documented in the medical record.

The features of a criteria set may also differ by level of review, such as whether they are to be used for the initial screening by nonphysician reviewers or for in-depth physician review. If a criteria set is intended to support making judgments about individuals, individual cases, or individual episodes of care, then several attributes such as sensitivity, specificity, reliability, and validity are much more important than if the criteria are simply going to physicians or to patients for educational purposes.

The desirable attributes of a criteria set will also vary according to the type of criteria set. For example, whether they are manageable for nonphysician reviewers or whether they are easy to adapt for use by computer may be especially important for case-finding screens. By contrast, whether criteria sets have built-in flexibility or are demonstrably acceptable to professionals may be more important for technology-specific or patient management guidelines.

GENERAL ATTRIBUTES OF CRITERIA SETS

As a starting point for discussing desirable attributes of criteria sets, the IOM staff prepared an extensive list of possible attributes based on review of the limited literature on guideline development and technology assessment (Eddy, 1987, 1988, forthcoming; Brook, 1988a, 1988b; Greenfield, 1988; Lewin and Erickson, 1988; PPRC, 1988a, 1988b, 1989; Brook et al., 1989). The final list of general attributes as modified by the expert panel appears in Table 10.1. This section defines the basic concepts behind the short labels for attributes that are used to simplify discussion.

Attributes can usefully be divided into two basic categories: substantive (or structural) attributes and implementation (or process) attributes. Substantive attributes relate to inherent characteristics of a criteria set. Implementation attributes focus on the processes of developing and applying a criteria set.

Substantive Attributes

In the category of substantive attributes are concepts such as sensitivity, specificity, and predictive value. Sensitivity refers technically to the likelihood that a case will be identified as deficient given that it really is deficient, where deficient care is measured by some outside “gold” standard that reviews all care provided. Specificity refers to the likelihood that truly good care will be identified as good. The term predictive value is defined as the proportion of cases identified by screens or other criteria as presenting quality problems that subsequently prove to be true quality problems. It takes into account the prevalence of the quality problem being investigated as well as the screen’s sensitivity.6 The traditional computational definitions are shown in Figure 10.1.

These terms have generally been used in the context of case-finding screens to measure how frequently the screen detects cases of deficient care for further review (sensitivity) while passing over cases of adequate care without triggering review (specificity). A screen or criterion has poor specificity if it flags a lot of cases for review when the care was satisfactory. This wastes time and money and leads to considerable frustration on the part of reviewers. Conversely, a screen or criterion has low sensitivity if it misses a lot of cases where care was poor. This means it is ineffective for its intended purpose. (Both these criticisms have been leveled at the case-finding generic screens used by the Medicare PROs; see Chapter 6.)
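The dependence of predictive value on prevalence can be made concrete with a small Python sketch. It applies the standard Bayes relationship between sensitivity, specificity, and prevalence; the function name and the numeric inputs (80 percent sensitivity, 90 percent specificity, and three prevalence levels) are illustrative assumptions, not figures from this chapter.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of flagged cases that are true quality problems."""
    true_positive_share = sensitivity * prevalence
    false_positive_share = (1.0 - specificity) * (1.0 - prevalence)
    flagged_share = true_positive_share + false_positive_share
    if flagged_share == 0:
        raise ValueError("the screen flags no cases under these inputs")
    return true_positive_share / flagged_share


# Hypothetical screen: the same sensitivity and specificity at three prevalence levels.
for prevalence in (0.01, 0.05, 0.20):
    ppv = positive_predictive_value(0.80, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  predictive value={ppv:.2f}")
```

Under these assumed inputs, the predictive value rises steeply as the quality problem becomes more common, which is one reason a screen that performs acceptably for a frequent problem may flag mostly satisfactory care when the problem is rare.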
Sensitivity and specificity are also important attributes for technology-specific or evaluation-management guidelines. Indeed, with some modifications, these concepts can be applied to all three types of criteria sets. The sensitivity of technology-specific and patient management guidelines refers to their ability to detect and deal with all potential cases of inappropriate or deficient care; a highly sensitive set of guidelines should identify most such cases. For instance, in retrospective review of care of patients with chest pain, sensitivity refers to the likelihood that the quality measure correctly identifies deficient care if the physician does not follow an indicated step in the guidelines, such as admitting a patient in cardiac shock.

Reliability requires that a criteria set be appropriate and generate consistent results for all user groups for which it is intended, and that it do so time and time again. Reliable criteria relating to, for instance, a cardiovascular procedure or problem must produce the same decisions or evaluation

TABLE 10.1 General Attributes of Criteria Sets: Final List

Attribute: Definition or Explanation

Substantive and Structural Attributes

Sensitivity: High “true positive rate” in detecting deficient or inappropriate care
Specificity: High “true negative rate” in passing over cases of adequate care
Reliability: Known to produce same decisions or evaluations when applied by the user groups for which the criteria set is intended
Validity: Based on outcome studies or other scientific evidence of effectiveness
Documentation:
    A. Documents methods of development and cites literature (including estimates of outcomes)
    B. Documents how reliability was established
Patient Responsiveness: Allows for eliciting or taking account of patient preferences
Flexibility: Respects the role of clinical judgment, with “clinical judgment” explicable
Clinical Adaptability: Allows for or takes into consideration clinically relevant differences among different classes of patients; population to which criteria apply is specified
Inclusiveness: Covers all major foreseeable clinical situations and full range of clinical problems
Concordance: Reflects consensus of professionals with extensive experience in field, with input from academic and nonacademic practitioners, generalists and specialists
Acceptability: Acceptable to majority of professionals
Clarity: Written in unambiguous language; terms, populations, data elements, and collection approach clearly defined
Appropriateness: Specifies appropriate, inappropriate, and equivocal indications (procedure and technology appropriateness guidelines)

Implementation and Process Attributes

Pretesting: Guidelines are tested before implementation
Dynamism: Mechanism and commitment exist for reviewing and updating criteria sets to incorporate new information and cover new situations
Evaluation: Mechanism exists to review and evaluate outcome or impact of guidelines
Comprehendability:
    A. Format understood by nonphysician reviewers
    B. Format understood by practitioners
    C. Format easily understood by patients/consumers
Manageability:
    A. Not unduly burdensome for nonphysician reviewers to apply
    B. Not unduly burdensome for physician reviewers to apply
    C. Not unduly burdensome for professional to follow
Nonintrusiveness: Minimizes inappropriate direct interaction with treating physicians
Appealability: Allows for appeals process by professionals and patients
Feasibility: Ease of obtaining information
Computerization: Has been or could easily be computerized
Executability:
    A. Includes instructions for implementation
    B. Includes instructions for scoring and quantification

                                     Standard
Screen or Criterion      Poor Care              Good Care
Poor Care                a  True Positive       b  False Positive
Good Care                c  False Negative      d  True Negative

NOTE:
Sensitivity = True Positive/(True Positive + False Negative), or a/(a + c)
Specificity = True Negative/(False Positive + True Negative), or d/(b + d)
Predictive value (positive) = True Positive/(True Positive + False Positive), or a/(a + b)
Predictive value (negative) = True Negative/(False Negative + True Negative), or d/(c + d)

FIGURE 10.1 Computational Definitions of Sensitivity, Specificity, and Predictive Value
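The computational definitions in Figure 10.1 translate directly into code. The Python sketch below uses the figure's cell labels a, b, c, and d; the class name ScreenResults and the example counts are hypothetical and serve only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenResults:
    """Counts from comparing a screen with a "gold" standard (Figure 10.1 cells)."""
    a: int  # true positives: screen says poor care, standard says poor care
    b: int  # false positives: screen says poor care, standard says good care
    c: int  # false negatives: screen says good care, standard says poor care
    d: int  # true negatives: screen says good care, standard says good care

    @property
    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    @property
    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    @property
    def positive_predictive_value(self) -> float:
        return self.a / (self.a + self.b)

    @property
    def negative_predictive_value(self) -> float:
        return self.d / (self.c + self.d)


# Hypothetical review of 1,000 cases, 50 of which truly involved poor care.
results = ScreenResults(a=40, b=60, c=10, d=890)
print(f"sensitivity = {results.sensitivity:.2f}")
print(f"specificity = {results.specificity:.2f}")
print(f"predictive value (positive) = {results.positive_predictive_value:.2f}")
print(f"predictive value (negative) = {results.negative_predictive_value:.2f}")
```

With these assumed counts, the screen detects 80 percent of the deficient cases, yet only 40 percent of the cases it flags turn out to be true quality problems, because satisfactory cases greatly outnumber deficient ones.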

OCR for page 303
Eddy, D.M. Methods for Designing Guidelines. Paper prepared for the Physician Payment Review Commission. Durham, N.C.: Duke University, 1988.
Eddy, D.M. A Manual for Assessing Health Practices and Designing Practice Policies. (In collaboration with the Council of Medical Specialty Societies Task Force on Practice Policies.) Forthcoming.
Ente, B.H. and Lloyd, J.S. Taking Stock of Mortality Data. An Agenda for the Future. Proceedings of a 1988 Conference. Chicago, Ill.: Joint Commission on Accreditation of Healthcare Organizations, 1989.
Fink, A., Yano, E.M., and Brook, R.H. The Condition of the Literature on Differences in Hospital Mortality. Medical Care 27:315–336, 1989.
Greenfield, S. The Challenges and Opportunities that Quality Assurance Raises for Technology Assessment. Pp. 134–141 in Quality of Care and Technology Assessment. Lohr, K.N. and Rettig, R.A., eds. Washington, D.C.: National Academy Press, 1988.
Greenfield, S. Measuring the Quality of Office Practice. Pp. 183–200 in Providing Quality Care: The Challenge to Clinicians. Goldfield, N. and Nash, D.B., eds. Philadelphia, Pa.: American College of Physicians, 1989.
Greenfield, S., Cretin, S., Worthman, L.G., et al. Comparison of a Criteria Map to a Criteria List in Quality-of-Care Assessment for Patients with Chest Pain: The Relation of Each to Outcome. Medical Care 19:255–272, 1981.
Greenfield, S., Lewis, C.E., Kaplan, S., et al. Peer Review by Criteria Mapping: Criteria for Diabetes Mellitus. Annals of Internal Medicine 83:761–770, 1975.
Greenfield, S., Nadler, M.A., Morgan, M.T., et al. The Clinical Investigation and Management of Chest Pain in an Emergency Department: Quality Assessment by Criteria Mapping. Medical Care 12:807–904, 1977.
Hannan, E.L., Bernard, H.R., O’Donnel, J.F., et al. A Methodology for Targeting Hospital Cases for Quality of Care Record Reviews. American Journal of Public Health 79:430–436, 1989.
HCHP (Harvard Community Health Plan). Criteria for Critiquing a Clinical Algorithm. Unpublished mimeo. Boston, Mass.: HCHP, 1986.
IOM (Institute of Medicine). Medical Technology Assessment Directory. Washington, D.C.: National Academy Press, 1988.
Jencks, S.F., Daley, J., Draper, D., et al. Interpreting Hospital Mortality Data. The Role of Clinical Risk Adjustment. Journal of the American Medical Association 260:3611–3616, 1988.
Kahn, K.L., Brook, R.H., and Draper, D. Interpreting Hospital Mortality Data. How Can We Proceed? Journal of the American Medical Association 260:3625–3628, 1988.
Kahn, K.L., Roth, C.P., Kosecoff, J., et al. Indications for Selecting Medical and Surgical Procedures—A Literature Review and Ratings of Appropriateness: Diagnostic Upper Gastrointestinal Endoscopy. R-3204/4-CWF/HF/HCFA/PMT/RWJ. Santa Monica, Calif.: The RAND Corporation, 1986a.
Kahn, K.L., Roth, C.P., Fink, A., et al. Indications for Selecting Medical and Surgical Procedures—A Literature Review and Ratings of Appropriateness: Colonoscopy. R-3204/5-CWF/HF/HCFA/PMT/RWJ. Santa Monica, Calif.: The RAND Corporation, 1986b.

Kanouse, D.E., Brook, R.H., Winkler, J.D., et al. Changing Medical Practice Through Technology Assessment: An Evaluation of the NIH Consensus Development Program. R-3452-NIH. Santa Monica, Calif.: The RAND Corporation, 1987.
Kosecoff, J., Kanouse, D.E., Rogers, W.H., et al. Effects of the National Institutes of Health Consensus Development Program on Physician Practice. Journal of the American Medical Association 258:2708–2713, 1987.
Lehmann, R. Joint Commission Forum: Forum on Clinical Indicator Development: A Discussion of the Use and Development of Indicators. Quality Review Bulletin 15:223–227, 1989.
Lewin, L.S. and Erickson, J.E. Leadership in the Development of Practice Guidelines: The Role of the Federal Government and Others. Paper prepared for the Physician Payment Review Commission. Washington, D.C.: LEWIN/ICF, October 1988.
Lohr, K.N. Quality of Care for Respiratory Illness in Disadvantaged Populations. P-6570. Santa Monica, Calif.: The RAND Corporation, 1980a.
Lohr, K.N. Quality of Care in the New Mexico Medicaid Program (1971–1975). Medical Care 18:1–129 (January Supplement), 1980b.
Longo, D.R., Ciccone, K.R., and Lord, J.T. Integrated Quality Assessment. A Model for Concurrent Review. Chicago, Ill.: American Hospital Association, 1989.
Marder, R.J. Joint Commission Plans for Clinical Indicator Development for Oncology. Cancer 64:310–313, 1989 (Supplement).
Merrick, N.J., Fink, A., Brook, R.H., et al. Indications for Selecting Medical and Surgical Procedures—A Literature Review and Ratings of Appropriateness: Carotid Endarterectomy. R-3204/6-CWF/HF/HCFA/PMT/RWJ. Santa Monica, Calif.: The RAND Corporation, 1986.
OTA (Office of Technology Assessment). The Quality of Medical Care: Information for Consumers. Chapter 5: Adverse Events. Washington, D.C.: U.S. Government Printing Office, 1988.
Palmer, R.H., Louis, T.A., Thompson, M.A., et al. Final Report of the Ambulatory Care Medical Audit Demonstration Project (ACMAD). Boston, Mass.: Harvard Community Health Plan and Harvard University, March 1984.
Palmer, R.H. The Challenges and Prospects for Quality Assessment and Assurance in Ambulatory Care. Inquiry 25:119–131, 1988.
Park, R.E., Fink, A., Brook, R.H., et al. Physician Ratings of Appropriate Indications for Six Medical and Surgical Procedures. R-3280-CWF/HF/PMT/RWJ. Santa Monica, Calif.: The RAND Corporation, 1986.
Paterson, M.L. The Challenge to Technology Assessment: An Industry Viewpoint. Pp. 106–125 in Quality of Care and Technology Assessment. Lohr, K.N. and Rettig, R.A., eds. Washington, D.C.: National Academy Press, 1988.
PPRC (Physician Payment Review Commission). Improving the Quality of Care: Clinical Research and Practice Guidelines. Draft Background Paper for Conference. Washington, D.C.: Physician Payment Review Commission, September 28, 1988a.
PPRC. Chapter 13. Increasing Appropriate Use of Services: Practice Guidelines and Feedback of Practice Patterns. Annual Report to Congress. Washington, D.C.: Physician Payment Review Commission, March 1988b.

PPRC. Chapter 12. Effectiveness Research and Practice Guidelines. Annual Report to Congress. Washington, D.C.: Physician Payment Review Commission, April 1989.
RTI (Research Triangle Institute). Nationwide Evaluation of Medicaid Competition Demonstrations. Final Report. Research Triangle Park, N.C.: RTI, 1988.
Schaffarzick, R.W. Technology Assessment: Perspective of a Third-Party Payer. Pp. 98–105 in Quality of Care and Technology Assessment. Lohr, K.N. and Rettig, R.A., eds. Washington, D.C.: National Academy Press, 1988.
Solomon, D.H., Brook, R.H., Fink, A., et al. Indications for Selecting Medical and Surgical Procedures—A Literature Review and Ratings of Appropriateness: Cholecystectomy. R-3204/3-CWF/HF/HCFA/PMT/RWJ. Santa Monica, Calif.: The RAND Corporation, 1986.
Sox, H.C., Jr., ed. Common Diagnostic Tests. Use and Interpretation. Philadelphia, Pa.: American College of Physicians, 1987.
Steinberg, E.P. Technology Assessment: A Physician Perspective. Pp. 79–88 in Quality of Care and Technology Assessment. Lohr, K.N. and Rettig, R.A., eds. Washington, D.C.: National Academy Press, 1988.
Stulbarg, M.S., Gerbert, B., Kemeny, M.E., et al. Outpatient Treatment of Chronic Obstructive Pulmonary Disease—A Practitioner’s Guide. Western Journal of Medicine 142:842–846, 1985.
Winchester, D.P. Assuring Quality Cancer Care in an Evolving Health Care Delivery System. CA—A Cancer Journal for Clinicians 39:201–205, 1989.
Winslow, C.M., Kosecoff, J.B., Chassin, M., et al. The Appropriateness of Performing Coronary Artery Bypass Surgery. Journal of the American Medical Association 260:505–509, 1988a.
Winslow, C.M., Solomon, D.H., Chassin, M.R., et al. The Appropriateness of Carotid Endarterectomy. New England Journal of Medicine 318:721–727, 1988b.
APPENDIX A
CRITERIA-SETTING EXPERT PANEL ACTIVITY

The Institute of Medicine (IOM) study committee and staff determined that conducting an expert panel activity could be the best way to discharge the study’s congressional request to develop prototype criteria and standards. The panel members are listed in Table A.1. The remainder of this Appendix describes this activity, which included a literature review, a homework exercise for the panelists, a two-day meeting in June 1989, and staff analysis of all products of these steps; it also provides more details about the results of the homework task and meeting discussions.

TABLE A.1 Criteria-Setting Expert Panel for Study to Design a Strategy for Quality Review and Assurance in Medicare

William A. Causey, M.D., F.A.C.P.
  Jackson Medical Association
  Jackson, Mississippi
  (Representing American College of Physicians)

Mark R. Chassin, M.D.
  Value Health Sciences, Inc.
  Santa Monica, California

Arthur J. Donovan, M.D., F.A.C.S.
  University of Southern California
  Los Angeles, California
  (Representing American College of Surgeons)

Leonard S. Dreifus, M.D., F.A.C.C.
  Lankenau Hospital
  Philadelphia, Pennsylvania
  (Representing American College of Cardiology)

David M. Eddy, M.D., Ph.D.[a]
  Duke University
  Durham, North Carolina and Jackson, Wyoming

Lesley Fishelman, M.D.
  Harvard Community Health Plan
  Boston, Massachusetts

Sheldon Greenfield, M.D.
  New England Medical Center and Tufts University School of Medicine
  Boston, Massachusetts

Robert J. Marder, M.D.
  Joint Commission on Accreditation of Healthcare Organizations
  Chicago, Illinois

Jane L. Neumann, M.D.
  Wisconsin Peer Review Organization and Waukesha Hospital
  Waukesha, Wisconsin

Bruce Perry, M.D., M.P.H.
  Group Health Cooperative of Puget Sound
  Seattle, Washington

Ralph W. Schaffarzick, M.D.
  Center for Quality Health Care of the Blue Cross and Blue Shield Association
  Auburn, California

[a] Was unable to attend meeting.

Homework Exercise

As a starting point for discussion of desirable attributes of different types of criteria sets, the IOM staff reviewed the existing literature on guideline development (see Chapter 10 reference list); on the basis of that review, the staff prepared an extensive list of possible general attributes. Three categories of criteria were identified for special attention: (1) procedure- and technology-specific appropriateness guidelines, (2) criteria for evaluation of patient care and patient management, and (3) case-finding screens. The list of attributes was incorporated into a homework exercise questionnaire, Possible General Attributes of Criteria Sets, which was mailed to panel members for response before the panel meeting. Panel members were requested to rate the listed attributes on a scale of 1 to 5, with 1 signifying “not important” and 5 signifying “very important.” Space was provided at the end of the questionnaires for respondents to suggest additional attributes or modifications of listed attributes and to make any other comments. To help determine in what ways the attributes and their ratings might differ for the three types of criteria sets, the staff provided three copies of the questionnaire, one for each type.

The results of the first round of the homework exercise (done at home) are given in Table A.2. As reflected in the large number of 4 or 5 ratings and the absence of very low ratings, the panel considered all of the listed attributes important in varying degrees. To obtain more spread in subsequent ratings, we revised the 1 to 5 scale for the second round of balloting (at the meeting) to read “least important” (1) and “most important” (5). Several attributes were rated as less important for all types of criteria sets; these included simplicity from the patient standpoint and generalizability-compatibility with existing quality assurance approaches (with suggestions that the latter attribute be deleted from the list). Several attributes were rated as more important for some types of criteria sets than for others. For example, ease of computerization, feasibility (ease of obtaining data), and reviewer manageability were rated as more important for case-finding screens than for appropriateness guidelines and evaluation-management criteria.
The homework exercise suggested that various modifications in the list of proposed attributes and their definitions warranted further consideration at the expert panel meeting. It also identified several important underlying issues, such as the impact of differences in use on the definition and ratings of attributes of criteria sets. These issues were introduced at the meeting (and were discussed in Chapter 10).

Meeting

The expert panel meeting opened with a general discussion of attributes of criteria sets (Table A.3). The panel first discussed some fundamental issues raised by the homework exercise regarding standards for judging criteria sets, in particular the impact of use or purpose when considering desirable attributes of criteria sets. It then returned to the original list and proposed many modifications and clarifications for the items on it. These are embodied in the final list of general attributes of criteria sets in Table 10.1 of Chapter 10.

In subsequent sessions, the panel considered the proposed general attributes in the context of each of the three major types of criteria sets. Specific examples of each type of criteria set were used as a mechanism for examining the proposed attributes more closely; the advantages and limitations identified for these illustrative criteria sets helped the process of defining important attributes. In each session, the attributes for the particular type of criteria set under consideration were reformulated and re-rated (Table A.4). The highest-rated attributes (separately, for substantive and for implementation attributes) for each type of criteria set were extracted from these data and summarized in Table 10.4A of Chapter 10; the selection criterion was a mean score on the second round of voting of 4.5 or greater. Table 10.4B shows those attributes that were given an average rating of at least 4.0 but less than 4.5.

In the final session, methods and strategies for guideline formulation were discussed.
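The two selection cutoffs just described (a mean of 4.5 or greater for Table 10.4A, and at least 4.0 but less than 4.5 for Table 10.4B) amount to a simple threshold rule. The sketch below is illustrative only. The function name assign_table is hypothetical, the means are a few second-round values for appropriateness guidelines copied from Table A.4, and the panel's separate handling of substantive and implementation attributes is not modeled. Because the published means are rounded to one decimal, borderline values may not match the classification the staff made from unrounded scores.

```python
def assign_table(mean_score):
    """Map a second-round mean rating to the summary table it would fall in."""
    if mean_score >= 4.5:
        return "Table 10.4A"
    if mean_score >= 4.0:
        return "Table 10.4B"
    return "neither table"


# A few second-round means for appropriateness guidelines (from Table A.4).
appropriateness_means = {
    "Sensitivity": 4.5,
    "Specificity": 4.0,
    "Predictive Value": 3.7,
    "Validity": 4.6,
    "Appealability": 4.9,
}

assignments = {name: assign_table(score)
               for name, score in appropriateness_means.items()}
for name, table in assignments.items():
    print(f"{name}: {table}")
```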

TABLE A.2 Expert Panel Homework Exercise: First Round Ratings of Attributes

                                   Appropriateness        Evaluation/Management  Case-Finding
Attributes                         n[a]  Mean[b]  SD[c]      n     Mean     SD      n     Mean     SD
Sensitivity                           9      4.9    .33      8      4.5    .53     10      4.7    .67
Specificity                           9      4.7    .71      8      4.6    .52     10      3.9   1.10
Reliability                          10      4.4    .84     10      4.5    .71     10      4.6    .52
Validity                             10      4.7    .48     10      4.8    .42     10      4.4    .70
Dynamism                             10      4.5    .53     10      4.3    .48     10      4.0    .67
Flexibility                          10      4.4    .70     10      4.3    .82     10      3.8   1.14
Clinical Adaptability                10      4.4    .70     10      4.5    .71     10      4.0   1.05
Responsiveness                       10      3.4   1.17     10      3.6   1.17      9      2.9   1.27
Inclusiveness                         9      3.3   1.10      9      2.9   1.17     10      2.7    .95
Concordance                          10      4.0    .94     10      4.1   1.10     10      3.9    .88
Acceptability                        10      4.0   1.05     10      4.1    .99     10      4.0   1.05
Clarity                              10      4.6    .52     10      4.6    .52     10      4.5    .71
Simplicity (non-MDs)                 10      4.4    .70     10      4.1    .99     10      4.3    .82
Simplicity (MDs)                     10      4.5    .71     10      4.6    .52     10      3.9    .99
Simplicity (patients, consumers)     10      3.5   1.27     10      2.9    .88     10      2.7   1.25
Manageability (reviewers)            10      4.4    .70      9      3.7   1.22     10      4.2   1.03
Manageability (professionals)        10      4.6    .52      9      4.6    .53     10      4.3    .82
Feasibility                           9      4.2    .67      8      4.1    .64     10      4.3    .67
Computerization                      10      4.0   1.05      9      3.7   1.00     10      4.4    .84
Priority (hi-risk, hi-cost)          10      4.8    .42      9      4.6    .73     10      4.4    .70
Priority (consensus exists)          10      3.8    .63     10      4.1    .88     10      4.1    .74
Generalizability                     10      3.6   1.07     10      3.1   1.20     10      3.4   1.17
Affordability                        10      4.2    .63     10      4.1    .99     10      4.2    .63
Appealability                        10      4.5    .53     10      4.0   1.33     10      4.0   1.63
Documentation (outcome estimates)    10      4.0    .94      9      3.8    .83      9      3.8    .83
Documentation (methods)              10      4.1    .88     10      4.2    .92      9      3.9    .93
Executability                        10      4.2    .63      9      4.4    .88     10      4.4    .70

[a] n is the number of respondents. There were 10 respondents to the homework exercises. Where n<10, the attribute was rated not applicable by one or more respondents. One non-response (to inclusiveness on the evaluation/management criteria homework) was also coded as not applicable.
[b] Mean is the mean rating or score among those rating the attributes.
[c] SD is the standard deviation of the mean.
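The n, mean, and SD columns can be derived from a set of individual ratings along the following lines. This is a sketch under stated assumptions, not the staff's documented procedure. The individual ratings were not published, so the list below is hypothetical; None stands for a "not applicable" response; and SD is assumed to be the sample standard deviation of the ratings (n - 1 in the denominator). The function name summarize_ratings is also hypothetical.

```python
from statistics import mean, stdev


def summarize_ratings(ratings):
    """Return (n, mean, SD) over ratings, skipping 'not applicable' (None)."""
    rated = [r for r in ratings if r is not None]
    return len(rated), round(mean(rated), 1), round(stdev(rated), 2)


# Hypothetical ballots from 10 panelists: eight 5s, one 4, one not applicable.
hypothetical_ratings = [5, 5, 5, 5, 5, 5, 5, 5, 4, None]
n, mean_rating, sd = summarize_ratings(hypothetical_ratings)
print(n, mean_rating, sd)
```

Under these assumptions the hypothetical ballots give n = 9, a mean of 4.9, and an SD of 0.33, the same figures reported for sensitivity under appropriateness guidelines. Other sets of ballots could produce the same rounded figures, so this does not establish what the actual ratings were.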

TABLE A.3 Activities and Discussion Topics for Criteria-Setting Expert Panel

General Discussion of Proposed Attributes for Criteria Sets
  Presentation: Results of homework exercise
  Discussion:
    Extent to which attributes of criteria sets might differ according to their use or purpose
    Definitions of listed attributes and of any additional attributes

Application of Attributes to Three Types of Criteria Sets
  Discussion:
    Proposed attributes for each type of criteria set
  Examine illustrative criteria sets:
    Technology- or procedure-specific appropriateness guidelines
      (a) American Society of Gastroenterology’s upper endoscopy guidelines
      (b) Pre-procedure criteria for carotid endarterectomy from Delmarva PRO,[a] New York PRO, and St. Luke’s Hospital, Houston
    Criteria for evaluation and management of problems and conditions
      (a) UCLA/McCoy versus Medicaid hypertension management review criteria
      (b) Stulbarg chronic obstructive pulmonary disease management algorithm
    Case-finding screens
      (a) Ear, nose and throat screening criteria from Medical Management Analysis
      (b) Hospital Association of New York State hospital-wide indicators
  Reformulate attributes for each type of criteria set
  General attributes: revisited

Discussion of Methods for Criteria Development
  Literature review, expert panel, and other approaches
  Stages of guideline development process and the appropriate forum for each stage
  Differences in methodology for formulation according to type and purpose of criteria set

[a] Utilization and Quality Control Peer Review Organization (PRO).

TABLE A.4 Expert Panel Homework Exercise: Second Round Ratings of Attributes

                                   Appropriateness        Evaluation/Management  Case-Finding
Attributes                         n[a]  Mean[b]  SD[c]      n     Mean     SD      n     Mean     SD
Sensitivity                          10      4.5    .53      9      4.4    .73     10      4.9    .32
Specificity                          10      4.0    .67      9      3.7    .83     10      3.0    .94
Predictive Value                     10      3.7    .82      9      3.8   1.17     10      3.9    .88
Reliability                          10      4.2    .79     10      4.5    .71      9      4.1    .78
Validity                             10      4.6    .84     10      4.5    .71      9      4.0    .87
Documentation-A[d]                   10      4.6    .70     10      4.3    .67      8      3.9    .83
Documentation-B[e]                   10      4.2    .92     10      3.9    .99      9      3.9    .78
Flexibility                           9      4.3   1.12     10      4.8    .42      7      4.1    .90
Clinical Adaptability                10      4.4    .84     10      4.8    .42      7      4.3    .95
Responsiveness                       10      2.7    .95     10      2.9    .88      6      2.7   1.03
Inclusiveness                        10      2.8   1.55     10      3.4   1.17      9      4.2    .97
Acceptability                        10      2.8   1.40     10      3.6    .97      8      2.9    .99
Clarity                              10      4.5    .71     10      4.8    .42      9      4.8    .44
Appropriateness                      10      4.2    .63      9      4.0   1.12      4      2.6   1.71
Pretesting                           10      4.2    .92     10      4.0   1.15      9      4.2   1.09
Dynamism                             10      4.4    .70     10      4.6    .70     10      4.4    .70
Evaluation                           10      4.1    .74      9      4.3     .5     10      4.5    .53
Comprehendability (non-MD)           10      4.3   1.25     10      4.2    .63     10      4.6    .52
Comprehendability (MD)               10      4.3    .82     10      4.3    .48     10      4.2    .63
Comprehendability-Patient            10      3.2   1.48     10      2.9   1.20      9      3.0   1.41
Manageability (non-MD)               10      3.7   1.06     10      3.6    .84     10      4.0    .67
Manageability (MD)                   10      3.8   1.03     10      3.3    .95     10      4.2    .79
Manageability (Professional)         10      3.7    .82     10      3.6    .84     10      3.8   1.23
Nonintrusive                         10      3.3   1.16     10      3.7    .82     10      3.8   1.03
Appealability                        10      4.9    .32     10      4.4    .84      9      4.8    .44
Feasibility                          10      3.4   1.07     10      4.0   1.05     10      4.1    .99
Computerization                      10      3.3   1.16     10      3.8    .92     10      3.5    .85
Executability                        10      3.8    .79     10      4.5    .53     10      4.0    .94
Concordance                          10      4.3   1.06     10      4.5    .71      8      4.1    .83
Prioritization (high)                10      3.4   1.17     10      4.0    .67      9      4.1   1.05
Prioritization (consensus)           10      3.4   1.26     10      3.6   1.26      8      3.8   1.04
Affordability                        10      2.8   1.14     10      2.8    .79      9      3.0   1.00

[a] n is the number of respondents. There were 10 respondents to the second round of rating attributes. Where n<10, the attribute was rated not applicable by one or more respondents.
[b] Mean is the mean rating or score among those rating the attributes.
[c] SD is the standard deviation of the mean.
[d] Documents methods of development and cites literature.
[e] Documents how reliability was established.