2
Improving Interpretive Performance in Mammography

Breast cancer is a significant cause of morbidity and mortality in the United States. Until it can be prevented, the best approach to the control of breast cancer includes mammography screening for early detection. Mammography, however, is not a perfect test, due to the complex architecture of the breast tissue being imaged, the variability of the cancers that may be present, and the technical limitations of the equipment and processing. The technical aspects of mammography are now less variable since the interim Mammography Quality Standards Act (MQSA) regulations went into effect in 1994. At this point, the focus is shifting to the quality of mammography interpretation. The available evidence indicates that interpretive performance is quite variable, but the ambiguities of human decision making, the complexities of clinical practice settings, and the rare occurrence of cancer make measurement, evaluation, and improvement of mammography interpretation a much more difficult task.

The components of current MQSA regulations pertinent to interpretive performance include: (1) the medical audit; (2) requirements related to training, including initial training and Continuing Medical Education (CME); and (3) interpretive volume, including initial and continuing experience (a minimum of 960 mammograms per 2 years for continuing experience). The purpose of this chapter is to explore current evidence on factors that affect the interpretive quality of mammography and to recommend ways to improve and ensure the quality of mammography interpretation. The primary questions that the Committee identified as currently relevant to interpretive performance include whether the current audit procedures are likely to ensure or improve the quality of interpretive performance, and whether any audit procedures applied to the current delivery of U.S. health care will allow for accurate and meaningful estimates of performance. In addition, the Committee questioned whether the current CME and volume requirements enhance performance. These issues, and the current state of research on them, are described fully in the sections that follow. The current state of knowledge about existing measures and standards is described first in order to define the terms needed to assess the medical audit requirement of MQSA.

CURRENT STATE OF KNOWLEDGE REGARDING APPROPRIATE STANDARDS OR MEASURES

Effectively measuring and analyzing interpretive performance in practice presents many challenges. For example, data must be gathered regarding whether a woman has breast cancer diagnosed within a specified timeframe after a mammogram and whether the finding(s) corresponds with the location in which the cancer is found. Other challenges include reaching agreement regarding the definition of positive and negative interpretation(s), standardizing the patient populations so that comparisons are meaningful, and deciding which measures are the most important reflection of an interpreting physician's skill.



In this section, current well-established performance measures are reviewed and their strengths and weaknesses are discussed. These measures should be calculated separately for screening examinations (done for asymptomatic women) and diagnostic examinations (done for women with breast symptoms or prior abnormal screening mammograms) because of the inherent differences in these two populations and the pretest probability of disease (Dee and Sickles, 2001; American College of Radiology, 2003). However, for simplicity, in the discussion below "examinations" or "mammograms" are used without designating whether they are screening or diagnostic because the mechanics of the measures are similar in either case.

Before describing the measures, it is important to clearly define a positive and a negative test. The Breast Imaging Reporting and Data System (BI-RADS) was developed by the American College of Radiology (ACR), in collaboration with several federal government agencies and other professional societies, in order to create a standardized and objective method of categorizing mammography results. The BI-RADS 4th Edition identifies the most commonly used and accepted definitions, which are based on a standard set of assessments first promulgated by the ACR in 1992 and modified slightly in 2003. Table 2–1 outlines the terms used to define test positivity/negativity as found in the 1st and 4th editions of BI-RADS.

TABLE 2–1 Terms Used to Define Test Positivity/Negativity in BI-RADS 1st and 4th Editions

ACR Category | 1st Edition Assessment | 4th Edition Assessment
0 | Need additional imaging | Need additional imaging evaluation and/or prior mammograms for comparison
1 | Negative | Negative
2 | Benign finding | Benign finding(s)
3 | Probably benign | Probably benign finding—short-interval follow-up suggested
4 | Suspicious abnormality | Suspicious abnormality—biopsy should be considered (4a, 4b, 4c may be included to reflect increasing suspicion)
5 | Highly suggestive of malignancy | Highly suggestive of malignancy—appropriate action should be taken
6 | NA | Known, biopsy-proven malignancy—appropriate action should be taken

SOURCE: American College of Radiology (2003).

The assessments are intended to be linked to specific recommendations for care: continued routine screening (Categories 1 and 2), immediate additional imaging such as additional mammographic views and ultrasound or comparison with previous films (Category 0), short-interval (typically 6 months) follow-up (Category 3), consideration of biopsy (Category 4), and a recommendation for biopsy/surgical consultation (Category 5).

Based on these assessments and recommendations, definitions of a positive mammography interpretation have also been suggested by the ACR BI-RADS Committee, as follows:

Screening Mammography: Positive test = Category 0, 4, or 5; Negative test = Category 1 or 2
Diagnostic Mammography: Positive test = Category 4, 5, or 6; Negative test = Category 1, 2, or 3

MQSA regulations, in contrast, define a positive mammogram as one that has an overall assessment of findings that is either "suspicious" or "highly suggestive of malignancy." BI-RADS also now allows a single overall final assessment for the combined mammography and ultrasound imaging. Facilities that perform ultrasound online, at the time of diagnostic evaluation for an abnormal mammogram or palpable mass, will not have outcome statistics comparable to facilities where mammograms are reported without including the ultrasound evaluation. For example, a patient with a palpable finding may go to a facility and be found to have a negative mammogram and positive ultrasound, and the assessment will be reported as positive.

While there has been much improvement in mammography reporting since the adoption of BI-RADS, there is still inter- and intraobserver variability in how this reporting system is used (Kerlikowske et al., 1998). Some variability in calculated performance measures can, therefore, be attributed to variance among interpreting physicians on what constitutes an abnormal mammogram. Moreover, though the intent is clear, the linkage between assessment and recommendations is not always maintained in clinical practice. Indeed, Food and Drug Administration (FDA) rules require use of the overall assessments listed in Table 2–1, but the recommendations associated with each category are not mandated or inspected by FDA. Thus, considerable variability in recommendations exists. For example, 38 percent of women with "probably benign" assessments had recommendations for immediate additional imaging in one national evaluation (Taplin et al., 2002). Some analyses include Category 3 assessments associated with recommendations for performance of additional imaging as positive tests (Barlow et al., 2002). In addition, some women with mammograms interpreted as Category 1 or 2 have received recommendations for biopsy/surgical consult due to a physical finding not seen on the mammogram because mammography cannot rule out cancer (Poplack et al., 2000). Therefore, these standard definitions serve as a starting point, but in practice, adaptations may be needed to accommodate the reality of clinical care.
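
To make the screening and diagnostic definitions above concrete, the following sketch maps a final BI-RADS assessment category to a positive or negative test status. It is illustrative only and is not part of the report or of any regulatory requirement; function and variable names are invented, and Category 3 at screening is left outside the positive set even though, as noted above, some analyses count it as positive when additional imaging is recommended.

```python
# Illustrative sketch (not from the report): classify a final BI-RADS
# assessment as a positive or negative test using the ACR BI-RADS Committee
# definitions quoted above. Names are hypothetical.

def is_positive(category: int, exam_type: str) -> bool:
    """Return True if the assessment counts as a positive test."""
    if exam_type == "screening":
        # Screening: positive = Category 0, 4, or 5; negative = Category 1 or 2.
        # Category 3 is not assigned by these definitions; some analyses treat
        # it as positive when additional imaging is recommended.
        positive = {0, 4, 5}
    elif exam_type == "diagnostic":
        # Diagnostic: positive = Category 4, 5, or 6; negative = Category 1, 2, or 3.
        positive = {4, 5, 6}
    else:
        raise ValueError("exam_type must be 'screening' or 'diagnostic'")
    return category in positive

# Example: Category 0 counts as positive at screening but is not part of the
# diagnostic definition above.
assert is_positive(0, "screening") is True
assert is_positive(3, "diagnostic") is False
```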

It is also important to define what constitutes "cancer." In the context of mammography practice, the gold standard source for breast cancer diagnosis is tissue from the breast, obtained through needle sampling or open biopsy. This tissue sample then leads to the identification of invasive carcinoma or noninvasive ductal carcinoma in situ (DCIS). Breast cancers are labeled invasive because the cells are invading surrounding normal tissue. Invasive cancers account for most (80 percent) of breast cancers found at the time of screening in the United States. DCIS is included as a cancer diagnosis primarily because standard treatment for DCIS currently entails complete excision, similar to invasive cancers. Approximately 20 percent of breast cancer diagnoses are DCIS (Ernster et al., 2002). Lobular carcinoma in situ (LCIS) also is occasionally reported in the tissue, but should not be counted as cancer because it is not currently treated.

Interpretive performance can also vary as a function of the time since the prior mammogram (Yankaskas et al., 2005). Because screening guidelines differ regarding the appropriate screening interval (annual, as recommended by the American Cancer Society [ACS] and the American College of Obstetricians and Gynecologists [ACOG]; every 1 to 2 years, as recommended by the U.S. Preventive Services Task Force [USPSTF]) (U.S. Preventive Services Task Force, 2002; Smith and D'Orsi, 2004; Smith et al., 2005), the period of follow-up after a mammogram must be specified in order to observe women for the occurrence of cancer and to calculate performance indices that can be compared in a meaningful way.

With the above definitions, it is possible to identify several measures of interpretive performance. The measures available to assess an interpreting physician's interpretations all build from a basic 2×2 table of test result and cancer outcome, as shown in Table 2–2. A one-year interval should be used to calculate the performance indices so that they are comparable. Standard definitions of these measures are well summarized in the ACR BI-RADS 4th Edition, and are highlighted here along with some of the strengths and weaknesses of each measure. Separation of data for screening indications from diagnostic indications for mammography is absolutely essential if performance measures are to be meaningful.

TABLE 2–2 Possible Results for a Screening Test

Test Result | Cancer Outcome + | Cancer Outcome −
+ | TP—True positive | FP—False positive
− | FN—False negative | TN—True negative
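
As a concrete illustration of how the measures defined in the remainder of this section are built from the four cells of Table 2–2, a minimal sketch follows. The function name and the counts in the example are invented, not taken from the report; they are sized only to be plausible for a screening population.

```python
# Illustrative sketch: the basic performance ratios built from the 2x2 table
# in Table 2-2 (TP, FP, FN, TN). Names and example counts are hypothetical.

def performance_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard performance ratios from 2x2 counts for one reader."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),               # TP / (TP + FN)
        "specificity": tn / (tn + fp),               # TN / (TN + FP)
        "ppv": tp / (tp + fp),                       # TP / (TP + FP)
        "npv": tn / (tn + fn),                       # TN / (TN + FN)
        "cancer_detection_rate_per_1000": 1000 * tp / total,
        "abnormal_interpretation_rate": (tp + fp) / total,
    }

# Example with invented screening-scale numbers: 10,000 examinations,
# 40 cancers (35 called positive, 5 missed), and 960 false positives.
print(performance_measures(tp=35, fp=960, fn=5, tn=9000))
```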

Sensitivity

Sensitivity refers to the ability of a test to find a cancer when it is present [TP/(TP+FN)]. The challenge with this measure is determining whether a cancer has been diagnosed, particularly if a woman was given a negative mammogram interpretation. Those women are not necessarily seen back in the same facility for their next examination. Therefore it is not possible to know with certainty whether they have cancer or not. This problem is called verification bias. Because only those women sent to biopsy within a facility have their true cancer status known, verification bias may lead to an overestimation of sensitivity (Zheng et al., 2005). Relatively complete ascertainment of cancer cases can be expected only if a mammography facility is able to link its examinations to the breast cancer cases compiled in a regional tumor registry, and this is practical only for a very small minority (likely fewer than 5 percent) of mammography facilities in the United States.

Because the ultimate purpose of screening is to reduce disease-specific mortality by detecting and treating early-stage cancers, the sensitivity of mammography is important. However, sensitivity is affected by many factors, including whether it is a first (prevalent)1 mammogram or subsequent (incident) mammogram, the distribution of patient ages and tumor sizes in the population of women being screened by the interpreting physician, the length of time since prior mammograms, the density of the breast tissue among women with cancer, and the number of women with cancer found by an interpreting physician (Carney et al., 2003; Yankaskas et al., 2005). Most screening populations have between 2 and 10 cancers per 1,000 women screened, and among women undergoing periodic screening on a regular basis, the cancer incidence rate is 2 to 4 per 1,000 (American College of Radiology, 2003). Under current MQSA regulations, a single interpreting physician must interpret 960 mammograms over 2 years to maintain accreditation. If he or she is reading only screening (and not any diagnostic) mammograms, he or she may, on average, see two to four women with cancer per year. Estimating sensitivity among such a small set of cancers affects the reliability of the measures. Random variation will be large for some measures, making comparisons among interpreting physicians very difficult, even if the interpreting physician has complete knowledge regarding the cancer status of all the women examined. Because most interpreting physicians do not have that complete information (no linkage to a regional tumor registry) or the volumes to create stable estimates, measurement of sensitivity will be of very limited use for individual interpreting physicians in practice.

1 The prevalent screen refers to the first time a woman undergoes a screening test. Incident screens refer to subsequent screening tests performed at regular intervals. One useful index of screening mammography performance is that the number of cancers per 1,000 women identified by prevalent screens should be at least two times higher than by incident screens.

Specificity

Specificity is the ability of the test to determine that a disease is absent when a patient is disease-free [TN/(TN+FP)]. Because most screened women (990 to 998 per 1,000) are disease free, this number will be quite high even if a poorly performing interpreting physician gives nearly every woman a negative interpretation. But interpreting physicians must interpret some mammograms as positive in order to find cancers, so false-positive examinations occur. Estimates of the cumulative risk of a false-positive mammogram over a 10-year period of annual mammography vary between 20 and 50 percent (Elmore et al., 1998; Hofvind et al., 2004), and the risk of a negative invasive procedure may be as high as 6 percent (Hofvind et al., 2004). High specificity of a test is therefore important to limit the harms done to healthy women as a result of screening. Although one study of nearly 500 U.S. women without a history of breast cancer found that 63 percent thought 500 or more false-positive mammograms per life saved was reasonable (Schwartz et al., 2000), the cost and anxiety associated with false-positive mammograms can be substantial. Studies have shown that anxiety usually diminishes soon after the episode, but in some women anxiety can endure, and in one study anxiety was greater prior to the next screening mammogram for women who had undergone biopsy on the previous occasion of screening compared with women who had normal test results (Brett and Austoker, 2001). One study has shown that immediate interpretation of mammograms was associated with reduced levels of anxiety (Barton et al., 2004).
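
The cumulative false-positive risk figures cited above come from longitudinal studies. A rough, purely illustrative version of the same idea can be sketched by compounding a per-examination false-positive rate over repeated rounds under the simplifying assumption that rounds are independent, which real screening data only approximate; the rates below are invented and the result is not a substitute for the published estimates.

```python
# Illustrative sketch: cumulative probability of at least one false-positive
# result over repeated screening rounds, assuming independence between rounds.
# The published 20-50 percent figures come from longitudinal data, not from
# this simplification; the per-exam rates below are hypothetical.

def cumulative_false_positive_risk(per_exam_fp_rate: float, rounds: int) -> float:
    """P(at least one false positive) = 1 - (1 - rate)^rounds under independence."""
    return 1.0 - (1.0 - per_exam_fp_rate) ** rounds

# A per-examination false-positive rate of a few percent, compounded over
# 10 annual screens, lands in the same broad range as the published estimates.
for rate in (0.03, 0.05, 0.07):
    print(rate, round(cumulative_false_positive_risk(rate, 10), 2))
```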

Like sensitivity, specificity is a difficult measure to obtain for most interpreting physicians because it requires knowing the cancer status of all women examined (linkage to a regional tumor registry). Because it is difficult to ascertain the status of all women who undergo mammography with respect to the presence or absence of cancer, it is important to be clear about who is being included in the measure and what the follow-up period is. This has led to three levels of false-positive measurement (Bassett et al., 1994):

FP1: No known cancer within one year of a Category 0, 4, or 5 assessment (screening).
FP2: No known cancer within one year of a Category 4 or 5 assessment (usually diagnostic).
FP3: No known cancer within one year of a Category 4 or 5 assessment for which biopsy was actually performed.

If each of these measures is estimated for a year, they can also be called rates. The limitation in choosing only one of the three rates is that there is a trade-off between the accuracy of the measure and the insight it provides regarding an interpreting physician's performance. Although FP3 involves the most accurate measure of cancer status, it reflects only indirectly on the interpreting physician's choice to send women to biopsy. Interpreting physicians' ability to make that choice, and to make the recall versus no-recall decision at screening, are important characteristics. The most accurate estimate of FP (FP3) is therefore not necessarily the measure that provides the best insight into the interpreting physician's performance. Conversely, FP1 includes BI-RADS 0's, a high percentage of which have a low index of suspicion. Furthermore, measuring FP1 involves knowing the cancer status of all women for whom additional imaging was recommended (defined in BI-RADS as Category 0—incomplete, needs additional imaging). This is challenging because results of the subsequent evaluation may not be available. Currently, MQSA does not require that Category 0 examinations be tracked to determine the final overall assessment. The Committee recommends that for women who need additional imaging, mammography facilities must attempt to track these cases until they resolve to a final assessment. Although studies indicate that some interpreting physicians inappropriately assign women who need additional imaging a Category 3 BI-RADS assessment (Poplack et al., 2000; Taplin et al., 2002), this practice should be discouraged, and all women needing additional imaging should be tracked.
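
As a sketch of how the three false-positive levels might be tallied from per-examination audit records, consider the following; the record fields (assessment category, biopsy flag, one-year cancer flag) are hypothetical stand-ins for whatever a facility's audit system actually captures, and nothing here is prescribed by MQSA or the report.

```python
# Illustrative sketch: counting the three false-positive levels (FP1-FP3)
# from per-examination records with a one-year cancer follow-up flag.
# Field names are hypothetical.

from dataclasses import dataclass

@dataclass
class Exam:
    category: int            # final BI-RADS assessment
    biopsy_done: bool        # was a biopsy actually performed?
    cancer_within_1yr: bool  # cancer diagnosed within one year of the exam?

def false_positive_counts(exams: list[Exam]) -> dict:
    fp1 = sum(1 for e in exams
              if e.category in (0, 4, 5) and not e.cancer_within_1yr)
    fp2 = sum(1 for e in exams
              if e.category in (4, 5) and not e.cancer_within_1yr)
    fp3 = sum(1 for e in exams
              if e.category in (4, 5) and e.biopsy_done and not e.cancer_within_1yr)
    return {"FP1": fp1, "FP2": fp2, "FP3": fp3}
```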

Positive Predictive Value (PPV)

There are three positive predictive values (PPV) that can be measured in practice, derived from the three false-positive measures described above. Again, these different measures are used to accommodate the challenges of data collection in practice. For example, though an interpreting physician may recommend a biopsy, it may not be done, and therefore the true cancer status may not be known. Thus, one must clearly state which PPV or PPVs are being monitored (Bassett et al., 1994), as recommended by the ACR.

PPV1: The proportion of all women with positive examinations (Category 0, 4, or 5) who are diagnosed with breast cancer [TP/(TP+FP1)].
PPV2: The proportion of all women recommended for biopsy after mammography (Category 4 or 5) who are diagnosed with breast cancer [TP/(TP+FP2)].
PPV3: The proportion of all women biopsied because of the interpreting physician's recommendation who are diagnosed with cancer at the time of biopsy [TP/(TP+FP3)].

MQSA requires that interpreting physicians have an established mechanism to ascertain the status of women referred for biopsy. With these data interpreting physicians can measure their PPV2, but it is still subject to verification bias because not all women recommended for biopsy will have it done and because ascertainment of procedures is never 100 percent. The limitation of PPV2 or PPV3 is that many more women are referred for additional imaging (8 percent) than biopsy (1.5 percent) (Taplin et al., 2002). An important skill in interpretation involves sorting who needs additional imaging versus biopsy; PPV2 and PPV3 do not account for this because they focus only on women referred for biopsy. The ACR recommends that interpreting physicians who choose to perform one of the two types of audits described in the BI-RADS atlas should track all women referred for additional imaging for their subsequent cancer status (PPV1) (American College of Radiology, 2003). Because measuring PPV1 may not be possible in the absence of an integrated health system and registry, the Committee recommends use of PPV2.

Another limitation of PPV that influences its usefulness is that it is affected by the rate of cancer within the population examined. The PPV will be higher in populations with higher cancer rates. For example, an interpreting physician practicing among older populations of women versus younger will have a higher PPV, just because the risk of breast cancer is higher among older women. PPV1 will vary depending on the proportion of patients who are having an incident versus prevalent screen. Unfortunately, a high PPV does not necessarily correlate with better performance. For example, the interpreting physician who recommends biopsy for only larger, more classic lesions will have a higher PPV, but will miss the smaller, more subtle, and less characteristic lesions that may be more important to patient outcomes (Sickles, 1992). Therefore the Committee recommends measuring the cancer detection rate in addition to PPV2 in order to facilitate interpretation of the measure. A higher PPV2 should occur in a population with a higher cancer detection rate (see section below on Cancer Detection Rate).

Negative Predictive Value (NPV)

Negative predictive value (NPV) is the proportion of all women with a negative result who are actually free of the disease [TN/(FN+TN)]. Monitoring NPV is not a requirement of MQSA, and in practice, the NPV is rarely used because it involves tracking women with negative examinations (linkage to a regional tumor registry is required).
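
The three PPV levels defined above can be written as simple ratios once the corresponding true-positive and false-positive counts are in hand (for example, from an audit tally like the one sketched earlier). The function below is an illustrative sketch only; the count names are hypothetical.

```python
# Illustrative sketch: the three PPV levels, PPVk = TPk / (TPk + FPk), where
# level k determines which examinations count as "positive". Counts would come
# from a facility's audit data; names here are hypothetical.

def ppv_levels(tp1: int, fp1: int, tp2: int, fp2: int, tp3: int, fp3: int) -> dict:
    return {
        "PPV1": tp1 / (tp1 + fp1),  # positive = Category 0, 4, or 5
        "PPV2": tp2 / (tp2 + fp2),  # positive = Category 4 or 5 (biopsy recommended)
        "PPV3": tp3 / (tp3 + fp3),  # Category 4 or 5 with biopsy actually performed
    }
```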

Cancer Detection Rate

Cancer detection rate is the number of women found to have breast cancer per 1,000 women examined. This rate is meaningless unless screening mammograms are assessed separately from diagnostic evaluations. This measure is similar to sensitivity, but includes all examinations (not just cancer cases) in the denominator. The advantage is that interpreting physicians know the total number of examinations they have interpreted and can identify the cancers resulting from biopsies they recommended or performed. The disadvantage is that differences in the cancer detection rate may reflect not only differences in performance, but also differences in the rate and risk of cancer in the population served. A high cancer detection rate relative to other interpreting physicians may simply indicate that the interpreting physician is caring for an older population of women who are at higher risk for cancer, not that he or she is necessarily highly skilled at finding cancer. This difference can be mitigated by adjusting the cancer rate to a standard population age distribution if adequate numbers exist in each age group to allow rate estimates. For radiologists comparing their own measures over time, these kinds of adjustments are less important if the population characteristics are stable.

Other factors that could influence the cancer detection rate include the proportion of women having their first (prevalent) screen and the proportion having a repeat (incident) screen, the interval since the prior screen, differing practices with respect to who is included in screenings, whether practices read examinations individually as they are completed or in batches at a later time (mode of interpretation), and how long a physician has been in practice (van Landeghem et al., 2002; Harvey et al., 2003; Smith-Bindman et al., 2003). Interpretive sensitivity and specificity are higher on first screens compared to incident screens, presumably due to slightly larger tumors being found at prevalent screens (Yankaskas et al., 2005). For incident screens, the longer the time since the prior mammogram, the better interpretive performance appears, again because tumors will be slightly larger (Yankaskas et al., 2005). Some practices offer only diagnostic mammography to high-risk women with a history of breast cancer, while others will offer screening. Excluding such women from the screening population will reduce the number of cancers at the time of screening and affect positive predictive values, but may also change a physician's threshold for calling a positive test. Changes in the threshold for a positive test can affect performance, and this threshold seems to change with experience (Barlow et al., 2004).

Abnormal Interpretation Rate

The abnormal interpretation rate is a measure of the number of women whose mammogram interpretation leads to additional imaging or biopsy. For screening mammography, the term "recall rate" is often used. The recall rate is the proportion of all women undergoing screening mammography who are given a positive interpretation that requires additional examinations (Category 0 [minus the exams for which only comparison with outside films is requested], 4, or 5). Desirable goals for recall rates for highly skilled interpreting physicians were set at less than or equal to 10 percent in the early 1990s (Bassett et al., 1994). This measure is easy to calculate because it does not rely on establishing the cancer status of women. The disadvantage is that differences in this measure may not reflect differences in skill except when the rate is extraordinarily high or low. Again, this will depend on the proportion of prevalent to incident screens (Frankel et al., 1995), on the availability of previous films for comparison (Kan et al., 2000), and on the mode of interpretation (Sickles, 1992, 1995a; Ghate et al., 2005).
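
A minimal sketch of the cancer detection rate per 1,000 examinations, of a direct age-standardization of the kind mentioned above, and of the screening recall rate is shown below. The age strata, reference weights, and counts are hypothetical, chosen only to show the arithmetic.

```python
# Illustrative sketch: cancer detection rate per 1,000 examinations, with an
# optional direct age-standardization to a reference age distribution, plus
# the screening recall rate. Strata, weights, and counts are hypothetical.

def detection_rate_per_1000(cancers: int, exams: int) -> float:
    return 1000.0 * cancers / exams

def age_standardized_rate(strata: dict, reference_weights: dict) -> float:
    """strata: {age_group: (cancers, exams)}; reference weights sum to 1.0."""
    return sum(
        reference_weights[g] * detection_rate_per_1000(c, n)
        for g, (c, n) in strata.items()
    )

def recall_rate(recalled: int, screens: int) -> float:
    """Proportion of screens assessed Category 0, 4, or 5 (excluding Category 0
    assigned only to obtain outside films for comparison)."""
    return recalled / screens

# Example with invented counts for one physician's screening examinations.
weights = {"40-49": 0.4, "50-64": 0.4, "65+": 0.2}
strata = {"40-49": (2, 1000), "50-64": (6, 1500), "65+": (8, 1500)}
print(round(age_standardized_rate(strata, weights), 2))   # rate per 1,000
print(round(recall_rate(recalled=380, screens=4000), 3))  # ~0.095, near the 10% goal
```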
Cancer Staging

Cancer staging is performed after a breast cancer is diagnosed. Stage, along with other tumor prognostic indicators (e.g., tumor grade, hormone receptor status, and other factors), is used to determine the patient's prognosis, and the combination of tumor markers and stage influences treatment.

Cancer staging takes into account information regarding the tumor histological type and size, as well as regional lymph node status and distant metastases. Staging information, which is generally derived from pathology reports in varying forms, is useful for the mammography audit because women with advanced, metastatic tumors are more likely to die from the disease. However, tumor staging information is not always easily available to the imaging facility, and thus may be more of a burden to acquire.

Tumor Size

The size of the breast cancer at the time of diagnosis is relevant only for invasive cancers. All patients with only DCIS are Stage 0, regardless of the extent of the DCIS. An interpreting physician who routinely detects smaller invasive tumors is likely to be more skilled at identifying small abnormalities in a mammogram. The proportion of invasive tumors less than 1.5 or 1.0 cm could be used as one measure. Using tumor size as a performance measure has several limitations: measurement of a tumor is an inexact science and may vary depending on what is recorded in a patient record or tumor registry (e.g., clinical size based on palpation, size based on imaging, size based on pathology) and who is doing the measuring. SEER (Surveillance, Epidemiology, and End Results) registries use a hierarchy to choose which measurement to include. Heterogeneity will occur because not all measurements are available. Furthermore, the proportion of small tumors will be affected by the population of tumors seen by a given interpreting physician; for example, a physician reading more prevalent screens will have a greater proportion of large tumors because there are more large tumors in the population. The screening interval is also important when tumor size is used as a performance measure. A shift toward smaller tumor size has been noted in screened populations such as those in the Swedish randomized trials of mammography (Tabar et al., 1992). A similar shift is expected in other screened populations. In one study of a national screening program, invasive breast cancer tumor size at the time of discovery decreased from 2.1–2.4 cm to 1.1–1.4 cm between 1983 and 1997, the period within which the national screening program had been implemented (Scheiden et al., 2001).

Axillary Lymph Node Status

The presence or absence of cancer cells in the axillary lymph nodes is one of the most important predictors of patient outcome. The prognosis worsens with each positive node (containing cancer cells) compared to women with histologically negative lymph nodes. Node positivity, however, is not necessarily a useful surrogate measure of an interpreting physician's interpretive performance because inherently aggressive tumors may metastasize to the axillary lymph nodes early, when the tumor is still small, or even before the tumor becomes visible on a mammogram.
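
Returning to the tumor-size measure suggested above, the proportion of invasive cancers below a size threshold is a simple fraction; the sketch below is illustrative only, with hypothetical sizes and a 1.0 cm threshold, and it excludes DCIS-only cases because size is relevant only for invasive cancers.

```python
# Illustrative sketch: proportion of invasive cancers smaller than a threshold
# (1.0 or 1.5 cm), one candidate tumor-size performance measure. Sizes are
# assumed to be in centimeters and are hypothetical.

def proportion_small_invasive(sizes_cm: list[float], threshold_cm: float = 1.0) -> float:
    if not sizes_cm:
        raise ValueError("no invasive cancers in the audit period")
    return sum(1 for s in sizes_cm if s < threshold_cm) / len(sizes_cm)

# Example with invented pathology sizes for one interpreting physician.
print(proportion_small_invasive([0.6, 0.9, 1.2, 1.8, 0.8, 2.5], threshold_cm=1.0))
```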

Area Under the Receiver Operating Curve2 (AUC)

Interpreting physicians face a difficult challenge: while trying to find cancer, they must also try to limit the number of false-positive interpretations. If the distributions of interpretations among women with cancer and women without breast cancer were graphed together on one x/y axis, they would look like Figure 2–1. Focusing on sensitivity indicates only how an interpreting physician operates when cancer is present. Focusing on specificity indicates only how an interpreting physician operates when cancer is not present. What is really needed to assess performance is the ability of the interpreting physician to simultaneously discriminate between women with and without cancer.

FIGURE 2–1 Ideal (A) and actual common (B) distribution of mammography interpretation (BI-RADS Assessment Categories 1–5).

2 For a more detailed description of ROC curves, see Appendix C in Saving Women's Lives (IOM, 2005).

This discrimination is reflected in the overlap between the two distributions of interpretations in Figure 2–1, and is measured by the area (AUC) under the receiver operating curve (ROC) (Figure 2–2). ROC analysis was developed as a methodology to quantify the ability to correctly distinguish signals of interest from the background noise in a system. ROC curves map the effects of varying decision thresholds and demonstrate the relationship between the true-positive rate (sensitivity) and the false-positive rate (1 − specificity). If a reader's interpretation is no better than a flip of a coin, the distributions of BI-RADS assessments in Figure 2–1 will overlap completely and the AUC in Figure 2–2 will be 0.5. If an interpreting physician has complete discrimination, the distributions of BI-RADS assessments will be completely separated for women with and without cancer, as in Figure 2–1A, and the AUC will be 1.0. An interpreting physician's AUC therefore usually falls between 0.5 and 1.0. Estimating the AUC is possible if the status of all examined women is known and the appropriate computer software is employed. The AUC has the advantage of reflecting the discriminatory ability of the interpreting physician and incorporates both sensitivity and specificity into a single measure, accounting for the trade-offs between the two.

FIGURE 2–2 ROC analysis. If a reader is guessing between two choices (cancer versus no cancer), the fraction of true positives will tend to equal the fraction of false positives. Thus, the resulting ROC curve would lie along a 45-degree line, and the area under the curve, 0.5, represents the 50 percent accuracy of the test. In contrast, the ROC curve for a reader with 100 percent accuracy will follow the y-axis at a false-positive fraction of zero (no false positives) and travel along the top of the plot area at a true-positive fraction of one (all true positives). The area under the curve, 1.0, represents the 100 percent accuracy of the test. The hypothetical result for a reader with an area under the curve of 0.85 is shown for comparison.
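
Dedicated ROC software is normally used for this estimation, as the text notes. As a rough illustration of what the AUC represents, the empirical (nonparametric) AUC can be computed directly from ordinal BI-RADS assessments by pairwise comparison of cancer and non-cancer cases (the Mann-Whitney formulation): it is the probability that a randomly chosen cancer case receives a more suspicious assessment than a randomly chosen non-cancer case, with ties counted as one-half. The assessments in the example below are hypothetical.

```python
# Illustrative sketch: empirical (nonparametric) AUC from ordinal BI-RADS
# assessments, equivalent to the Mann-Whitney U statistic. Hypothetical data;
# fitted ROC curves from dedicated software will generally be smoother.

def empirical_auc(cancer_scores: list[int], noncancer_scores: list[int]) -> float:
    wins = 0.0
    for c in cancer_scores:
        for n in noncancer_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(noncancer_scores))

# Hypothetical assessments (Categories 1-5) for women with and without cancer.
cancers = [5, 4, 4, 3, 5]
noncancers = [1, 2, 1, 3, 2, 1, 2, 4]
print(round(empirical_auc(cancers, noncancers), 2))
```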

The Committee also considered a number of other approaches that could potentially improve interpretive performance, such as double reading, use of CAD, increased continuing experience (interpretive volume) requirements, and CME programs that focus on interpretation and self-assessment. While there is some evidence to suggest that these approaches could also improve the quality of mammography interpretation, the data available to date are insufficient to justify changes to MQSA legislation or regulations. However, the Committee recommends that additional studies be rapidly undertaken to develop a stronger evidence base for the effects of CME, reader volume, double reading, and CAD on interpretive performance.

REFERENCES

Adcock KA. 2004. Initiative to improve mammogram interpretation. The Permanente Journal 8(2):12–18.
American College of Radiology. 2003. ACR BI-RADS®—Mammography. In: ACR Breast Imaging Reporting and Data System, Breast Imaging Atlas. 4th ed. Reston, VA: American College of Radiology.
Andersson I, Aspegren K, Janzon L, Landberg T, Lindholm K, Linell F, Ljungberg O, Ranstam J, Sigfusson B. 1988. Mammographic screening and mortality from breast cancer: The Malmo Mammographic Screening Trial. British Medical Journal 297(6654):943–948.
Anttinen I, Pamilo M, Soiva M, Roiha M. 1993. Double reading of mammography screening films—one radiologist or two? Clinical Radiology 48(6):414–421.
Applied Vision Research Institute. 2004. PERFORMS: SA2003 Report to the National Coordinating Committee for QA Radiologists. Derby, England: University of Derby.
August DA, Carpenter LC, Harness JK, Delosh T, Cody RL, Adler DD, Oberman H, Wilkins E, Schottenfeld D, McNeely SG. 1993. Benefits of a multidisciplinary approach to breast care. Journal of Surgical Oncology 53(3):161–167.
Baker JA, Rosen EL, Lo JY, Gimenez EI, Walsh R, Soo MS. 2003. Computer-aided detection (CAD) in screening mammography: Sensitivity of commercial CAD systems for detecting architectural distortion. American Journal of Roentgenology 181(4):1083–1088.
Ballard-Barbash R, Taplin SH, Yankaskas BC, Ernster VL, Rosenberg RD, Carney PA, Barlow WE, Geller BM, Kerlikowske K, Edwards BK, Lynch CF, Urban N, Chrvala CA, Key CR, Poplack SP, Worden JK, Kessler LG. 1997. Breast Cancer Surveillance Consortium: A national mammography screening and outcomes database. American Journal of Roentgenology 169(4):1001–1008.
Barlow WE, Chi C, Carney PA, Taplin SH, D'Orsi C, Cutter G, Hendrick RE, Elmore JG. 2004. Accuracy of screening mammography interpretation by characteristics of radiologists. Journal of the National Cancer Institute 96(24):1840–1850.
Barlow WE, Lehman CD, Zheng Y, Ballard-Barbash R, Yankaskas BC, Cutter GR, Carney PA, Geller BM, Rosenberg R, Kerlikowske K, Weaver DL, Taplin SH. 2002. Performance of diagnostic mammography for women with signs or symptoms of breast cancer. Journal of the National Cancer Institute 94(15):1151–1159.
Barton MB, Morley DS, Moore S, Allen JD, Kleinman KP, Emmons KM, Fletcher SW. 2004. Decreasing women's anxieties after abnormal mammograms: A controlled trial. Journal of the National Cancer Institute 96(7):529–538.

Bassett LW, Hendrick R, Bassford T, Butler PF, Carter D, DeBor M, D'Orsi CJ, Garlinghouse CJ, Jones RF, Langer AS, Lichtenfeld JL, Osuch JR, Reynolds LN, deParedes ES, Williams RE. 1994. Quality determinants of mammography. Clinical Practice Guideline No. 13. AHCPR Publication No. 95–0632. Rockville, MD: Agency for Health Care Policy and Research.
Beam CA, Conant EF, Sickles EA. 2002. Factors affecting radiologist inconsistency in screening mammography. Academic Radiology 9(5):531–540.
Beam CA, Conant EF, Sickles EA. 2003. Association of volume and volume-independent factors with accuracy in screening mammogram interpretation. Journal of the National Cancer Institute 95(4):282–290.
Beam CA, Layde PM, Sullivan DC. 1996. Variability in the interpretation of screening mammograms by U.S. radiologists: Findings from a national sample. Archives of Internal Medicine 156(2):209–213.
Bennett NL, Davis DA, Easterling WE, Friedmann P, Green JS, Koeppen BM, Mazmanian PE, Waxman HS. 2000. Continuing medical education: A new vision of the professional development of physicians. Academic Medicine 75(12):1167–1172.
Berg WA, D'Orsi CJ, Jackson VP, Bassett LW, Beam CA, Lewis RS, Crewson PE. 2002. Does training in the Breast Imaging Reporting and Data System (BI-RADS) improve biopsy recommendations or feature analysis agreement with experienced breast imagers at mammography? Radiology 224(3):871–880.
Berlin L. 2003. Breast cancer, mammography, and malpractice litigation: The controversies continue. American Journal of Roentgenology 180(5):1229–1237.
Brem RF, Schoonjans JM. 2001. Radiologist detection of microcalcifications with and without computer-aided detection: A comparative study. Clinical Radiology 56(2):150–154.
Brett J, Austoker J. 2001. Women who are recalled for further investigation for breast screening: Psychological consequences 3 years after recall and factors affecting re-attendance. Journal of Public Health Medicine 23(4):292–300.
Brown ML, Fintor L. 1995. U.S. screening mammography services with mobile units: Results from the National Survey of Mammography Facilities. Radiology 195(2):529–532.
Brown ML, Houn F, Sickles EA, Kessler LG. 1995. Screening mammography in community practice: Positive predictive value of abnormal findings and yield of follow-up diagnostic procedures. American Journal of Roentgenology 165(6):1373–1377.
Buist DS, Porter PL, Lehman C, Taplin SH, White E. 2004. Factors contributing to mammography failure in women aged 40–49 years. Journal of the National Cancer Institute 96(19):1432–1440.
Byrne C. 1997. Studying mammographic density: Implications for understanding breast cancer. Journal of the National Cancer Institute 89(8):531–533.
Carney PA, Geller BM, Moffett H, Ganger M, Sewell M, Barlow WE, Stalnaker N, Taplin SH, Sisk C, Ernster VL, Wilkie HA, Yankaskas B, Poplack SP, Urban N, West MM, Rosenberg RD, Michael S, Mercurio TD, Ballard-Barbash R. 2000. Current medicolegal and confidentiality issues in large, multicenter research programs. American Journal of Epidemiology 152(4):371–378.
Carney PA, Miglioretti DL, Yankaskas BC, Kerlikowske K, Rosenberg R, Rutter CM, Geller BM, Abraham LA, Taplin SH, Dignan M, Cutter G, Ballard-Barbash R. 2003. Individual and combined effects of age, breast density, and hormone replacement therapy use on the accuracy of screening mammography. Annals of Internal Medicine 138(3):168–175.

Casalino L, Gillies RR, Shortell SM, Schmittdiel JA, Bodenheimer T, Robinson JC, Rundall T, Oswald N, Schauffler H, Wang MC. 2003. External incentives, information technology, and organized processes to improve health care quality for patients with chronic diseases. JAMA 289(4):434–441.
Cave DG. 1995. Profiling physician practice patterns using diagnostic episode clusters. Medical Care 33(5):463–486.
Centers for Medicare and Medicaid Services. 2004. Physician Group Practice Demonstration. [Online]. Available: http://www.cms.hhs.gov/researchers/demos/pgpdemo.asp? [accessed September 29, 2004].
Chang JH, Vines E, Bertsch H, Fraker DL, Czerniecki BJ, Rosato EF, Lawton T, Conant EF, Orel SG, Schuchter L, Fox KR, Zieber N, Glick JH, Solin LJ. 2001. The impact of a multidisciplinary breast cancer center on recommendations for patient management: The University of Pennsylvania experience. Cancer 91(7):1231–1237.
Coleman C. 2005. The breast cancer clinic: Yesterday, today, and tomorrow. In: Buchsel PC, Yarbro CH, eds. Oncology Nursing in the Ambulatory Setting. Sudbury, MA: Jones and Bartlett Publishers. Pp. 231–245.
Coleman C, Lebovic GS. 1996. Organizing a comprehensive breast center. In: Harris JR, Lippman ME, Morrow M, eds. Diseases of the Breast. Philadelphia, PA: Lippincott-Raven Publishers. Pp. 963–970.
Colorado Mammography Project. 2003. Colorado Mammography Project: Data. [Online]. Available: http://cmap.cooperinstden.org/data.htm [accessed December 16, 2004].
Davis D, O'Brien MA, Freemantle N, Wolf F, Mazmanian P, Taylor-Vaisey A. 1999. Impact of formal continuing medical education: Do conferences, workshops, rounds, and other traditional continuing education activities change physician behavior or health care outcomes? JAMA 282(9):867–874.
Davis DA, Thomson MA, Oxman AD, Haynes RB. 1992. Evidence for the effectiveness of CME. A review of 50 randomized controlled trials. JAMA 268(9):1111–1117.
Davis DA, Thomson MA, Oxman AD, Haynes RB. 1995. Changing physician performance. A systematic review of the effect of continuing medical education strategies. JAMA 274(9):700–705.
De Bruhl ND, Bassett LW, Jessop NW, Mason AM. 1996. Mobile mammography: Results of a national survey. Radiology 201(2):433–437.
de Wolf C. 2003. The Need for EU Guidelines for Multidisciplinary Breast Care. Presentation at the meeting of the European Parliament, February 17, 2003, Brussels, Belgium. [Online]. Available: http://www.europarl.eu.int/workshop/breast_cancer/docs/de_wolf_en.pdf [accessed November 4, 2004].
Dee KE, Sickles EA. 2001. Medical audit of diagnostic mammography examinations: Comparison with screening outcomes obtained concurrently. American Journal of Roentgenology 176(3):729–733.
Deloitte & Touche. 2000. Taking the Pulse: Physicians and the Internet. New York: Deloitte & Touche.
Destounis SV, DiNitto P, Logan-Young W, Bonaccio E, Zuley ML, Willison KM. 2004. Can computer-aided detection with double reading of screening mammograms help decrease the false-negative rate? Initial experience. Radiology 232(2):578–584.
Dinnes J, Moss S, Melia J, Blanks R, Song F, Kleijnen J. 2001. Effectiveness and cost-effectiveness of double reading of mammograms in breast cancer screening: Findings of a systematic review. Breast 10(6):455–463.

DiSalvo TG, Normand SL, Hauptman PJ, Guadagnoli E, Palmer RH, McNeil BJ. 2001. Pitfalls in assessing the quality of care for patients with cardiovascular disease. American Journal of Medicine 111(4):297–303.
Duijm LE, Groenewoud JH, Hendriks JH, de Koning HJ. 2004. Independent double reading of screening mammograms in the Netherlands: Effect of arbitration following reader disagreements. Radiology 231(2):564–570.
Egger JR, Cutter GR, Carney PA, Taplin SH, Barlow WE, Hendrick RE, D'Orsi CJ, Fosse JS, Abraham L, Elmore JG. In press. Mammographers' perception of women's breast cancer risk. Medical Decision Making.
Egglin TK, Feinstein AR. 1996. Context bias. A problem in diagnostic radiology. JAMA 276(21):1752–1755.
Elmore JG, Armstrong K, Lehman CD, Fletcher SW. 2005. Screening for breast cancer. JAMA 293(10):1245–1256.
Elmore JG, Feinstein AR. 1992. A bibliography of publications on observer variability (final installment). Journal of Clinical Epidemiology 45(6):567–580.
Elmore JG, Miglioretti DL, Reisch LM, Barton MB, Kreuter W, Christiansen CL, Fletcher SW. 2002. Screening mammograms by community radiologists: Variability in false-positive rates. Journal of the National Cancer Institute 94(18):1373–1380.
Elmore JG, Nakano CY, Koepsell TD, Desnick LM, D'Orsi CJ, Ransohoff DF. 2003. International variation in screening mammography interpretations in community-based programs. Journal of the National Cancer Institute 95(18):1384–1393.
Elmore JG, Taplin S, Barlow WE, Cutter G, D'Orsi C, Hendrick RE, Abraham L, Fosse J, Carney PA. In press. Community radiologists' medical malpractice experience, concerns, and interpretive performance. Radiology.
Elmore JG, Wells CK, Howard DH, Feinstein AR. 1997. The impact of clinical history on mammographic interpretations. JAMA 277(1):49–52.
Elmore JG, Wells CK, Howard DH. 1998. Does diagnostic accuracy in mammography depend on radiologists' experience? Journal of Women's Health 7(4):443–449.
Elmore JG, Wells CK, Lee CH, Howard DH, Feinstein AR. 1994. Variability in radiologists' interpretations of mammograms. New England Journal of Medicine 331(22):1493–1499.
Elwood JM, Cox B, Richardson AK. 1993. The effectiveness of breast cancer screening by mammography in younger women. Online Journal of Current Clinical Trials. Doc. No. 32.
Epstein AM, Lee TH, Hamel M. 2004. Paying physicians for high-quality care. New England Journal of Medicine 350(18):1910–1912.
Ernster VL, Ballard-Barbash R, Barlow WE, Zheng Y, Weaver DL, Cutter G, Yankaskas BC, Rosenberg R, Carney PA, Kerlikowske K, Taplin SH, Urban N, Geller BM. 2002. Detection of ductal carcinoma in situ in women undergoing screening mammography. Journal of the National Cancer Institute 94(20):1546–1554.
Esserman L, Cowley H, Eberle C, Kirkpatrick A, Chang S, Berbaum K, Gale A. 2002. Improving the accuracy of mammography: Volume and outcome relationships. Journal of the National Cancer Institute 94(5):369–375.
European Society of Mastology (EUSOMA). 2000. The requirements of a specialist breast unit. European Journal of Cancer 36(18):2288–2293.
FDA (U.S. Food and Drug Administration). 1997. Quality Mammography Standards; Final Rule (Preamble). 21 C.F.R. Parts 16 and 900.
FDA. 2004. HIPAA and Release of Information for MQSA Purposes. [Online]. Available: http://www.fda.gov/cdrh/mammography/mqsa-rev.html#HIPPA [accessed October 15, 2004].

Feig SA, Hall FM, Ikeda DM, Mendelson EB, Rubin EC, Segel MC, Watson AB, Eklund GW, Stelling CB, Jackson VP. 2000. Society of Breast Imaging residency and fellowship training curriculum. Radiologic Clinics of North America 38(4):xi, 915–920.
Feig SA, Sickles EA, Evans WP, Linver MN. 2004. Re: Changes in breast cancer detection and mammography recall rates after the introduction of a computer-aided detection system. Journal of the National Cancer Institute 96(16):1260–1261; author reply, 1261.
Feinstein AR. 1985. A bibliography of publications on observer variability. Journal of Chronic Diseases 38(8):619–632.
Fletcher SW, Black W, Harris R, Rimer BK, Shapiro S. 1993. Report of the International Workshop on Screening for Breast Cancer. Journal of the National Cancer Institute 85(20):1644–1656.
Fletcher SW, Elmore JG. 2003. Clinical practice. Mammographic screening for breast cancer. New England Journal of Medicine 348(17):1672–1680.
Frankel SD, Sickles EA, Curpen BN, Sollitto RA, Ominsky SH, Galvin HB. 1995. Initial versus subsequent screening mammography: Comparison of findings and their prognostic significance. American Journal of Roentgenology 164(5):1107–1109.
Freer TW, Ulissey MJ. 2001. Screening mammography with computer-aided detection: Prospective study of 12,860 patients in a community breast center. Radiology 220(3):781–786.
Frisell J, Eklund G, Hellstrom L, Lidbrink E, Rutqvist LE, Somell A. 1991. Randomized study of mammography screening—preliminary report on mortality in the Stockholm trial. Breast Cancer Research & Treatment 18(1):49–56.
Gale AG. 2003. PERFORMS: A self-assessment scheme for radiologists in breast screening. Seminars in Breast Disease 6(3):148–152.
Garcia R. 2004. Interdisciplinary breast cancer care: Declaring and improving the standard. Review. Oncology (Huntington) 18(10):1268–1270.
Geller BM, Barlow WE, Ballard-Barbash R, Ernster VL, Yankaskas BC, Sickles EA, Carney PA, Dignan MB, Rosenberg RD, Urban N, Zheng Y, Taplin SH. 2002. Use of the American College of Radiology BI-RADS to report on the mammographic evaluation of women with signs and symptoms of breast disease. Radiology 222(2):536–542.
Ghate SV, Soo MS, Baker JA, Walsh R, Gimenez EI, Rosen EL. 2005. Comparison of recall and cancer detection rates for immediate versus batch interpretation of screening mammograms. Radiology 235(1):31–35.
Gigerenzer G. 2002. Calculated Risks: How to Know When Numbers Deceive You. New York: Simon & Schuster.
Greenfield S, Kaplan SH, Kahn R, Ninomiya J, Griffith JL. 2002. Profiling care provided by different groups of physicians: Effects of patient case-mix (bias) and physician-level clustering on quality assessment results. Annals of Internal Medicine 136(2):111–121.
Gunn PP, Fremont AM, Bottrell M, Shugarman LR, Galegher J, Bikson T. 2004. The Health Insurance Portability and Accountability Act Privacy Rule: A practical guide for researchers. Medical Care 42(4):321–327.
Gur D, Sumkin JH, Rockette HE, Ganott M, Hakim C, Hardesty L, Poller WR, Shah R, Wallace L. 2004. Changes in breast cancer detection and mammography recall rates after the introduction of a computer-aided detection system. Journal of the National Cancer Institute 96(3):185–190.
Harvey SC, Geller B, Oppenheimer RG, Pinet M, Riddell L, Garra B. 2003. Increase in cancer detection and recall rates with independent double interpretation of screening mammography. American Journal of Roentgenology 180(5):1461–1467.

Hendee WR, Pattern JA, Simmons G. 1999. A hospital-employed physicist working in radiology should provide training to nonradiologists wishing to offer imaging services. Medical Physics 26(6):859–861.
Hendrick RE, Cutter GR, Berns EA, Nakano C, Egger J, Carney PA, Abraham L, Taplin SH, D'Orsi CJ, Barlow W, Elmore JG. 2005. Community-based mammography practice: Services, charges, and interpretation methods. American Journal of Roentgenology 184(2):433–438.
Henson RM, Wyatt SW, Lee NC. 1996. The National Breast and Cervical Cancer Early Detection Program: A comprehensive public health response to two major health issues for women. Journal of Public Health Management & Practice 2(2):36–47.
Hofvind S, Thresen S, Tretli S. 2004. The cumulative risk of a false-positive recall in the Norwegian Breast Cancer Screening Program. Cancer 101(7):1501–1507.
Hulka CA, Slanetz PJ, Halpern EF, Hall DA, McCarthy KA, Moore R, Boutin S, Kopans DB. 1997. Patients' opinion of mammography screening services: Immediate results versus delayed results due to interpretation by two observers. American Journal of Roentgenology 168(4):1085–1089.
Hutton B, Bradt E, Chen J, Gobrecht P, O'Connell J, Pedulla A, Signorelli T, Bisner S, Hoffman D, Lawson H. 2004. Breast cancer: Screening data for assessing quality of services: New York, 2000–2003. Morbidity & Mortality Weekly Report 53(21):455–457.
Integrated Healthcare Association. 2004. IHA "Pay For Performance". [Online]. Available: http://www.iha.org/Ihaproj.htm [accessed September 29, 2004].
IOM (Institute of Medicine). 2001a. Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC: National Academy Press.
IOM. 2001b. Interpreting the Volume-Outcome Relationship in the Context of Cancer Care. Washington, DC: National Academy Press.
IOM. 2005. Saving Women's Lives: Strategies for Improving Breast Cancer Detection and Diagnosis. Washington, DC: The National Academies Press.
Jamtvedt G, Young JM, Kristoffersen DT, Thomson O'Brien MA, Oxman AD. 2003. Audit and feedback: Effects on professional practice and health care outcomes [Update of Cochrane Database Syst Rev. 2000;(2)]. Cochrane Database of Systematic Reviews (3):CD000259.
Kan L, Olivotto IA, Warren Burhenne LJ, Sickles EA, Coldman AJ. 2000. Standardized abnormal interpretation and cancer detection ratios to assess reading volume and reader performance in a breast screening program. Radiology 215(2):563–567.
Karssemeijer N, Otten JDM, Verbeek ALM, Groenewoud JH, de Koning HJ, Hendriks JHCL, Holland R. 2003. Computer-aided detection versus independent double reading of masses on mammograms. Radiology 227(1):192–200.
Kerlikowske K, Carney PA, Geller B, Mandelson MT, Taplin SH, Malvin K, Ernster V, Urban N, Cutter G, Rosenberg R, Ballard-Barbash R. 2000. Performance of screening mammography among women with and without a first-degree relative with breast cancer. Annals of Internal Medicine 133(11):855–863.
Kerlikowske K, Grady D, Barclay J, Frankel SD, Ominsky SH, Sickles EA, Ernster V. 1998. Variability and accuracy in mammographic interpretation using the American College of Radiology Breast Imaging Reporting and Data System. Journal of the National Cancer Institute 90(23):1801–1809.
Kerlikowske K, Smith-Bindman R, Ljung BM, Grady D. 2003. Evaluation of abnormal mammography results and palpable breast abnormalities. Annals of Internal Medicine 139(4):274–284.

Khuri SF, Daley J, Henderson WG. 2002. The comparative assessment and improvement of quality of surgical care in the Department of Veterans Affairs. Archives of Surgery 137(1):20–27.
Kiefe CI, Allison JJ, Williams OD, Person SD, Weaver MT, Weissman NW. 2001. Improving quality improvement using achievable benchmarks for physician feedback: A randomized controlled trial. JAMA 285(22):2871–2879.
Klabunde CN, Sancho-Garnier H, Broeders M, Thoresen S, Rodrigues VJL, Ballard-Barbash R. 2001. Quality assurance for screening mammography data collection systems in 22 countries. International Journal of Technology Assessment in Health Care 17(4):528–541.
Kolb GR. 2000. Disease management is the future: Breast cancer is the model. Surgical Oncology Clinics of North America 9(2):217–232.
Kossoff M, Brothers L, Cawson J, Osborne J, Wylie E. 2003. BreastScreen Australia: How we handle variability in interpretive skills. Seminars in Breast Disease 6(3):123–127.
Landon BE, Normand SL, Blumenthal D, Daley J. 2003. Physician clinical performance assessment: Prospects and barriers. JAMA 290(9):1183–1189.
Laya MB, Larson EB, Taplin SH, White E. 1996. Effect of estrogen replacement therapy on the specificity and sensitivity of screening mammography. Journal of the National Cancer Institute 88(10):643–649.
Linver MN, Paster SB, Rosenberg RD, Key CR, Stidley CA, King WV. 1992. Improvement in mammography interpretation skills in a community radiology practice after dedicated teaching courses: 2-year medical audit of 38,633 cases. Radiology 184(1):39–43.
Litherland JC, Evans AJ, Wilson AR. 1997. The effect of hormone replacement therapy on recall rate in the National Health Service Breast Screening Programme. Clinical Radiology 52(4):276–279.
Mandelson MT, Oestreicher N, Porter PL, White D, Finder CA, Taplin SH, White E. 2000. Breast density as a predictor of mammographic detection: Comparison of interval- and screen-detected cancers. Journal of the National Cancer Institute 92(13):1081–1087.
Mansel RE. 2000. Should specialist breast units be adopted in Europe? A comment from Europe. European Journal of Cancer 36(18):2286–2287.
Mazmanian PE, Davis DA. 2002. Continuing medical education and the physician as a learner: Guide to the evidence. JAMA 288(9):1057–1060.
Meyer JE, Eberlein TJ, Stomper PC, Sonnenfeld MR. 1990. Biopsy of occult breast lesions. Analysis of 1261 abnormalities. JAMA 263(17):2341–2343.
Monsees BS, Destouet JM. 1992. A screening mammography program. Staying alive and making it work. Radiologic Clinics of North America 30(1):211–219.
Multidisciplinary coordination expedites care, builds volumes. 2003 (October 3). Oncology Watch.
National Committee for Quality Assurance. 2004. The State of Health Care Quality. Washington, DC: National Committee for Quality Assurance.
National Consortium of Breast Centers, Inc. 2004. Quality: What Do YOU Mean By "Quality"? [Online]. Available: http://www.breastcare.org [accessed December 10, 2004].
National Health Service. 2003. NHS Breast Screening Programme Annual Review 2003. NHS Breast Cancer Screening Programmes, Sheffield, United Kingdom.
National Radiographers Quality Assurance Coordinating Group. 2000. Quality Assurance Guidelines for Radiographers. 2nd ed. Publication No. 30. Sheffield, UK: NHSBSP Publications.
Newstead GM, Schmidt RA, Chambliss J, Kral ML, Edwards S, Nishikawa RM. 2003. Are radiology residents adequately trained in screening mammography? Comparison of radiology resident performance with that of general radiologists in a simulated screening exercise. [Abstract]. Radiology 229:405.
Nodine CF, Kundel HL, Mello-Thoms C, Weinstein SP, Orel SG, Sullivan DC, Conant EF. 1999. How experience and training influence mammography expertise. Academic Radiology 6(10):575–585.
Nystrom L, Rutqvist LE, Wall S, Lindgren A, Lindqvist M, Ryden S, Andersson I, Bjurstam N, Fagerberg G, Frisell J. 1993. Breast cancer screening with mammography: Overview of Swedish randomised trials. Lancet 341(8851):973–978.
Palmer RH, Hargraves JL. 1996. The ambulatory care medical audit demonstration project. Research design. Medical Care 34(9 Suppl):SS12–SS28.
Pankow JS, Vachon CM, Kuni CC, King RA, Arnett DK, Grabrick DM, Rich SS, Anderson VE, Sellers TA. 1997. Genetic analysis of mammographic breast density in adult women: Evidence of a gene effect. Journal of the National Cancer Institute 89(8):549–556.
Perry NM. 2003. Interpretive skills in the National Health Service Breast Screening Programme: Performance indicators and remedial measures. Seminars in Breast Disease 6(3):108–113.
Perry NM. 2004a (September 2). Mammography Quality and Performance in the National Health Service Breast Screening Programme. Presentation at the meeting of the Institute of Medicine Committee on Improving Mammography Quality Standards, Washington, DC.
Perry NM. 2004b. Breast cancer screening—the European experience. International Journal of Fertility & Women’s Medicine 49(5):228–230.
Persson I, Thurfjell E, Holmberg L. 1997. Effect of estrogen and estrogen-progestin replacement regimens on mammographic breast parenchymal density. Journal of Clinical Oncology 15(10):3201–3207.
Physician Insurers Association of America. 2002. Breast Cancer Study. 3rd ed. Rockville, MD: Physician Insurers Association of America.
Pisano ED, Yankaskas BC, Ghate SV, Plankey MW, Morgan JT. 1995. Patient compliance in mobile screening mammography. Academic Radiology 2(12):1067–1072.
Poplack SP, Tosteson AN, Grove MR, Wells WA, Carney PA. 2000. Mammography in 53,803 women from the New Hampshire Mammography Network. Radiology 217(3):832–840.
Porter PL, El-Bastawissi AY, Mandelson MT, Lin MG, Khalid N, Watney EA, Cousens L, White D, Taplin S, White E. 1999. Breast tumor characteristics as predictors of mammographic detection: Comparison of interval- and screen-detected cancers. Journal of the National Cancer Institute 91(23):2020–2028.
Rabinowitz B. 2000. Psychologic issues, practitioners’ interventions, and the relationship of both to an interdisciplinary breast center team. Surgical Oncology Clinics of North America 9(2):347–365.
Rabinowitz B. 2004. Interdisciplinary breast cancer care: Declaring and improving the standard. Oncology (Huntington) 18(10):1263–1268.
Raza S, Rosen MP, Chorny K, Mehta TS, Hulka CA, Baum JK. 2001. Patient expectations and costs of immediate reporting of screening mammography: Talk isn’t cheap. American Journal of Roentgenology 177(3):579–583.
Records SF. 1995. Female breast cancer is most prevalent cause of malpractice claims. Journal of the Oklahoma State Medical Association 88(7):311–312.
Ries LAG, Miller BA, Hankey BF. 1994. SEER Cancer Statistics Review, 1973–1991. Bethesda, MD: National Cancer Institute.
Roberts MM, Alexander FE, Anderson TJ, Chetty U, Donnan PT, Forrest P, Hepburn W, Huggins A, Kirkpatrick AE, Lamb J. 1990. Edinburgh trial of screening for breast cancer: Mortality at seven years. Lancet 335(8684):241–246.
Robertson MK, Umble KE, Cervero RM. 2003. Impact studies in continuing education for health professions: Update. Journal of Continuing Education in the Health Professions 23(3):146–156.
Roblin DW. 1996. Applications of physician profiling in the management of primary care panels. Journal of Ambulatory Care Management 19(2):59–74.
Ross G, Johnson D, Castronova F. 2000. Physician profiling decreases inpatient length of stay even with aggressive quality management. American Journal of Medical Quality 15(6):233–240.
Rutter CM, Taplin S. 2000. Assessing mammographers’ accuracy. A comparison of clinical and test performance. Journal of Clinical Epidemiology 53(5):443–450.
Saftlas AF, Hoover RN, Brinton LA, Szklo M, Olson DR, Salane M, Wolfe JN. 1991. Mammographic densities and risk of breast cancer. Cancer 67(11):2833–2838.
Scheiden R, Sand J, Tanous AM, Capesius C, Wagener C, Wagnon MC, Knolle U, Faverly D. 2001. Consequences of a national mammography screening program on diagnostic procedures and tumor sizes in breast cancer. A retrospective study of 1540 cases diagnosed and histologically confirmed between 1995 and 1997. Pathology, Research & Practice 197(7):467–474.
Schwartz LM, Woloshin S, Sox HC, Fischhoff B, Welch HG. 2000. U.S. women’s attitudes to false-positive mammography results and detection of ductal carcinoma in situ: Cross-sectional survey. Western Journal of Medicine 173(5):307–312.
Shapiro S, Venet W, Strax P, Venet L. 1988. Periodic Screening for Breast Cancer: The Health Insurance Plan Project and its Sequelae, 1963–1968. Baltimore, MD: Johns Hopkins University Press.
Sickles EA. 1992. Quality assurance. How to audit your own mammography practice. Radiologic Clinics of North America 30(1):265–275.
Sickles EA. 1995a. How to conduct an audit. In: Kopans DB, ed. Categorical Course in Breast Imaging. Oak Brook, IL: Radiological Society of North America. Pp. 81–91.
Sickles EA. 1995b. Latent image fading in screen-film mammography: Lack of clinical relevance for batch-processed films. Radiology 194(2):389–392.
Sickles EA. 2003. The American College of Radiology’s Mammography Interpretive Skills Assessment (MISA) examination. Seminars in Breast Disease 6(3):133–139.
Sickles EA, Miglioretti DL, Ballard-Barbash R, Geller BM, Leung JW, Rosenberg RD, Smith-Bindman R, Yankaskas BC. In press. Performance benchmarks for diagnostic mammography. Radiology.
Sickles EA, Weber WN, Galvin HB, Ominsky SH, Sollitto RA. 1986. Mammographic screening: How to operate successfully at low cost. Radiology 160(1):95–97.
Sickles EA, Wolverton DE, Dee KE. 2002. Performance parameters for screening and diagnostic mammography: Specialist and general radiologists. Radiology 224(3):861–869.
Silverstein MJ. 1973. The multidisciplinary breast clinic—a new approach. UCLA Cancer Bulletin 1:5.
Silverstein MJ. 2000. State-of-the-art breast units—a possibility or a fantasy? A comment from the U.S. European Journal of Cancer 36(18):2283–2285.
Smith RA, Cokkinides V, Eyre HJ. 2005. American Cancer Society guidelines for the early detection of cancer, 2005. CA: A Cancer Journal for Clinicians 55(1):31–44.
Smith RA, D’Orsi C. 2004. Screening for breast cancer. In: Harris JR, Lippman ME, Morrow M, Osborne CK, eds. Diseases of the Breast. New York: Lippincott Williams & Wilkins. Pp. 103–130.
Smith-Bindman R, Chu P, Miglioretti D, Quale C, Rosenberg RD, Cutter G, Geller B, Bacchetti P, Sickles EA, Kerlikowske K. 2005. Physician predictors of mammographic accuracy. Journal of the National Cancer Institute 97(5):358–367.
Smith-Bindman R, Chu PW, Miglioretti DL, Sickles EA, Blanks R, Ballard-Barbash R, Bobo JK, Lee NC, Wallis MG, Patnick J, Kerlikowske K. 2003. Comparison of screening mammography in the United States and the United Kingdom. JAMA 290(16):2129–2137.
Spoeri RK, Ullman R. 1997. Measuring and reporting managed care performance: Lessons learned and new initiatives. Annals of Internal Medicine 127(8 Pt 2):726–732.
Steinberg KK, Thacker SB, Smith SJ, Stroup DF, Zack MM, Flanders WD, Berkelman RL. 1991. A meta-analysis of the effect of estrogen replacement therapy on the risk of breast cancer. JAMA 265(15):1985–1990.
Sung HY, Kearney KA, Miller M, Kinney W, Sawaya GF, Hiatt RA. 2000. Papanicolaou smear history and diagnosis of invasive cervical carcinoma among members of a large prepaid health plan. Cancer 88(10):2283–2289.
Tabar L, Fagerberg G, Duffy SW, Day NE, Gad A, Grontoft O. 1992. Update of the Swedish two-county program of mammographic screening for breast cancer. Radiologic Clinics of North America 30(1):187–210.
Taplin SH, Ichikawa LE, Kerlikowske K, Ernster VL, Rosenberg RD, Yankaskas BC, Carney PA, Geller BM, Urban N, Dignan MB, Barlow WE, Ballard-Barbash R, Sickles EA. 2002. Concordance of Breast Imaging Reporting and Data System (BI-RADS) assessments and management recommendations in screening mammography. Radiology 222(2):529–535.
Taplin SH, Rutter CM, Lehman C. Submitted. Testing the effect of computer assisted detection upon interpretive performance in screening mammography.
Theberge I, Hebert-Croteau N, Langlois A, Major D, Brisson J. 2005. Volume of screening mammography and performance in the Quebec population-based Breast Cancer Screening Program. CMAJ Canadian Medical Association Journal 172(2):195–199.
Thomson-O’Brien MA, Oxman AD, Davis DA, Haynes RB, Freemantle N, Harvey EL. 2004. Audit and feedback versus alternative strategies: Effects on professional practice and health care outcomes. [Review]. Cochrane Database of Systematic Reviews (2):CD000260.
Thurfjell EL, Lernevall KA, Taube AA. 1994. Benefit of independent double reading in a population-based mammography screening program. Radiology 191(1):241–244.
Tosteson AN, Begg CB. 1988. A general regression methodology for ROC curve estimation. Medical Decision Making 8(3):204–215.
Tripathy D. 2004. Interdisciplinary breast cancer care: Declaring and improving the standard. [Review]. Oncology (Huntington) 18(10):1270–1275.
U.S. Preventive Services Task Force. 2002. Screening for breast cancer: Recommendations and rationale. Annals of Internal Medicine 137(5 Pt 1):344–346.
van der Horst F, Hendriks JHCL, Rijken HJTM, Holland R. 2003. Breast cancer screening in the Netherlands: Audit and training of radiologists. Seminars in Breast Disease 6(3):114–122.
van Landeghem P, Bleyen L, De Backer G. 2002. Age-specific accuracy of initial versus subsequent mammography screening: Results from the Ghent Breast Cancer-Screening Programme. European Journal of Cancer Prevention 11(2):147–151.
Veterans Health Administration. 2004. Quality Management (QM) and Patient Safety Activities that Can Generate Confidential Documents. Department of Veterans Affairs, VHA Directive 2004–051. Washington, DC: Veterans Health Administration.
Wang SJ, Middleton B, Prosser LA, Bardon CG, Spurr CD, Carchidi PJ, Kittler AF, Goldszer RC, Fairchild DG, Sussman AJ, Kuperman GJ, Bates DW. 2003. A cost-benefit analysis of electronic medical records in primary care. American Journal of Medicine 114(5):397–403.
Warren-Burhenne L. 2003. Screening Mammography Program of British Columbia standardized test for screening radiologists. Seminars in Breast Disease 6(3):140–147.
Warren-Burhenne LJ, Wood SA, D’Orsi CJ, Feig SA, Kopans DB, O’Shaughnessy KF, Sickles EA, Tabar L, Vyborny CJ, Castellino RA. 2000. Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology 215(2):554–562.
Waynant RW, Chakrabarti K, Kaczmarek RA, Dagenais I. 1999. Testing optimum viewing conditions for mammographic image displays. Journal of Digital Imaging 12(2 Suppl 1):209–210.
Weiner JP, Parente ST, Garnick DW, Fowles J, Lawthers AG, Palmer RH. 1995. Variation in office-based quality. A claims-based profile of care provided to Medicare patients with diabetes. JAMA 273(19):1503–1508.
Weiss KB, Wagner R. 2000. Performance measurement through audit, feedback, and profiling as tools for improving clinical care. Chest 118(2 Suppl):53S–58S.
White E, Miglioretti DL, Yankaskas BC, Geller BM, Rosenberg RD, Kerlikowske K, Saba L, Vacek PM, Carney PA, Buist DS, Oestreicher N, Barlow W, Ballard-Barbash R, Taplin SH. 2004. Biennial versus annual mammography and the risk of late-stage breast cancer. Journal of the National Cancer Institute 96(24):1832–1839.
Wolk RB. 1992. Hidden costs of mobile mammography: Is subsidization necessary? American Journal of Roentgenology 158(6):1243–1245.
Wooding D. 2003. PERsonal perFORmance in Mammographic Screening. [Online]. Available: http://ibs.derby.ac.uk/performs/index.shtml [accessed May 12, 2004].
Yankaskas BC, Cleveland RJ, Schell MJ, Kozar R. 2001. Association of recall rates with sensitivity and positive predictive values of screening mammography. American Journal of Roentgenology 177(3):543–549.
Yankaskas BC, Klabunde CN, Ancelle-Park R, Renner G, Wang H, Fracheboud J, Pou G, Bulliard JL. 2004. International comparison of performance measures for screening mammography: Can it be done? Journal of Medical Screening 11(4):187–193.
Yankaskas BC, Taplin SH, Ichikawa L, Geller BM, Rosenberg RD, Carney PA, Kerlikowske K, Ballard-Barbash R, Cutter GR, Barlow WE. 2005. Association between mammography timing and measures of screening performance in the United States. Radiology 234(2):363–373.
Zapka JG, Taplin SH, Solberg LI, Manos MM. 2003. A framework for improving the quality of cancer care: The case of breast and cervical cancer screening. Cancer Epidemiology, Biomarkers & Prevention 12(1):4–13.
Zheng Y, Barlow W, Cutter G. 2005. Assessing accuracy of mammography in the presence of verification bias and intrareader correlation. Biometrics 61(1):259–268.