For example, cases are classified as normal or abnormal according to a specific reader’s interpretation bias. As the decision point for identifying an abnormal result shifts to the left or right, the proportions of true positives and true negatives change, revealing the tradeoff between sensitivity and specificity. For instance, consider an enriched data set of 100 exams that includes 10 true cancers. If a reader correctly identifies 8 cancers while missing 2, that reader’s sensitivity is 0.8, or 80 percent; if, of the 90 true negatives, the same reader correctly identifies only 72 as normal, the specificity is 0.8, or 80 percent. A reader with more stringent criteria for an abnormal test may have a higher false-negative rate, increasing the number of missed cancers but decreasing the number of false alarms. Using the same distribution of 100 cases, such a reader might correctly identify only 7 cancers (a sensitivity of 0.7, or 70 percent) while correctly identifying 85 of the 90 true negatives as normal (a specificity of 0.94, or 94 percent). The opposite effect would occur for a reader with less stringent criteria, who would have a higher false-positive rate but would find more cancers. This example shows that sensitivity and specificity, which depend on the true-positive rate and true-negative rate respectively, are reader specific. In mammography, radiologists with different decision thresholds can therefore affect clinical outcomes, particularly in the assessment of non-obvious mammograms.
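The arithmetic behind these figures can be laid out explicitly. The short Python sketch below is illustrative only; the reader counts are the hypothetical values from the example above, and the function names are not from the report.

```python
# Minimal sketch of the arithmetic in the example above, using the
# hypothetical enriched set of 100 exams (10 cancers, 90 normals).

def sensitivity(true_positives, total_cancers):
    """Fraction of cancers correctly called abnormal (true-positive rate)."""
    return true_positives / total_cancers

def specificity(true_negatives, total_normals):
    """Fraction of normals correctly called normal (true-negative rate)."""
    return true_negatives / total_normals

# Reader with a moderate threshold: 8 of 10 cancers found,
# 72 of 90 normals correctly called normal.
print(sensitivity(8, 10), specificity(72, 90))   # 0.8, 0.8

# Reader with more stringent criteria: 7 of 10 cancers found,
# 85 of 90 normals correctly called normal.
print(sensitivity(7, 10), specificity(85, 90))   # 0.7, ~0.94
```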

ROC CURVES ARE NECESSARY TO CHARACTERIZE DIAGNOSTIC PERFORMANCE

The ROC curve maps the effects of varying decision thresholds, accounting for all possible combinations of correct and incorrect decisions.4 A ROC curve is a graph of the relationship between the true-positive rate (sensitivity) and the false-positive rate (1 − specificity) (see Figure C-1). For example, a ROC curve for mammography would plot the fraction of confirmed breast cancer cases that were detected (true positives) against the fraction of false alarms (false positives). Each point on the curve might represent a different test; for instance, each point could be the result of a different radiologist reading the same set of 20 mammograms. Alternatively, each point might represent the results of the same radiologist reading a different set of 20 mammograms.
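To make the construction concrete, the sketch below shows one way such (false-positive rate, true-positive rate) points can be generated by sweeping a decision threshold over a reader’s confidence scores. This is a hedged illustration, not the report’s method: the function name roc_points and the scores and labels values are hypothetical.

```python
# Illustrative sketch: generating ROC points by sweeping a decision
# threshold over hypothetical reader confidence scores.

def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per
    candidate threshold, for binary labels (1 = cancer, 0 = normal)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical confidence ratings (0-1) a reader assigned to 8 exams,
# and whether each exam was truly cancer (1) or normal (0).
scores = [0.95, 0.80, 0.70, 0.55, 0.45, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    0,    0,    1,    0]
print(roc_points(scores, labels))
```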

The ROC curve describes the overall performance of the diagnostic modality across varying conditions. Sources of variation in these conditions include different radiologists’ decision thresholds, different amounts of time between interpreting mammograms, or variation within cases due to the inherent imprecision of breast compression. ROC analysis allows one to average the effect of different conditions on accuracy. There-


