As a result of these issues, the only accepted definitive study design for evaluating a new screening test is a randomized trial of individuals at risk of cancer in which the endpoint is cancer mortality. Patients must be followed to ascertain and compare cancer-specific mortality rates, or total numbers of cancer deaths (if equal numbers of subjects are randomized to each arm).
These trials are necessarily large and expensive, and require many years of follow-up. The sample sizes for the breast screening trials have ranged from approximately 25,000 to more than 100,000 women, and the trials generally require in excess of 10 years of follow-up.1 To date, only about a dozen definitive cancer prevention trials have been completed, several of them trials of mammography and breast cancer. These trials have, however, validated the strategy that radiologic screening can reduce breast cancer mortality. The prevailing view among experts in the field of cancer prevention is that a definitive randomized trial of this nature (with cancer mortality as the endpoint) is necessary to validate any novel screening strategy.
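The scale of these trials follows directly from the rarity of the endpoint. A minimal sketch, using the standard two-proportion normal-approximation sample-size formula: the event rates below (a 0.5% cancer mortality in the control arm and a 25% relative reduction with screening) are illustrative assumptions, not figures from the trials cited above.

```python
# Approximate per-arm sample size for a trial with cancer mortality as the
# endpoint, via the normal-approximation formula for comparing two proportions.
from math import ceil
from statistics import NormalDist

def n_per_arm(p_control, p_screen, alpha=0.05, power=0.80):
    """Subjects per arm to detect p_control vs p_screen (two-sided test)."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(power)
    variance = p_control * (1 - p_control) + p_screen * (1 - p_screen)
    return ceil((za + zb) ** 2 * variance / (p_control - p_screen) ** 2)

# Hypothetical rates: 0.5% 10-year cancer mortality without screening,
# 25% relative reduction with screening.
n = n_per_arm(0.005, 0.00375)
print(n)  # tens of thousands per arm, roughly double that overall
```

Under these assumed rates the calculation yields on the order of 40,000 subjects per arm, consistent with the enrollment figures of the breast screening trials.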
Many techniques designed to enhance the accuracy of, or complement, mammography screening are under active development. These include digital mammography, computer-assisted detection (CAD), magnetic resonance imaging (MRI), and others. Demonstrating in a randomized trial with cancer mortality as the endpoint that any of these methods improves screening would be prohibitively expensive, and so investigations focus on trials to demonstrate improved screening accuracy rather than improved mortality compared with mammography. Because we know that mammography saves lives, more accurate technologies can be presumed to save as many or more lives. Evaluating new diagnostic modalities with respect to accuracy is methodologically challenging, however, and can be affected by numerous biases. A good deal of recent research has addressed the appropriate methodological design of these trials; a comprehensive summary of current thinking on the issue is contained in the recent Standards for Reporting Diagnostic Accuracy (STARD) guidelines for published articles.3 A related project by a team of experts to develop a quality assessment tool (QUADAS: Quality Assessment of Diagnostic Accuracy Studies) provides a concise tabulation of the key issues that challenge the validity of studies of diagnostic accuracy.71
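The accuracy measures compared in such studies can be illustrated with a minimal sketch. The 2×2 counts below are invented for illustration only (a hypothetical new modality tabulated against a reference standard such as biopsy-confirmed cancer status); note that if not every subject receives the reference standard, estimates of this kind are subject to verification bias, one of the threats to validity catalogued by QUADAS.

```python
# Basic diagnostic accuracy measures from a hypothetical 2x2 table of a new
# imaging modality against the reference standard. All counts are invented.
tp, fn = 85, 15     # cancers detected / missed by the new test
tn, fp = 900, 100   # non-cancers correctly ruled out / falsely flagged

sensitivity = tp / (tp + fn)   # Pr(test positive | cancer present)
specificity = tn / (tn + fp)   # Pr(test negative | cancer absent)
ppv = tp / (tp + fp)           # Pr(cancer present | test positive)

print(f"sensitivity={sensitivity:.2f} "
      f"specificity={specificity:.2f} ppv={ppv:.2f}")
```

Even with high sensitivity and specificity, the positive predictive value is modest here because disease is much rarer than non-disease in the tabulated sample, a familiar feature of screening settings.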
The key issues from the STARD and QUADAS checklists that pertain to the design of studies to evaluate breast cancer screening technologies can be grouped into four broad categories: