of both the sensitivity and specificity of the new technology. Similarly, failure to enroll an appropriate spectrum of patients in a study of a new therapy can lead to an overestimate of the effectiveness of that therapy. This particular failure lies at the root of countless headlines announcing new breakthrough procedures or therapies that kindle excitement but deliver only false hopes, leaving the public to wonder why so few of these breakthroughs ever reach their own treatment.

Failure to Use Appropriate Controls or Comparison Groups. The purpose of a control group is to allow the observer to conclude that any change observed in the “active treatment group” is due to the treatment being studied, rather than to other factors. Control groups are particularly important when factors in addition to the intervention under study can affect the outcome of interest, when the new technology of interest and some established technology are both effective, and when the natural course of untreated disease is not clear or consistent, as is the case with breast cancer. Failure to use a control group, or use of an inappropriate control group, can make it impossible to draw meaningful conclusions from a study.
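As a hypothetical illustration (all counts are invented, not study data), the arithmetic below shows how a single-arm study can credit a treatment with improvements that would have occurred anyway. Subtracting the control group's rate isolates the difference attributable to the treatment, assuming the two groups are otherwise comparable. The function name `improvement_rate` is illustrative only.

```python
# Hypothetical illustration of why a concurrent control group is needed.
# All counts below are invented for the example; they are not study data.

def improvement_rate(improved: int, total: int) -> float:
    """Fraction of patients whose outcome improved during follow-up."""
    return improved / total

# Treated group: some patients improve because of the treatment, others
# would have improved anyway (natural course of disease, other care, chance).
treated = improvement_rate(improved=60, total=200)

# Control group: same follow-up period, no new treatment.
control = improvement_rate(improved=50, total=200)

# A study without a control group can only report the raw rate.
uncontrolled_claim = treated
# With a comparable control group, the difference estimates the treatment effect.
controlled_effect = treated - control

print(f"Uncontrolled estimate: {uncontrolled_claim:.2f}")
print(f"Controlled estimate:   {controlled_effect:.2f}")
```

Here the uncontrolled study would report that 30 percent of patients improved, while the comparison with controls suggests the treatment accounts for only about 5 percentage points of that improvement.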

Failure to Demonstrate the Comparability of Patients in Treatment and Control Groups. Given the purpose of a control group, it is important that patients in the treatment and control groups be similar in terms of baseline characteristics that can influence the outcome of the intervention under study. For example, if one study group included more women at high risk for breast cancer than another, a detection technology tested in the higher-risk group would likely detect more cancer cases than a technology tested in the lower-risk group. This would create the perception that the first detection system was more sensitive than the second, even if the two performed equally well.
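A small worked example, using hypothetical numbers chosen only for the arithmetic, shows how this happens. Both technologies are assumed to have the same sensitivity; only the prevalence of cancer differs between the two groups. Sensitivity is the fraction of cancers present that are detected, so holding it fixed, the raw number of cases found still scales with how many cancers the group contains.

```python
# Hypothetical numbers chosen only to illustrate the arithmetic.
SENSITIVITY = 0.85        # assumed identical for both technologies
WOMEN_PER_GROUP = 10_000  # assumed group size

def expected_detections(prevalence: float,
                        n: int = WOMEN_PER_GROUP,
                        sensitivity: float = SENSITIVITY) -> tuple[float, float]:
    """Return (cancers present, cancers detected) expected among n women."""
    cancers = n * prevalence
    detected = cancers * sensitivity
    return cancers, detected

# Higher-risk group: assumed prevalence of 2 percent.
high_cancers, high_detected = expected_detections(prevalence=0.02)
# Lower-risk group: assumed prevalence of 0.5 percent.
low_cancers, low_detected = expected_detections(prevalence=0.005)

# Raw detection counts differ fourfold...
print(high_detected, low_detected)
# ...but sensitivity (detected / cancers present) is the same in both groups.
print(high_detected / high_cancers, low_detected / low_cancers)
```

Comparing only the counts of cancers found (about 170 versus about 42) would make the first technology look far better, although by construction the two are equally sensitive.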

Unclear Definition of Study Endpoints. Medical technologies can be assessed at multiple levels, and the appropriate levels depend on whether a technology is diagnostic or therapeutic. The most basic level at which a diagnostic technology can be assessed is the definition of its performance characteristics: sensitivity and specificity. Even this basic level of assessment is not easy to perform, because it requires comparing the performance of the new technology with that of a gold standard, and true gold standards (such as tissue obtained during surgery) are not always available.
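The basic calculation can be sketched as follows, assuming every patient has both a result from the new technology and a result from the gold standard; the class and function names, and the eight patients' results, are hypothetical. Patients who never receive gold-standard verification cannot be placed in the table at all, which is one reason the lack of a true gold standard makes even this level of assessment difficult.

```python
from dataclasses import dataclass

@dataclass
class TwoByTwo:
    """Counts from comparing a new test against a gold standard (e.g., surgical tissue)."""
    true_pos: int   # test positive, gold standard shows cancer
    false_pos: int  # test positive, gold standard shows no cancer
    false_neg: int  # test negative, gold standard shows cancer
    true_neg: int   # test negative, gold standard shows no cancer

    @property
    def sensitivity(self) -> float:
        """Proportion of gold-standard cancers that the new test detects."""
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        """Proportion of gold-standard non-cancers that the new test calls negative."""
        return self.true_neg / (self.true_neg + self.false_pos)

def tabulate(test_results: list[bool], gold_standard: list[bool]) -> TwoByTwo:
    """Cross-tabulate paired results; True means 'positive' / 'cancer present'."""
    if len(test_results) != len(gold_standard):
        raise ValueError("each test result needs a matching gold-standard result")
    pairs = list(zip(test_results, gold_standard))
    return TwoByTwo(
        true_pos=sum(t and g for t, g in pairs),
        false_pos=sum(t and not g for t, g in pairs),
        false_neg=sum((not t) and g for t, g in pairs),
        true_neg=sum((not t) and (not g) for t, g in pairs),
    )

# Hypothetical paired results for 8 patients.
test = [True, True, False, True, False, False, True, False]
gold = [True, True, True, False, False, False, False, False]
table = tabulate(test, gold)
print(table)
print(f"sensitivity={table.sensitivity:.2f}, specificity={table.specificity:.2f}")
```

In this toy data the new test finds 2 of the 3 gold-standard cancers (sensitivity of about 0.67) and correctly clears 3 of the 5 patients without cancer (specificity of 0.60).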

Bias. How confident you can be that the results a study reports for a technology are the same results you would get by using the technology in a similar way depends on the absence of bias. Bias consists of systematic sources of variation that distort the results of a study in one direction or another. There are many types of bias that have been

The National Academies of Sciences, Engineering, and Medicine
500 Fifth St. N.W. | Washington, D.C. 20001

Copyright © National Academy of Sciences. All rights reserved.