• Ten thousand relevant compounds were screened for biological activity.
• Five hundred passed the initial screen and were subjected to in vitro experiments.
• Twenty-five passed this screening and were studied in Phase I animal trials.
• One passed this screening and was studied in an expensive Phase II human trial.
These numbers are completely compatible with the presence of nothing but noise, assuming the screening was done based on statistical significance at the 0.05 level, as is common. (Even if none of the 10,000 compounds has any effect, roughly 5 percent of the compounds would appear to have an effect at the first screening; roughly 5 percent of the 500 screened compounds would appear to have an effect at the second screening, etc.)
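The cascade described above can be sketched with a short simulation (an illustrative assumption, not from the report): give all 10,000 compounds zero true effect, so each stage's p-values are uniform on [0, 1], and a test at the 0.05 level passes each compound with probability 0.05. The stage names and counts below are for illustration only.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

alpha = 0.05      # significance level used at every stage
survivors = 10_000  # compounds entering the first screen
counts = []

for stage in ("initial screen", "in vitro", "animal trials"):
    # Every compound is pure noise, so each test's p-value is
    # simply a uniform random draw on [0, 1].
    p_values = [random.random() for _ in range(survivors)]
    survivors = sum(p < alpha for p in p_values)
    counts.append(survivors)
    print(f"after {stage}: {survivors} compounds still look active")
```

A typical run yields roughly 500, then roughly 25, then 0 to 3 survivors, matching the funnel in the bullets even though no compound has any real effect.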
This problem, often called the “multiplicity” or “multiple testing” problem, is of central importance not only in drug development, but also in microarray and other bioinformatic analyses, syndromic surveillance, high-energy physics, high-throughput screening, subgroup analysis, and indeed any area of science facing an inundation of data, as most areas now are. The section of Chapter 2 on high-dimensional data indicates the major advances happening in statistics and mathematics to meet the challenge of multiplicity and to help restore the reproducibility of science.