hypotheses. The null is chosen to be consistent with the “status quo”—statistical independence or no association between exposure and outcome. The alternative is chosen to represent the opposite point of view: that an association exists between exposure and outcome (that is, they are not independent). A summary statistic, called a test statistic, is calculated that gauges how well the data “match” the null hypothesis. In general, small values of the test statistic reflect consistency with the null, and large values consistency with the alternative. The magnitude of the test statistic is compared with its expected size under the null hypothesis. The difference between the observed value of the statistic and its expected value under the null is evaluated while taking into consideration such factors as the size of the sample and variability of the measurements.
In reporting the results of a statistical test of association, researchers report a p value: the probability, if the null hypothesis is true, of observing a test statistic as large as or larger than (in absolute value) that obtained from the sample. A small p value therefore indicates that a result as extreme as or more extreme than that obtained in the study would be very unlikely if the null were true. By convention, most researchers use a p value of 0.05 as the threshold for rejecting the null hypothesis. Therefore, if researchers observe p < 0.05, they state that a result is “statistically significant”; if the p value exceeds 0.05, they state that the result is nonsignificant.
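As an illustration of how a test statistic and its p value might be computed, the following minimal sketch applies a Pearson chi-square test of independence to a 2 × 2 exposure-by-outcome table. The choice of the chi-square test, the function name `chi_square_2x2`, and the counts in the usage lines are assumptions made for illustration only; they do not come from the text. The sketch assumes that every row and column total is positive and applies no continuity correction.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for a 2 x 2 table.

    Hypothetical layout:
                     diseased   not diseased
        exposed          a            b
        unexposed        c            d

    Returns (statistic, p_value) for 1 degree of freedom.
    Assumes all row and column totals are positive.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if exposure and outcome are independent (the null).
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom the statistic is a squared standard normal,
    # so P(chi-square >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Illustrative counts only: 30 of 100 exposed and 15 of 100 unexposed are diseased.
statistic, p_value = chi_square_2x2(30, 70, 15, 85)
print(f"chi-square = {statistic:.3f}, p = {p_value:.4f}")
print("statistically significant" if p_value < 0.05 else "nonsignificant")
```

A table in which exposure and outcome are exactly independent (for example, equal counts in all four cells) gives a statistic of 0 and a p value of 1, matching the idea that small values of the statistic reflect consistency with the null.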
It is possible to make two types of errors in conducting a statistical test of association. First, the null hypothesis might be rejected when it is true, simply because of chance variation. That is called a type I error, and its probability is denoted α. The second type of error is failure to reject the null hypothesis when the alternative is true. That is called a type II error. One minus the probability of a type II error, that is, the probability of rejecting the null when the alternative is true, is called the power of a test. In general, researchers want both error rates to be low. In practice, the type I error rate is usually set to an acceptable level (usually 5%, as indicated above), and a study is designed to obtain a suitably large value for the power. Power is a function of the size of the study sample, the duration of followup, and the strength of the exposure effect. Longer followup will also allow examination of a range of latent periods between exposure and diagnosis of disease.
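The dependence of power on sample size and on the strength of the exposure effect can be sketched with a standard normal approximation for comparing two proportions. The function name `power_two_proportions`, the equal group sizes, and the disease risks used below are hypothetical choices for illustration. The sketch does not model duration of followup, and the approximation is rough for small samples.

```python
import math
from statistics import NormalDist


def power_two_proportions(p_unexposed, p_exposed, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Uses the normal approximation with equal group sizes (an assumption
    of this sketch). Requires proportions strictly between 0 and 1.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)

    # Standard error of the difference under the null (pooled proportion).
    p_bar = (p_unexposed + p_exposed) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)

    # Standard error of the difference under the alternative.
    se_alt = math.sqrt(
        p_unexposed * (1 - p_unexposed) / n_per_group
        + p_exposed * (1 - p_exposed) / n_per_group
    )

    diff = abs(p_exposed - p_unexposed)
    # Probability that the observed difference falls in either rejection tail.
    upper = z.cdf((diff - z_crit * se_null) / se_alt)
    lower = z.cdf((-diff - z_crit * se_null) / se_alt)
    return upper + lower


# Power rises with sample size for a fixed effect (risk 0.10 vs 0.20).
for n in (100, 200, 400):
    print(n, round(power_two_proportions(0.10, 0.20, n), 3))

# Power rises with the strength of the effect for a fixed sample size.
for risk in (0.15, 0.20, 0.25):
    print(risk, round(power_two_proportions(0.10, risk, 200), 3))
```

When the two risks are equal, so that the null hypothesis is true, the function returns α itself: the probability of rejecting a true null is the type I error rate.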
Bias refers to systematic or nonrandom error. Bias causes an observed value to deviate from the true value. It can weaken an association or generate a spurious