treatment C.¹ These are called potential outcomes. Children are viewed, then, as having potential outcomes, only some of which will ever be realized. Several conclusions follow from this definition.

First, the causal effect is defined uniquely for each child, so the impact of the treatment can vary from child to child. Modern thinking about cause therefore rejects the conventional assumption that a new treatment adds a constant effect for every child. This assumption, never realistic to scientists or practitioners, was historically made to simplify statistical analysis.

Second, the causal effect cannot be observed. If a given child is assigned to E, we will observe the outcome under E but not the outcome under C for that child. But if the child is assigned to C, we will observe the outcome under C but not the outcome under E. Holland (1986) refers to the fact that only one of two potential outcomes can be observed as the fundamental problem of causal inference.
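The point can be made concrete with a minimal sketch. It assumes, purely for illustration, that we could somehow know both potential outcomes Y_i(E) and Y_i(C) for three children; the numbers, the `Child` class, and the `observe` helper are hypothetical and do not come from any study. The sketch computes each child's causal effect as the difference Y_i(E) - Y_i(C), then shows what remains visible once each child has been assigned to a single treatment.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Child:
    """A child with two potential outcomes (illustrative, made-up values)."""
    y_e: float  # outcome the child would show under treatment E
    y_c: float  # outcome the child would show under treatment C

    def causal_effect(self) -> float:
        # Child-specific causal effect, defined as a difference.
        return self.y_e - self.y_c


def observe(child: Child, assignment: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (outcome under E, outcome under C); the unassigned one is None."""
    if assignment == "E":
        return (child.y_e, None)
    if assignment == "C":
        return (None, child.y_c)
    raise ValueError("assignment must be 'E' or 'C'")


# Hypothetical children; the effect differs from child to child.
children = [Child(12.0, 10.0), Child(9.0, 9.0), Child(15.0, 11.0)]
assignments = ["E", "C", "C"]

# Knowable only in this thought experiment, never in real data.
effects = [c.causal_effect() for c in children]

# What an investigator actually sees: one outcome per child.
observed = [observe(c, a) for c, a in zip(children, assignments)]

for a, (seen_e, seen_c) in zip(assignments, observed):
    print(f"assigned {a}: Y(E)={seen_e}, Y(C)={seen_c}")
```

In every row of the observed data one of the two entries is missing, so no child's effect can be computed from what is observed alone.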

Third, although a given child will ultimately receive only one treatment, say, treatment E, it must be reasonable at least to imagine a scenario in which that child could have received C. And similarly, even though another child received C, it must be reasonable to imagine a scenario in which that child had received E. If it is not possible to conceive of each child's response under each treatment, then it is not possible to define a causal effect. There must, then, be a road not taken that could have been taken, for each child. Thus, both the outcome under E and the outcome under C must exist in principle even if both cannot be observed in practice. Therefore, in current thinking about cause in statistical science, a fixed attribute of a child (say, sex or ethnic background) cannot typically be a cause. We cannot realistically imagine how a girl would have responded if she had been a boy or how a black child would have responded if that child had been white. Epidemiologists refer to such attributes as fixed markers (Kraemer et al., 1997): unchangeable attributes that are statistically related to an outcome but do not cause the outcome.

This theory of causation provides new insights into why randomized experiments are valuable. It also provides a framework for how to think about the problem of causal inference when randomized experiments are not possible.
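One way to see the value of randomization is a small simulation, offered here as a sketch under stated assumptions rather than as a method from the text. It assumes a hypothetical population in which every child has both potential outcomes, with treatment effects that vary from child to child; the distributions, sample size, seed, and the `simulate` function are all invented for illustration. Half of the children are assigned to E at random, only the assigned outcome is kept, and the difference in observed group means is compared with the true average effect.

```python
import random
from statistics import mean


def simulate(n: int = 10_000, seed: int = 0):
    """Return (true average effect, randomized difference-in-means estimate)."""
    rng = random.Random(seed)

    # Hypothetical potential outcomes; effects vary across children.
    y_c = [rng.gauss(50.0, 10.0) for _ in range(n)]
    effect = [rng.gauss(5.0, 3.0) for _ in range(n)]
    y_e = [c + d for c, d in zip(y_c, effect)]

    # Knowable only inside the simulation.
    true_avg = mean(effect)

    # Random assignment: exactly half of the children receive E.
    ids = list(range(n))
    rng.shuffle(ids)
    treated = set(ids[: n // 2])

    # Only one potential outcome per child is observed.
    obs_e = [y_e[i] for i in range(n) if i in treated]
    obs_c = [y_c[i] for i in range(n) if i not in treated]

    estimate = mean(obs_e) - mean(obs_c)
    return true_avg, estimate


true_avg, estimate = simulate()
print(f"true average effect: {true_avg:.2f}")
print(f"difference in observed means: {estimate:.2f}")
```

Because assignment is unrelated to the potential outcomes, the two observed groups resemble each other on average, and the difference in their means should land close to the true average effect even though no individual effect is ever observed.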

According to the RRH theory, the problem of causal inference is a problem of missing data. If both potential outcomes were observable, the causal effect could be directly calculated for each participant. But one of the potential outcomes is inevitably missing. If the data were missing completely at random, we could compute an unbiased estimate of the average


¹ The causal effect could also be defined as the ratio Y_i(E)/Y_i(C), depending on the scale of Y, but we limit this discussion to causal effects as differences for simplicity.
