Beginning in the 1970s, Rubin (1974) and many after him developed a formal theory of causal inference based on counterfactuals, which play an essential role in both individual- and population-level causal claims. On the individual level, for example, we might observe that a particular person was exposed to a chemical and later contracted a disease. The question we would like to answer is counterfactual: What would have happened had the same individual not been exposed to the chemical? Likewise, for other individuals who were not exposed, we might like to know what would have happened had they been exposed.
On the population level the questions are similar. We observe that a population of individuals, such as Gulf War veterans, was exposed to a variety of conditions and substances during their tour of duty in the Middle East and then exhibited a certain frequency of illness years later. The population question we would like to answer is counterfactual: What would the frequency of illness have been had the same population not been exposed to the conditions and substances they were exposed to in the Gulf War?
In Rubin’s framework, analyzing randomized clinical trials that involve assigning one group to treatment and another to control is a counterfactual missing data problem. For all the people in the treatment group, we are missing data on their response had they been assigned to the control group, and symmetrically for the control group.
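The missing-data view can be made concrete with a small simulation. The sketch below is illustrative only and is not drawn from Rubin's work: the function names, the sample size, the normally distributed control outcome, and the constant treatment effect of 2.0 are all assumptions chosen for simplicity. Each simulated person has two potential outcomes, but only the one matching the assigned group is recorded, and the other is stored as missing.

```python
import random
from statistics import mean


def simulate_trial(n=1000, effect=2.0, seed=0):
    """Simulate a randomized trial with two potential outcomes per person.

    All numbers here are hypothetical. The 'true' values are kept only so
    the estimate can be checked; a real trial never reveals both.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = rng.gauss(10.0, 1.0)   # response if assigned to control
        y1 = y0 + effect            # response if assigned to treatment
        treated = rng.random() < 0.5  # randomized assignment
        units.append({
            "treated": treated,
            # Only one potential outcome is observed; the other is missing.
            "y0_obs": None if treated else y0,
            "y1_obs": y1 if treated else None,
            "y0_true": y0,
            "y1_true": y1,
        })
    return units


def difference_in_means(units):
    """Estimate the average effect using only the observed outcomes."""
    treated = [u["y1_obs"] for u in units if u["treated"]]
    control = [u["y0_obs"] for u in units if not u["treated"]]
    return mean(treated) - mean(control)


units = simulate_trial()
estimate = difference_in_means(units)
true_ate = mean(u["y1_true"] - u["y0_true"] for u in units)
print(f"estimated effect: {estimate:.2f}, true average effect: {true_ate:.2f}")
```

Because assignment is random, the observed control group stands in for the missing control responses of the treated group, which is why the difference in observed means approximates the true average effect in this toy setting.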
Underneath these counterfactuals, however, is a subtle but crucial assumption about how the world should be imagined to have been different. Recall that “exposure to excessive radiation caused Mary to get leukemia” means that “had Mary not been exposed to excessive radiation she would not have gotten leukemia.” This makes sense if we envision a world identical to the one Mary did experience, but change it minimally by intervening to prevent her from being exposed to excessive radiation.
Consider a slightly different example, however. Suppose John smoked 30 roll-your-own cigarettes a day from age 25 to 50, had intensely tar-stained fingers during this period, and got lung cancer at the age of 51.
Sticking with common sense, we will assume that smoking caused John to have both tar-stained fingers and lung cancer, but that having tar-stained fingers has no causal influence on getting lung cancer (Figure J-1). By the counterfactual theory, the following ought then to be true: “Had John not had tar-stained fingers, he would have gotten lung cancer anyway.” To make sense of this counterfactual, we might envision a world in which John still smoked, but either smoked packaged cigarettes that produced no finger stains or washed his hands with tar-