This perspective can be particularly useful for thinking about causal inference when X cannot be manipulated (e.g., age), but it is typically less useful for studying the effects of interventions. The primary challenge is establishing nonspuriousness, and Granger's perspective offers few guidelines in this regard relative to the Campbell and Rubin perspectives.
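
To make Granger's criterion concrete, the following is a minimal sketch (an illustration added here, not part of the original discussion) using Python and the grangercausalitytests function from statsmodels. A series X is said to "Granger-cause" Y if lagged values of X improve the prediction of Y beyond what Y's own lags provide. The simulated series and coefficients are illustrative assumptions; note that the test addresses prediction only and, as the text observes, cannot by itself establish nonspuriousness.

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

# Simulate two series in which x drives y with a one-period lag
# (illustrative coefficients, chosen for this sketch).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

# grangercausalitytests checks whether the SECOND column helps
# predict the FIRST column beyond the first column's own lags.
data = np.column_stack([y, x])
results = grangercausalitytests(data, maxlag=2)

# A small p-value satisfies Granger's predictive criterion, but it
# cannot rule out a common cause driving both series.
for lag, res in results.items():
    f_stat, p_value = res[0]["ssr_ftest"][:2]
    print(f"lag {lag}: F = {f_stat:.2f}, p = {p_value:.4f}")
```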

A second class of perspectives, with roots in philosophy and computer science, takes a graph theoretic approach. In this approach, a complex model of the causal process is specified and compared with data. If the model and its underlying assumptions are correct, the approach can determine whether the data support causal inferences. Within this tradition, a computer program known as Tetrad (Spirtes et al., 2000) can identify any other models involving the same set of variables that account for the data equally well, if such models exist. In separate work, Pearl (2009) has also taken the graph theoretic approach, developing a mathematical calculus for problems of causal inference. This approach holds great promise for understanding causal effects in complex systems. Compared with the Campbell and Rubin approaches, however, it has to date provided little practical guidance for researchers attempting to strengthen the inferences about the effectiveness of interventions that can be drawn from evaluation studies.
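
As a much simplified illustration of this approach (a sketch added here, with variable names, graph structure, and coefficients that are assumptions rather than examples from the text), the code below encodes a three-variable causal graph Z -> X, Z -> Y, X -> Y and recovers the effect of X on Y by adjusting for the confounder Z, in the spirit of the backdoor adjustment in Pearl's (2009) calculus.

```python
import numpy as np

# Simulated system encoded by the DAG  Z -> X,  Z -> Y,  X -> Y.
# The true causal effect of X on Y is 2.0; Z is a common cause.
rng = np.random.default_rng(1)
n = 100_000
z = rng.normal(size=n)
x = 1.5 * z + rng.normal(size=n)
y = 2.0 * x + 3.0 * z + rng.normal(size=n)

def ols_slope(outcome, *regressors):
    """Least-squares coefficient on the first regressor, with an intercept."""
    design = np.column_stack([np.ones(len(outcome)), *regressors])
    beta = np.linalg.lstsq(design, outcome, rcond=None)[0]
    return beta[1]

# A naive regression of Y on X alone is biased by the open
# backdoor path X <- Z -> Y (the slope lands well above 2.0).
print(f"naive slope:    {ols_slope(y, x):.3f}")

# Conditioning on Z blocks the backdoor path, so the adjusted
# slope recovers the true causal effect of about 2.0.
print(f"adjusted slope: {ols_slope(y, x, z):.3f}")
```

The graph does the real work here: it states which adjustment sets suffice, and the correctness of the adjusted estimate depends entirely on the assumed graph being true, which is the source of both the approach's power and its practical limitations noted above.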

How Well Do Alternative Designs Work?

Early attempts to compare the magnitude of causal effects estimated from RCTs and nonrandomized designs used one of two approaches. In the first, the results of an RCT and a separate observational study investigating the same question were compared directly. For example, LaLonde (1986) found that an RCT and nonrandomized evaluations of the effectiveness of job training programs led to strikingly different results. In the second, the results of interventions in a research area evaluated with randomized and nonrandomized designs were compared in a meta-analytic review. For example, Sacks and colleagues (1983) compared results of RCTs of medical interventions with results of nonrandomized designs using historical controls and found that the nonrandomized designs overestimated the effectiveness of the interventions. A number of studies have documented such noncomparability between the results of RCTs and nonrandomized designs, although many of the larger meta-analyses in medicine (Ioannidis et al., 2001) and the behavioral sciences (Lipsey and Wilson, 1993) find no evidence of consistent bias.

More recently, Cook and colleagues (2008) compared the small set of randomized and nonrandomized studies that shared the same treatment group and the same measurement of the outcome variable. In every case in which an RCT was compared with a regression discontinuity or interrupted time series design (see Appendix E for discussion of these study designs), the effect sizes did not differ. Observational studies produced more variable results, but their results did not differ from those of an RCT when (1) a control group of similar participants was used or (2) the mechanism of selection into the treatment and control groups was known. Hernán


