Establishing credible estimates of what the outcomes would have been without the program, all else equal, is the most demanding part of impact evaluation, but also the most critical. When those estimates are convincing, the effects found in the evaluation can be attributed to the program rather than to any of the many other possible influences on the outcome variables. In that case, the evaluation is considered to have high internal validity. For example, a simple comparison of recidivism rates for those sentenced to prison and those not sentenced would have low internal validity for estimating the effect of prison on reoffending. Any differences in recidivism outcomes could easily be due to preexisting differences between the groups. Judges are more likely to sentence offenders with serious prior records to prison. Prisoners’ greater recidivism rates may therefore result not from their prison experience but from the fact that they were more serious offenders in the first place. The job of a good impact evaluation design is to neutralize or rule out such threats to the internal validity of a study.
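The recidivism example can be made concrete with a small simulation. The sketch below is illustrative only: the function name `simulate`, the 50 percent share of serious offenders, the baseline reoffending risks (0.6 and 0.3), and the sentencing probabilities (0.8 and 0.2) are all hypothetical values chosen for the illustration, not figures from any study. It compares the prison versus no-prison difference in recidivism when sentencing depends on prior record with the difference when the sentence is assigned by a coin flip unrelated to prior record.

```python
import random


def simulate(n=20000, true_effect=0.0, seed=0):
    """Return (naive_diff, randomized_diff) in recidivism rates.

    All probabilities below are hypothetical, chosen only to illustrate
    how preexisting differences can distort a simple comparison.
    """
    rng = random.Random(seed)
    # Recidivism outcomes keyed by whether the person went to prison.
    observational = {True: [], False: []}
    randomized = {True: [], False: []}

    for _ in range(n):
        # Assumption: half of offenders have serious prior records.
        serious = rng.random() < 0.5
        # Assumption: serious offenders reoffend more, regardless of sentence.
        base_risk = 0.6 if serious else 0.3

        # Observational world: judges imprison serious offenders more often.
        prison_obs = rng.random() < (0.8 if serious else 0.2)
        # Randomized world: the sentence is a coin flip, unrelated to record.
        prison_rand = rng.random() < 0.5

        for prison, groups in ((prison_obs, observational),
                               (prison_rand, randomized)):
            risk = base_risk + (true_effect if prison else 0.0)
            groups[prison].append(rng.random() < risk)

    def rate_difference(groups):
        prison_rate = sum(groups[True]) / len(groups[True])
        no_prison_rate = sum(groups[False]) / len(groups[False])
        return prison_rate - no_prison_rate

    return rate_difference(observational), rate_difference(randomized)


if __name__ == "__main__":
    naive_diff, randomized_diff = simulate(true_effect=0.0)
    print(f"naive comparison:      {naive_diff:+.3f}")
    print(f"randomized comparison: {randomized_diff:+.3f}")
```

With `true_effect=0.0`, prison has no effect on reoffending by construction. Under these assumed values, the simple comparison still shows a gap of about 0.18 in expectation (0.54 versus 0.36), produced entirely by the mix of offenders in each group, while the coin-flip comparison stays close to zero.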

Although numerous research designs are used to assess program effects, it is useful to classify them into three broad categories: randomized experiments, quasi-experiments, and observational designs. Each, under optimal circumstances, can provide a valid answer to the question of whether a program has an effect upon the outcomes of interest. However, these designs differ in the assumptions they make, the nature of the problems that undermine those assumptions, the degree of control the researcher must have over program exposure, the way in which they are implemented, the issues encountered in statistical analysis, and in many other ways as well. As a result, it is difficult to make simplistic generalizations about which is the best method for obtaining a valid estimate of the effect of any given intervention. We return to this issue later but first provide an overview of the nature of each of these types of designs.


In randomized experiments, the units toward which program services are directed (usually people or places) are randomly assigned to receive the program or not (intervention and control conditions, respectively). For example, in the Minneapolis Hot Spots Experiment (Sherman and Weisburd, 1995), 110 crime hot spots were randomly allocated to an experimental condition that received high levels of preventive patrol and a control condition with a lower “business as usual” level of patrol. The researchers found a moderate, statistically significant program effect on crime rates. Because the hot spots were assigned by a chance process that
