In theory, a program should be assessed against the stipulated outcomes it was meant to produce. A full program evaluation would include both an outcome assessment and a process evaluation, which assesses the quality, consistency, and comprehensiveness of the program's implementation. The data for the outcome assessment would include valid and reliable quantitative measures of the desired outcomes; for programs aimed at achieving a variety of results, metrics would be included for each of them. Ideally, outcome data would be available at regular intervals, as a time series, so that progress could be reviewed routinely for both formative and summative evaluations.
Textbook evaluations presume a fully developed causal model that includes all the factors (including other public programs) that can contribute to the outcomes of concern. Only if all these influences are taken into account is it possible to determine the extent to which the program itself independently influences the results. The most convincing demonstrations of cause and effect depend on experimental and quasi-experimental research designs (see, e.g., Campbell and Stanley, 1966). When experimentation is not feasible, evaluations can measure a broad range of influences and statistically separate the effects of the program from the effects of other variables.
All these methods require a large number of cases, with the program applied in some and not others. They also depend on having policy objectives that are clear, unambiguous, and noncontradictory and on having all the required data. When these characteristics are not present, evaluation is much more complicated.
In practice, the textbook approach to evaluation has been possible only for some medical, public health, and social programs in which well-defined interventions with well-defined objectives are applied to fairly large populations. Even for some of these programs, evaluation has been difficult because the policy is vague or has multiple, partially incompatible goals (such as prison programs aimed simultaneously at punitive and rehabilitative outcomes). Outcome measures may also be open to multiple interpretations or be controversial among stakeholders.
Research programs are often particularly difficult to evaluate by the textbook model (see Bozeman and Melkers, 1993; National Research