Improving Evaluation of Anticrime Programs

5
How Should the Evaluation Be Implemented?

Many of the problems that result in unsuccessful impact evaluations come about because the evaluation plan was not carried out as intended, not because the evaluation was poorly designed. Some of the more common areas in which study designs break down in implementation are:

  • failure to obtain the necessary number of cases to construct treatment and control groups and/or attain sufficient statistical power;

  • failure to acquire a suitable comparison group in quasi-experimental studies;

  • attrition, especially when it affects the treatment and control groups differently;

  • dilution of service delivery that weakens the program being tested; and

  • failure to identify essential covariates or obtain measures of them in observational studies.

Problems such as these undermine the validity of the conclusions an impact evaluation can support and, if serious, can prevent the study from being completed in any useful form. This section describes procedures that can reduce the likelihood of implementation problems and help determine when an evaluation that is unlikely to yield useful results should be aborted. The discussion is divided into subsections covering actions that can be taken before the evaluation contract is awarded and actions that can be taken after it is under way. The common theme across these subsections is that forethought, careful planning, and







informed monitoring can minimize problems associated with the implementation of an impact evaluation.

STEPS THAT CAN BE TAKEN PRIOR TO AWARDING THE EVALUATION CONTRACT

Developing an Effective Request for Proposals (RFP)

As noted in Chapter 2, an initial step toward ensuring a high-quality evaluation is a well-developed account of the questions that need to be answered and the form such answers should take to be useful to the intended audience. These considerations, in turn, have rather direct implications for the design and implementation of an impact evaluation. The usual vehicle for translating this critical background information into guidelines and expectations for the evaluation design and implementation is a Request for Proposals (RFP) circulated to potential evaluators.

An RFP that is based on solid information about the nature and circumstances of the program to be evaluated should encourage prospective evaluators to plan for likely implementation problems. For instance, a thorough RFP might prompt the applicant to provide (a) a power analysis to support the proposed number of cases; (b) evidence that a sufficient number of cases will be available (e.g., pilot study results or analysis of agency data showing that the number of cases fitting the selection criteria was available in a recent period); (c) a carefully considered plan for actually obtaining the necessary number of cases; and (d) a management plan for overseeing and, if necessary, correcting the process of recruiting cases for the study.

When such background information is not provided in the RFP, it will fall to the evaluation contractor to discover it and adapt the evaluation plans accordingly. In such circumstances, the RFP and the terms of the evaluation contract must allow such flexibility.
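The power analysis in item (a) rests on standard formulas that a reviewer can check before any award is made. As a minimal sketch (not drawn from the report itself), the normal-approximation sample size for detecting a standardized mean difference d between treatment and control groups can be computed with only the Python standard library; the effect sizes, alpha, and power values below are illustrative assumptions:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sample comparison
    of means, using the normal approximation:
        n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2
    where d is the standardized effect size (Cohen's d)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # e.g., 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # e.g., 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Illustrative values: a "medium" effect (d = 0.5) needs ~63 cases per
# group, while a "small" effect (d = 0.2) needs several hundred.
print(n_per_group(0.5))  # 63
print(n_per_group(0.2))  # 393
```

A reviewer can also run the calculation in reverse, checking whether the number of cases an applicant claims will be available is anywhere near what the proposed analysis requires.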
In addition, consideration must be given to the possibility that the discovery process will reveal circumstances that make successful implementation of the evaluation unlikely. Where there is significant uncertainty about the feasibility of an impact evaluation, a two-step contracting process would be advisable: the first step would focus on developing background information and formulating the evaluation plan, and the second step, if warranted, would be the implementation of that plan and completion of the evaluation.

Funding agencies and evaluators have used a number of approaches to developing the information needed to formulate an instructive RFP or to plan the evaluation directly. Site visits, for example, are one common way to assess whether essential resources such as space, equipment, and staff will be available to the evaluation project and to ensure that key local

partners are on board. An especially probing version of a site visit is a structured evaluability assessment of the sort described in Chapter 2. The distinctive function of an evaluability assessment is to focus specifically on whether a program is appropriate for impact evaluation and how such an evaluation could feasibly be conducted (Wholey, 1994). Prior process evaluations, as described in earlier chapters, may also provide detailed program information useful for developing an RFP and planning the impact evaluation.

When there are questions about the availability of a sufficient number of participants to meet the requirements of an evaluation study, a “pipeline” analysis may be appropriate (Shadish, Cook, and Campbell, 2002). Pipeline studies are conducted prior to the actual evaluation as a pilot test of the specific procedures for identifying the cases that will be selected according to the planned eligibility criteria. They address the unfortunately common situation in which what appears to be an ample number of potential participants sharply diminishes when the actual selection is made. An illustration of the need for a pipeline analysis is presented in Box 5-1.

Similarly, pilot or feasibility studies can test important procedures such as randomization and consent to determine what effects they may have on sample attrition. A preliminary study of this sort also provides an opportunity to discover other aspects of the program circumstances that may present problems or have implications for how the evaluation is designed. The evaluation reported by Berk (2003) of a prison classification scheme and that reported by Chamberlain (2003) of Multidimensional Treatment Foster Care for delinquents, for instance, both built on preliminary studies conducted before the main evaluation.
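The arithmetic behind a pipeline analysis is simple: multiply the expected retention rate at each step to project the final analyzable sample. A minimal sketch in Python, using the recruitment figures reported for the trial in Box 5-1 (Gottfredson et al., 2004), in which 1,403 recruited families yielded 1,036 registrations and 715 completed pretests:

```python
# Stage counts from the Strengthening Families trial described in
# Box 5-1 (Gottfredson et al., 2004).
pipeline = [
    ("recruited", 1403),
    ("registered", 1036),
    ("pretested", 715),
]

# Retention at each step, and cumulatively from the start.
start = pipeline[0][1]
prev = start
for stage, n in pipeline:
    step_rate = n / prev     # retention relative to the prior stage
    cum_rate = n / start     # retention relative to initial recruitment
    print(f"{stage:10s} n={n:5d}  step={step_rate:6.1%}  cumulative={cum_rate:6.1%}")
    prev = n

# Projecting forward: only 68 percent of pretested intervention
# families attended at least one session, so the effective yield is
# roughly a third of the families originally recruited.
attended = round(715 * 0.68)
print(f"projected attenders: {attended} of {start} recruited "
      f"({attended / start:.1%})")
```

Run before the trial with rates estimated from a pilot, such a projection makes the gap between the apparent and the effective sample size visible while the design can still be adjusted.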
For complex evaluations, a design advisory group consisting of experts in evaluation methodology and study design might be funded to assist in developing an evaluation plan informed by the findings from whatever preliminary studies have been conducted.

Development of the RFP and interpretation of available information about the program circumstances must also take into account how the evaluation is organized. Common models include one or more local evaluation teams, a national evaluator working directly with the local site(s), or a national evaluator working with local teams. Local evaluation teams have the advantage of proximity and the opportunity to develop close working relationships with the program, factors that facilitate implementation of the evaluation plan and effective quality control monitoring. However, they are not always able to marshal the level of expertise and experience available to a national team, and, in multisite evaluations, obtaining comparable designs and outcome data across different local teams is often difficult. Preliminary investigations and input from an advisory panel that attends directly to the question of how best to organize the evaluation may be especially important for large multisite projects.

BOX 5-1
Pipeline Analyses and Pilot Testing

A recent randomized trial funded by the National Institute on Drug Abuse, testing the effects of the Strengthening Families Program for reducing drug use and antisocial behavior in a large, urban population, encountered major challenges with recruitment and retention of participants (Gottfredson et al., 2004). Of 1,403 families recruited, only 1,036 registered and, of those, only 715 showed up to complete the pretest. Then, only 68 percent of the pretested families who had been randomly assigned to the intervention attended at least one session of the program. Although the research plan anticipated some attrition, the actual rate was much higher. In this instance, a pipeline analysis that conducted preliminary focused assessments of the likely yield at each step of the process would have helped avoid these problems. Surfacing the recruitment and retention problems earlier would have allowed them to be better anticipated in the evaluation design.

The same study illustrates how pilot-testing the randomization procedures might reveal problems that could weaken the study design. The design involved three intervention conditions (equal numbers of sessions of child skills training only, parent skills training only, and parent and child skills training plus family skills training) compared with a minimal-treatment control condition. Partway into the study it was discovered that families assigned to the parent-skills-only condition were significantly less likely to attend the program than families assigned to the other conditions, probably because they thought that their children, rather than they themselves, needed the help. This differential attendance potentially compromised the comparison across conditions because any difference favoring the child-only and family conditions might have been attributed to the greater number of contact hours rather than the content of the program. A preliminary year of funding for piloting study procedures and conducting pipeline analyses would have strengthened this study by alerting the investigators to these challenges so that they could refine the procedures before the study began.

Site visits, evaluability assessments, pipeline analyses, and other such preliminary investigations, of course, add to the cost of an evaluation and are often used, if at all, only for large projects. Those costs, however, must be balanced against the potentially greater cost of funding an evaluation that ultimately fails to be implemented well enough to produce useful

results. Preliminary studies cannot ensure that problems will not arise during the course of the actual evaluation project. Nonetheless, they do help surface some of the potentially more serious problems so that they can be handled beforehand or a decision can be made about whether to go ahead with the evaluation.

Reviewing Evaluation Proposals

Knowledgeable reviewers can contribute not only to the selection of sound evaluation proposals but also to improving the methodological quality and implementation prospects of those selected. The comments and suggestions of reviewers experienced in designing and implementing impact evaluations may identify weak areas and needed revisions in even the highest-scoring proposals under review. An agency can reduce the likelihood of implementation problems by using these comments and suggestions to require changes in the evaluation design before a grant or contract is awarded.

Obtaining good advice about ways to improve the design and implementation of the most promising evaluation proposals, of course, requires that those reviewing the proposals have relevant expertise. In areas like criminal justice, where there are strong conflicting opinions about methods of evaluation, it is critical to develop and maintain balanced review panels. When these panels must deal with proposals involving widely different evaluation methodologies, the reviewers collectively must be broad-minded and eclectic enough to make reasoned comparisons of the relative merits of different approaches. One advantage of an agency process that produces RFPs that are well developed and specific about the relevant questions and preferred design is that review panels can be configured to represent expertise distinctive to the stipulated methods. Under these circumstances, a specialized panel will be more likely to provide advice that improves the design and implementation plans of the more attractive proposals, as well as to better judge their initial quality.

Agencies often struggle to design and carry out review processes that meet high standards of scientific quality while maintaining fairness and representation of diverse views. They may, for instance, include practitioners as well as scientific reviewers to ensure that the research funded has policy relevance. Diversity that extends much beyond research expertise in impact evaluation, however, will dilute rather than strengthen the ability of a review panel to select and improve evaluation proposals. This is an especially important consideration if impact evaluations that meet high scientific standards are desired. Practitioners rarely have the training and experience necessary to provide sound judgments on research methods and implementation, though their input may be very helpful for

defining agency priorities and identifying significant programs for evaluation. If practitioner views on the policy relevance of specific evaluation proposals are desired, a two-stage review would be the best approach. The policy relevance of the programs under consideration would first be judged by knowledgeable policy makers, practitioners, and researchers. Proposals that pass this screen would then receive a scientific review from a panel of well-qualified researchers. The review panels at this second stage could then focus solely on the scientific merit and likelihood of successful implementation of the proposed research.

For purposes of obtaining careful reviews and sound advice for improving proposals, standing review committees have much to recommend them over ad hoc ones. The National Institutes of Health (NIH), for example, uses standing review committees with a rotating membership. This contrasts with other agencies, such as the National Institute of Justice, whose review committees are composed anew for each competition. A higher level of prestige is often associated with membership on a standing committee, making it more attractive to senior researchers. Members of standing panels also learn from each other and from prior proposals in ways that may improve the quality of their reviews and advice. In addition, standing panels become part of the infrastructure of the agency and develop an institutional memory helpful in maintaining consistency in reviews over time.

Regardless of the form of the review panel, reviewers benefit from structure in the review process. A helpful aid, for instance, is a checklist or code sheet that includes guidelines for the level of rigor expected for different features of the research methods (e.g., basic design, measurement) and characteristic implementation issues (e.g., adequate samples, availability of data) for different types of studies.
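Such a code sheet is straightforward to keep in machine-readable form so that reviews stay consistent across proposals and panels. A hypothetical sketch, where the criteria and their descriptions are illustrative assumptions rather than items from the report:

```python
# Hypothetical review code sheet: each criterion names a design feature
# or implementation issue and the minimum showing expected of a proposal.
CODE_SHEET = {
    "basic_design":      "randomized or strong quasi-experimental comparison",
    "statistical_power": "power analysis supporting the proposed n",
    "sample_pipeline":   "evidence that eligible cases exist in sufficient numbers",
    "measurement":       "validated outcome measures identified",
    "data_availability": "agency data access confirmed in writing",
}

def incomplete_items(ratings: dict[str, bool]) -> list[str]:
    """Return the criteria a proposal has not yet addressed, so the
    panel's feedback covers every known problem area."""
    return [item for item in CODE_SHEET if not ratings.get(item, False)]

# Example review: a proposal that covers design and measurement only.
ratings = {"basic_design": True, "measurement": True}
print(incomplete_items(ratings))
# ['statistical_power', 'sample_pipeline', 'data_availability']
```

Revising the criteria as new implementation failures are encountered turns the sheet into the cumulative guide to potential shortcomings described below.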
Such a list helps ensure thorough and consistent reviews and, if revised to incorporate prior experience, becomes a comprehensive guide to potential shortcomings in the design or implementation plans under consideration. Also, if included in the RFP, the list will encourage proposal authors to address the known problem areas and to include sufficient detail for the resulting plans to be judged.

Formulating a Management Plan

Although agencies do not always require a detailed list of tasks to be completed by certain dates as part of an evaluation proposal, a clear plan in advance of the award can facilitate later project management. Such a plan could be required as a first step by a contractor or grantee selected to conduct an evaluation project. This plan would spell out specific milestones in the evaluation that must be reached by certain dates in order for

the evaluation to proceed on schedule, for example, the successful recruitment of sites, configuration of experimental groups, and enrollment of subjects. A sound management plan would also identify critical benchmarks or events that must occur for the project to proceed toward successful implementation, e.g., letters of commitment from crucial local partners.

Written memoranda of understanding (MOUs) with key partners are another strategy that can help keep a project on track during the implementation phase. Such MOUs might be required with all critical partners who have committed important resources (such as personnel to screen potential participants or to provide certain data). In many cases, the evaluator does not have the clout necessary to obtain the needed commitments. The funding agency may be in a better position to approach local agencies (e.g., police, corrections, schools) to obtain their cooperation.

Despite the best efforts to ensure a sound and feasible plan for the evaluation, some impact evaluations will encounter major problems. Some of those evaluations may nonetheless be salvageable if additional resources are available for the efforts required to overcome the problems. For example, in a multisite trial of domestic violence programs, one site may experience major difficulties unrelated to the study and be forced to close or considerably reduce its services. Potential replacement sites might be available, but the investigator may not have funds for recruitment and start-up at a new site. In this situation, augmenting the award with the funds necessary to add the replacement sites may be a more cost-effective option than allowing a diminished study to go forward. To cover such eventualities, agencies should maintain an emergency fund as a component of their budgeting for evaluation projects, with well-specified procedures and guidelines for using it. Such a fund will be counterproductive, however, if it is not carefully directed toward solvable problems that obstruct an evaluation project that otherwise has a high probability of success.

STEPS THAT CAN BE TAKEN AFTER AWARDING THE EVALUATION CONTRACT

The typical grant monitoring process requires periodic reporting by the grantee. For larger projects, more intensive monitoring is often used. This process is greatly facilitated when there is a detailed management plan (as described earlier) against which agency staff can compare actual progress. When such a plan exists, agency staff can take a proactive approach to project monitoring, holding telephone conferences at critical times to track the achievement of important milestones and benchmarks. The scale of criminal justice evaluation research is small enough

that even one failed evaluation that could have been salvaged through early detection of problems and corrective action is an important lost opportunity.

For larger and more complex impact evaluations, technical advisory panels incorporated into the monitoring process may expand the range of expertise available for anticipating and resolving implementation problems. Agencies might, for instance, use standing committees of researchers (perhaps the same committees that review proposals) to periodically review the scientific aspects of the work and recommend agency responses. Site visits by a technical advisory panel could, for instance, offer valuable advice about recruitment strategies and data collection. As a last resort, the technical panel may suggest early termination of an evaluation to conserve resources for more promising research. Such visiting panels are a standard tool in NIH multisite clinical trial management. Properly conceived and constructed, they can be perceived as helpful rather than threatening.

It is common practice to monitor evaluation projects more carefully in the first year than in later years. Although it is clearly important to watch such projects closely in the critical early stages, it is also important to recognize that serious problems can develop later. It is not unusual for evaluation procedures to be circumvented as those associated with a program become more familiar with them. For example, program staff may learn over time how to manipulate a randomization procedure by altering the order in which cases are presented for randomization. Selective reporting that favors the program, and even outright falsification of records, may also creep in slowly. Vigilance throughout the course of the evaluation project is required to catch such changes.
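One common safeguard against manipulation of this kind is to take assignment out of program staff's hands entirely: the evaluator pre-generates a concealed, permuted-block allocation sequence, so reordering the intake queue cannot influence which condition a case receives. A minimal sketch of that technique (the block size and seed below are illustrative assumptions, not from the report):

```python
import random

def permuted_block_sequence(n_cases: int, block_size: int = 4,
                            seed: int = 20240501) -> list[str]:
    """Pre-generate a treatment/control allocation list using permuted
    blocks: each block holds equal numbers of "T" and "C" in random
    order, so group sizes stay balanced throughout intake and the next
    assignment cannot be predicted from the order cases are presented."""
    assert block_size % 2 == 0, "block size must be even for 1:1 allocation"
    rng = random.Random(seed)  # fixed seed: the list is set before intake begins
    sequence = []
    while len(sequence) < n_cases:
        block = ["T"] * (block_size // 2) + ["C"] * (block_size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_cases]

# The evaluator holds this list; staff submit cases in any order and
# simply receive the next pre-committed assignment.
allocation = permuted_block_sequence(100)
print(allocation[:8])
print(allocation.count("T"), allocation.count("C"))  # balanced: 50 50
```

Logging each assignment against this pre-committed list also leaves an audit trail that makes selective reporting and record tampering easier to detect during monitoring.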
Other mechanisms that can enhance project success after funding include meetings of evaluators working on similar projects and cluster conferences for evaluators. Several agencies use such meetings to provide a forum in which challenges and potential solutions can be discussed. These interactions may be especially helpful when the programs being evaluated are similar, as in multisite projects with different local evaluators. An extension of this idea is to include well-respected outside expert researchers in meetings with the evaluators. Such experts can comment on the progress of the effort and offer helpful advice. These researchers might be members of a standing review committee, such as that described earlier, who are already familiar with the work. Alternatively, evaluators can simply be put in contact with veteran researchers who have faced similar challenges in other projects. Of course, many veteran researchers have social networks on which they depend for such advice.

But less experienced researchers, or even experienced researchers new to a certain type of research, would often benefit from consultation with others. Agencies might maintain a directory of experienced researchers who could be called upon to consult with grantees as situations arise. Advisory boards are often created for this purpose and may be especially helpful on large and complex projects.