3
When Is an Impact Evaluation Appropriate?

Of the many evaluation questions that might be asked for any criminal justice program, the one that is generally of most interest to policy makers is, “Does it work?” That is, does the program have the intended beneficial effects on the outcomes of interest? Policy makers, for example, might wish to know the effects of a “hot spots” policing program on the rate of violent crime (Braga, 2003) or whether vigorous enforcement of drug laws results in a decrease in drug consumption. As described in the previous chapter, answering these types of questions is the main focus of impact evaluation.

A valid and informative impact evaluation, however, cannot necessarily be conducted for every criminal justice program whose effects are of interest to policy makers. Impact evaluation is inherently difficult and depends upon specialized research designs, data collection, and statistical analysis (discussed in more detail in the next chapter). It simply cannot be carried out effectively unless certain minimum conditions and resources are available, no matter how skilled the researchers or insistent the policy makers. Moreover, even under otherwise favorable circumstances, it is rarely possible to obtain credible answers about the effects of a criminal justice program within a short time period or at low cost.

For policy makers and sponsors of impact evaluation research, this situation has a number of significant implications. Most important, it means that to have a reasonable probability of success, impact evaluations should be launched only with careful planning and firm indications that the prerequisite conditions are in place. In the face of the inevitable limited resources for evaluation research, how programs are selected for







impact evaluation may also be critical. Broad priorities that spread resources too thinly may reduce the likelihood that any evaluation can be carried out well enough to produce credible and useful results. Focused priorities that concentrate resources in relatively few impact evaluations may be equally unproductive if the program circumstances for those few are not amenable to evaluation.

There are no criteria for determining which programs are most appropriate for impact evaluation that will ensure that every evaluation can be effectively implemented and yield valid findings. Two different kinds of considerations that are generally relevant are developed here: one relating to the practical or political significance of the program and one relating to how amenable it is to evaluation.

SIGNIFICANCE OF THE PROGRAM

Across the full spectrum of criminal justice programs, those that may be appropriate for impact evaluation will not generally be identifiable through any single means or source. Participants in different parts of the system will have different interests and priorities that focus their attention on different programs. Sponsors and funders of programs will often want to know if the programs in which they have made investments have the desired effects. Practitioners may be most interested in evaluations of the programs they currently use and of alternative programs that might be better. Policy makers will be interested in evaluations that help them make resource allocation decisions about the programs they should support. Researchers often focus their attention on innovative program concepts with potential importance for future application. It follows that adequate identification of programs that may be significant enough to any one of these groups to be candidates for impact evaluation will require input from informed representatives of that group.
Sponsors of evaluation research across the spectrum of criminal justice programs will need input from all these groups if they wish to identify the candidates for impact evaluation likely to be most significant for the field. Two primary mechanisms create programs for which impact evaluation may contribute vital practical information. One mechanism is the evolution of innovative programs or the combination of existing program elements into new programs that have great potential in the eyes of the policy community. Such programs may be developed by researchers or practitioners and fielded rather narrowly. The practice of arresting perpetrators of domestic violence when police were called to the scene began in this fashion (Sherman, 1992). With the second mechanism, programs spring into broad acceptance as a result of grassroots enthusiasm but may lack an empirical or theoretical underpinning. Project DARE, with its use of police officers to provide drug prevention education in schools, followed that path. Programs stemming from both sources are potentially significant, though for different reasons, and it would be shortsighted to focus on one to the exclusion of the other.

Given a slate of candidate programs for which impact evaluation may have significance for the field from the perspective of one concerned group or another, it may still be necessary to set priorities among them. A useful conceptual framework from health intervention research for appraising the significance of an intervention is summarized in the acronym RE-AIM: Reach, Effectiveness, Adoption, Implementation, and Maintenance (Glasgow, Vogt, and Boles, 1999). When considering whether a program is a candidate for impact evaluation, these elements can be thought of as a chain, with the potential value of an evaluation constrained by the weakest link in that chain. These criteria can be used to assess a program's significance and, correspondingly, the value of evaluation results about its effects. We will consider these elements in order.

Reach. Reach is the scope of the population that could potentially benefit from the intervention if it proves effective. Other things equal, an intervention validated by evaluation that is applicable to a larger population has more practical significance than one applicable to a smaller population. Reach may also encompass specialized, hard-to-serve populations for which more general programs may not be suitable. Drug courts, from this perspective, have great reach because of the high prevalence of substance abuse among offenders. A culture-specific program to reduce violence against Native American women, however, would also have reach because there are currently few programs tailored for this population.

Effectiveness. The potential value of a program is, of course, constrained by its effectiveness when it is put into practice. It is the job of impact evaluation to determine effectiveness, which makes this a difficult criterion to apply when selecting programs for impact evaluation. Nonetheless, an informed judgment call about the potential effectiveness of a program can be important for setting evaluation priorities. For some programs, there may be preliminary evidence of efficacy or effectiveness that can inform judgment. Consistency with well-established theory and the clinical judgment of experienced practitioners may also be useful touchstones. The positive effects of cognitive-behavioral therapies demonstrated for a range of mental health problems, for instance, support the expectation that they might also be effective for sex offenders.

Adoption. Adoption is the potential market for a program. Adoption is a complex constellation of ideology, politics, and bureaucratic preferences that is influenced by intellectual fashion and larger social forces as well as rational assessment of the utility of a program. Given equal effectiveness and ease of implementation, some programs will be less attractive and acceptable to potential users than others. The assessment of those factors by potential adopters can thus provide valuable information for prioritizing programs for impact evaluation. The widespread adoption of boot camps during the 1990s, for instance, indicated that this type of paramilitary program had considerable political and social appeal and was compatible with the program concepts held by criminal justice practitioners.

Implementation. Some programs are more difficult to implement than others, and for some it may be more difficult to sustain the quality of the service delivery in ongoing practice. Other things equal, a program that is straightforward to implement and sustain is more valuable than a program that requires a great deal of effort and monitoring to yield its full potential. Mentoring programs as a delinquency prevention strategy for at-risk juveniles, for instance, are generally easier and less costly to implement than family counseling programs with their requirements for highly trained personnel and regular meetings with multiple family members.

Maintenance. Maintenance, in this context, refers to the maintenance of positive program effects over time. The more durable the effect of a program, the greater is its value as a beneficial social intervention. For instance, if improved street lighting reduces street crimes by making high crime areas more visible (Farrington and Welsh, 2002), the effects are not likely to diminish significantly as long as criminals prefer to conduct their business away from public view.
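The weakest-link idea above can be caricatured in a few lines of code. This is a deliberate simplification of the RE-AIM appraisal; the 0-to-5 scoring scale and the example scores are hypothetical, not part of the framework:

```python
def appraise_significance(scores):
    """Treat the five RE-AIM criteria as links in a chain: overall
    significance is limited by the weakest link (the minimum score)."""
    criteria = {"reach", "effectiveness", "adoption", "implementation", "maintenance"}
    assert set(scores) == criteria, "score every criterion"
    return min(scores.values())

# Hypothetical appraisal of one candidate program on a 0-5 scale
program = {"reach": 4, "effectiveness": 3, "adoption": 4,
           "implementation": 2, "maintenance": 3}
print(appraise_significance(program))  # limited by the implementation score
```

Under this rule a program that scores highly on four criteria but poorly on one would rank below a program with uniformly moderate scores, which is the point of the chain metaphor.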
Making good judgments on such criteria in advance of an impact evaluation will rarely be an easy task and will almost always have to be done on the basis of insufficient information. To assess the potential significance of a criminal justice program and, hence, the potential significance of an impact evaluation of that program, however, requires some such assessment. Because it is a difficult task, expert criminal justice professionals, policy makers, and researchers should be employed to review candidate programs, discuss their significance for impact evaluation, and make recommendations about the corresponding priorities.

EVALUABILITY OF THE PROGRAM

A criminal justice program that is significant in terms of the criteria described above may, nonetheless, be inappropriate for impact evaluation. The nature of the program and its circumstances, the prerequisites for credible research, or the available resources may fall short of what is required to conduct an adequate assessment of program effects. This is an unfortunate circumstance, but one that must be recognized in any process of decision making about where to invest resources for impact evaluation. The number of impact evaluations found to be inadequately implemented in the GAO reports reviewed in Chapter 1 of this report is evidence of the magnitude of the potential difficulties in completing even well-designed projects of this sort.

At issue is the evaluability of a program: whether the conceptualization, configuration, and situation of a program make it amenable to evaluation research and, if so, what would be required to conduct the research. Ultimately, effective impact evaluation depends on four basic preconditions: (a) a sufficiently developed and documented program to be evaluated, (b) the ability to obtain relevant and reliable data on the program outcomes of interest, (c) a research design capable of distinguishing program effects from other influences on the outcomes, and (d) sufficient resources to adequately conduct the research. Item (c), relating to research design for impact evaluation, poses considerable technical and practical challenges and, additionally, must be tailored rather specifically to the circumstances of the program being evaluated. It is discussed in the next chapter of this report. The other preconditions for effective impact evaluation are somewhat more general and are reviewed below.

The Program

At the most basic level, impact evaluation is most informative when there is a well-defined program to evaluate. Finding effects is of little value if it is not possible to specify what was done to bring about those effects, that is, the program's theory of change and the way it is operationalized. Such a program cannot be replicated nor easily used by other practitioners who wish to adopt it.
Moreover, before beginning a study, researchers should be able to identify the effects, positive and negative, that the program might plausibly produce and know what target population or social conditions are expected to show those effects.

Programs can be poorly defined in several different ways that will create difficulties for impact evaluation. One is simply that the intended program activities and services are not documented, though they may be well-structured in practice. It is commonplace for many medical and mental health programs to develop treatment protocols, manuals that describe what the treatment is and how it is to be delivered, but this is not generally the case for criminal justice programs. In such instances, the evaluation research may need to include an observational and descriptive component to characterize the nature of the program under consideration. As mentioned in Chapter 2, a process evaluation to determine how well the program is implemented and how completely and adequately it delivers the intended services is also frequently conducted along with an impact evaluation. These procedures allow any findings about program effects to be accompanied by a description of the program as actually delivered as well as of the program as intended.

Another variant on the issue of program definition occurs for programs that provide significantly different services to different program participants, whether inadvertently or by intent. A juvenile diversion project, for instance, may prescribe quite different services for different first offenders based on a needs assessment. A question about the impact of this diversion program may be answered in terms of the average effect on recidivism across the variously treated juveniles served. The mix of services provided to each juvenile and the basis for deciding on that mix, however, may be critical to any success the program shows. If those aspects are not well defined in the program procedures, it can be challenging for the evaluation to fully specify these key features in a way that adequately describes the program or permits replication and emulation elsewhere.

One of the more challenging situations for impact evaluation is a multisite program with substantial variation across sites in how the program is configured and implemented (Herrell and Straw, 2002). Consider, for example, a program that provides grants to communities to better coordinate the law enforcement, prosecutorial, and judicial response to domestic violence through more vigorous enforcement of existing laws. The activities developed at each site to accomplish this purpose may be quite different, as well as the mix of criminal justice participants, the roles designated for them in the program, and the specific laws selected for emphasis.
Arguably, under such circumstances each site has implemented a different program and each would require its own impact evaluation. A national evaluation that attempts to encompass the whole program has the challenge of sampling sites in a representative manner but, even then, is largely restricted to examining the average effects across these rather different program implementations. With sufficient specification of the program variants and separate effects at each site, more differentiated findings about impact could be developed, but at what may be greatly increased cost.

Outcome Data

Impact evaluation requires data describing key outcomes, whether drawn from existing sources or collected as part of the evaluation. The most important outcome data are those that relate to the most policy-relevant outcomes, e.g., crime reduction. Even when we observe relevant outcomes, there may be important trade-offs between the sensitivity and scope of the measure. For example, when evaluating minimum drinking age laws, Cook and Tauchen (1984) considered whether to use "fatal nighttime single-vehicle accidents" (which has a high percentage of alcohol-related cases, making it sensitive to an alcohol-oriented intervention) or an overall measure of highway fatalities (which should capture the full effect of the law, but is less sensitive to small changes). In some instances, the only practical measures may be for intermediate outcomes presumed to lead to the ultimate outcome (e.g., improved conflict-resolution skills for a violence prevention program or drug consumption during the last month rather than lifetime consumption).

There are several basic features that should be considered when assessing the adequacy and availability of outcome data for an impact evaluation. In particular, the quality of the evaluation will depend, in part, on the representativeness, accuracy, and accessibility of the relevant data (NRC, 2004).

Representativeness

A fundamental requirement for outcome data is that they represent the population addressed by the program. The standard scheme for accomplishing this when conducting an impact evaluation is to select the research participants with a random sample from the target population, but other well-defined sampling schemes can also be used in some instances. For example, case-control or response-based sampling designs can be useful for studying rare events. To investigate factors associated with homicide, a case-control design might select as cases those persons who have been murdered, and then select as controls a number of subjects from the same population with similar characteristics who were not murdered.
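A case-control selection like the homicide example above can be sketched in a few lines. The population records, field names, and matching rule here are hypothetical; a real design would match controls to cases on carefully chosen covariates:

```python
import random

def case_control_sample(population, is_case, stratum, controls_per_case=2, seed=0):
    """Select every case, plus controls drawn at random from non-cases
    in the same stratum (e.g., the same neighborhood or age band)."""
    rng = random.Random(seed)
    cases = [p for p in population if is_case(p)]
    noncases = [p for p in population if not is_case(p)]
    controls = []
    for case in cases:
        # candidates: not-yet-selected non-cases sharing this case's stratum
        pool = [p for p in noncases if stratum(p) == stratum(case) and p not in controls]
        controls.extend(rng.sample(pool, min(controls_per_case, len(pool))))
    return cases, controls

# Hypothetical records: two rare "cases" in a population of ten
people = [{"id": i, "area": i % 2, "case": i < 2} for i in range(10)]
cases, controls = case_control_sample(people, lambda p: p["case"], lambda p: p["area"])
```

Because cases are rare, sampling this way concentrates the data where the contrast is, rather than spreading a simple random sample thinly over a population in which the event almost never occurs.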
If random sampling or another representative selection is not feasible given the circumstances of the program to be evaluated, the outcome data, by definition, will not characterize the outcomes for the actual target population served by the program. Similar considerations apply when the outcome data are collected from existing records or data archives. Many of the data sets used to study criminal justice policy are not probability samples from the particular populations at which the policy may be aimed (see NRC, 2001). The National Crime Victimization Survey (NCVS), for example, records information on nonfatal incidents of crime victims but does not survey offenders. Household-based surveys such as the NCVS and the General Social Survey (GSS) are limited to the population of persons with stable residences, thereby omitting transients and other persons at high risk for crime and violence. The GSS is representative of the United States and the nine census regions, but it is too sparse geographically to support conclusions at the finer levels of geographical aggregation where the target populations for many criminal justice programs will be found.

Accuracy

The accuracy of the outcome data available is also an important consideration for an impact evaluation. The validity of outcome data is compromised when the measures do not adequately represent the behaviors or events the program is intended to affect, as when perpetrators understate the frequency of their criminal behavior in self-report surveys. The reliability of the data suffers when unsystematic errors are reflected in the outcome measures, as when arrest records are incomplete. The bias and noise associated with outcome data with poor validity or reliability can easily be great enough to distort or mask program effects. Thus credible impact evaluation cannot be conducted with outcome data lacking sufficient accuracy in either of these ways.

Accessibility

If the necessary outcome data are not accessible to the researcher, it will obviously not be possible to conduct an impact evaluation. Data on individuals' criminal offense records that are kept in various local or regional archives, for instance, are usually not accessible to researchers without a court order or analogous legal authorization. If the relevant authorities are unwilling to provide that authorization, those records become unavailable as a source of outcome data. The programs being evaluated may themselves have outcome data that they are not willing to provide to the evaluator, perhaps for ethical reasons (e.g., victimization reported to counselors) or because they view it as proprietary. In addition, researchers may find that increasingly stringent Institutional Review Board (IRB) standards preclude them from using certain sources of data that may be available (Brainard, 2001; Oakes, 2002). Relevant data collected and archived in existing databases may also be unavailable even when collected with public funding (e.g., Monitoring the Future; NRC, 2001).
Still another form of inaccessible data is encountered when nonresponse rates are likely to be high for an outcome measure, e.g., when a significant portion of the sampled individuals decline to respond at all or fail to answer one or more questions. Nonresponse is an endemic problem in self-report surveys and is especially high with disadvantaged, threatened, deviant, or mobile populations of the sort that are often involved in criminal justice programs. An example from the report on illicit drug policy (NRC, 2001:95-96) illustrates the problem:

Suppose that 100 individuals are asked whether they used illegal drugs during the past year. Suppose that 25 do not respond, so the nonresponse rate is 25 percent. Suppose that 19 of the 75 respondents used illegal drugs during the past year and that the others did not. Then the reported prevalence of illegal drug use is 19/75 = 25.3 percent. However, true prevalence among the 100 surveyed individuals depends on how many of the nonrespondents used illegal drugs. If none did, then true prevalence is 19/100 = 19 percent. If all did, then true prevalence is (19 + 25)/100 = 44 percent. If between 0 and 25 nonrespondents used illegal drugs, then true prevalence is between 19 and 44 percent. Thus, in this example, nonresponse causes true prevalence to be uncertain within a range of 25 percent.

Resources

The ability to conduct an adequate impact evaluation of a criminal justice program will clearly depend on the availability of resources. Relevant resources include direct funding as a major component, but also encompass a range of nonmonetary considerations. The time available for the evaluation, for instance, is an important resource. Impact evaluations not only require that specialized research designs be implemented but that outcomes for relatively large numbers of individuals (or other affected units) be tracked long enough to determine program effects. Similarly, the availability of expertise related to the demanding technical aspects of impact evaluation research, cooperation from the program to be evaluated, and access to relevant data that have already been collected are important resources for impact evaluation. The need for these various resources for an impact evaluation is a function of the program's structure and circumstances and the evaluation methods to be used.
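Returning to the nonresponse example quoted above (NRC, 2001), the bounding arithmetic generalizes to a short function. This is an illustrative sketch of what are sometimes called worst-case bounds, not a procedure prescribed by that report:

```python
def prevalence_bounds(n_surveyed, n_nonresponse, n_users_among_respondents):
    """Bounds on true prevalence when nonrespondents' behavior is unknown:
    the low bound assumes no nonrespondent used drugs, the high bound
    assumes all of them did."""
    n_respondents = n_surveyed - n_nonresponse
    reported = n_users_among_respondents / n_respondents
    low = n_users_among_respondents / n_surveyed
    high = (n_users_among_respondents + n_nonresponse) / n_surveyed
    return reported, low, high

# The numbers from the example: 100 surveyed, 25 nonrespondents, 19 users
reported, low, high = prevalence_bounds(100, 25, 19)
print(f"reported {reported:.1%}, true prevalence in [{low:.0%}, {high:.0%}]")
# → reported 25.3%, true prevalence in [19%, 44%]
```

The width of the interval equals the nonresponse rate, which is why high nonresponse is so damaging: the data alone cannot narrow the range without further assumptions about the nonrespondents.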
For example, evaluations of community-based programs, with the community as the unit of analysis, will require participation by a relatively large number of communities. This situation will make for a difficult and potentially expensive evaluation project. Evaluating a rehabilitation program for offenders in a correctional institution with outcome data drawn from administrative records, on the other hand, might require fewer resources.

SELECTING PROGRAMS APPROPRIATE FOR IMPACT EVALUATION

No agency or group of agencies that sponsor program evaluation will have the resources to support impact evaluation for every program of potential interest to some relevant party. If the objective is to optimize the practical and policy relevance of the resulting knowledge, programs should be selected for evaluation on the basis of (a) the significance of the program, e.g., the scope of practice and policy likely to be affected, and (b) the extent to which the circumstances of the program make it amenable to sound evaluation research. The procedures for making this selection should not necessarily be the same for both these criteria.

Judging the practical importance of a program validated by impact evaluation requires informed opinion from a range of perspectives. The same is true for identifying new program concepts that are ripe for evaluation study. Surveys or expert review procedures that obtain input from criminal justice practitioners, policy makers, advocacy groups, researchers, and the like might be used for this purpose.

Once a set of programs judged significant has been identified, assessing how amenable they are to sound impact evaluation research is a different matter. The expertise relevant to this judgment resides mainly with evaluation researchers who have extensive field experience conducting impact evaluations of criminal justice programs. This expertise might be marshaled through a separate expert review procedure, but there are inherent limits to that approach if the expert informants have insufficient information about the programs at issue. Trustworthy assessments of program evaluability depend upon rather detailed knowledge of the nature of the program and its services, the target population, the availability of relevant data, and a host of other such matters. More informed judgments about the likelihood of successful impact evaluation will result if this information is first collected in a relatively systematic manner from the programs under consideration. The procedure for accomplishing this is called evaluability assessment (introduced in Chapter 2).
The National Institute of Justice has recently begun conducting evaluability assessments as part of its process for selecting programs for impact evaluation. Their procedure involves two stages: an initial screening using administrative records and telephone inquiries, plus a site visit to programs that survive the initial screening. (There are actually two different assessment tools, one for local and another for national programs; this description focuses on the local assessment instrument.) The site visit involves observations of the project as well as interviews with key project staff, the project director, and (if appropriate) key partners and members of the target population. Box 3-1 lists some of the factors assessed at each of these stages.

The extent to which the results of such an assessment are informative when considering programs for impact evaluation is illustrated by NIJ's experience with this procedure. In the most recent round of evaluability assessments, a pool of approximately 200 earmarked programs was reduced to only eight that were ultimately judged to be good candidates for an impact evaluation that would have a reasonable probability of yielding useful information.

BOX 3-1 Factors Considered in Each Stage of NIJ Evaluability Assessments

Initial Project Screening

What do we already know about projects like these? What could an evaluation of this project add to what we know?
Which audiences would benefit from this evaluation? What could they do with the findings?
Is the grantee interested in being evaluated?
What is the background/history of this project? At what stage of implementation is it?
What are the project's outcome goals in the view of the project director?
Does the proposal/project director describe key project elements? Do they describe how the project's primary activities contribute to goals? Can you sketch the logic by which activities should affect goals?
Are there other local projects providing similar services that could be used for comparison?
Will samples that figure in outcome measurement be large enough to generate statistically significant findings for modest effect sizes?
Is the grantee planning an evaluation?
What data systems exist that would facilitate evaluation? What are the key data elements contained in these systems? Are there data to estimate unit costs of services or activities? Are there data about possible comparison samples? In general, how useful are the data systems to an impact evaluation?

Site Visit

Is the project being implemented as advertised? What is the intervention to be evaluated?
What outcomes could be assessed? By what measures?
Are there valid comparison groups? Is random assignment possible?
What threats to a sound evaluation are most likely to occur? Are there hidden strengths in the project?
What are the sizes and characteristics of the target populations? How is the target population identified (i.e., what are eligibility criteria)? Who/what gets excluded as a target? Have the characteristics of the target population changed over time?
How large would target and comparison samples be after one year of observation? What would the target population receive in a comparison sample?
What are the shortcomings/gaps in delivering the intervention? What do recipients of the intervention think the project does? How do they assess the services received?
What kinds of data elements are available from existing data sources? What specific input, process, and outcome measures would they support? How complete are data records? Can you get samples? What routine reports are produced? Can target populations be followed over time? Can services delivered be identified? Can systems help diagnose implementation problems?
Does staff tell consistent stories about the project? Are their backgrounds appropriate for the project's activities?
What do partners provide/receive? How integral to project success are the partners?
What changes is the director willing to make to support the evaluation?
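One screening question in Box 3-1 asks whether samples will be large enough to generate statistically significant findings for modest effect sizes. A rough answer can come from a standard normal-approximation power calculation for comparing two group means. The sketch below is illustrative; the choices of a 0.3 standardized effect, a 5 percent significance level, and 80 percent power are assumptions, not figures from the NIJ instrument:

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a standardized mean
    difference (Cohen's d) with a two-sided test at the given alpha and
    power, using the usual normal-approximation formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# A "modest" effect of d = 0.3 needs roughly 175 cases per group;
# halving the detectable effect roughly quadruples the required sample.
print(n_per_group(0.3), n_per_group(0.5))
```

A screening that finds a program serving only a few dozen participants a year can therefore rule out a conventional individual-level impact evaluation quickly, whatever the program's other merits.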