these two interventions to effectively compound their individual benefits. We therefore recommend that future research focus on multidimensional RCT designs to assess the extent to which these two interventions work separately, in concert with each other, and in combination with other approaches, such as hospital and primary care screening programs. Because IPV has so many etiologies and patterns, the most likely path to ending IPV once it has begun lies in efforts that bring multiple agencies together to identify, assess, and respond appropriately when needed. Such partnerships are best positioned to address the range of systemic issues facing people who experience violence and abuse within their intimate relationships.
Anthony Petrosino, Ph.D.8
Questions like “What works to prevent violence?” require a careful examination of the research evidence. The evidence is composed of the studies that have been conducted to test the effects of an intervention, a policy, or a practice on violence outcomes.
Integrating evidence is necessary because many programs and policies have been evaluated, across many countries and with different populations within the same nations, and using many different methods and measures. How can we even begin to make sense of these studies to respond to the question, “What works to prevent violence?” How can we do it in a way that is systematic and explicit, and convinces the skeptics (and there are at least a few of those) that the answers are reasonable and to be trusted, especially when decisions about what to do often take place in a highly politicized and contentious context?
There have been several developments to integrate evidence in violence prevention. Two of the more common approaches are referred to in this paper as systematic reviews and evidence-based registries. This paper provides a brisk overview of both.
8 The author thanks Trevor Fronius and Claire Morgan for their comments on earlier drafts of this paper.
A terrific scenario would be if every study were conducted in the same way and came to the same conclusions. Then it would not matter which study we pulled out of a file drawer or which bundle of studies we presented; any of them would represent the evidence quite well.
But, as it turns out, life is not so simple. Studies usually vary on all sorts of dimensions, including the quality of the methods and the confidence we have in the conclusions. Studies also vary in the results they report. Some report a positive impact for an intervention, others report little or no effect at all, and still others report harmful effects. This variation in results presents a problem, as zealots and advocates on both sides of a public policy question can selectively use the evidence ("cherry picking") to support their particular position. This was the point Furby and her colleagues (1989, p. 22) made when reviewing the impact of treatment of sex offenders on their subsequent reoffending:
The differences in recidivism across these studies are truly remarkable; clearly by selectively contemplating the various studies, one can conclude anything one wants.
Apart from the variation across studies and how it might be intentionally exploited by advocates and zealots for particular positions, there are other issues about evidence that need to be addressed. An important one is that there are potential biases in where studies are reported and how they are identified. What does this mean? Research in some fields has shown that researchers are more likely to submit papers to peer-reviewed academic journals, and editors are more likely to publish them, if they report statistically significant and positive effects for treatment. So any integration of evidence that relies only on peer-reviewed journals could be biased toward positive results for the treatment(s) being examined. How true this is in the violence prevention area has not been empirically tested, but it is now considered good practice for any integration of evidence to take into account studies published outside the academy.
Another issue is how “success” for a program is determined. Traditional scientific norms generally mean that we use “statistical significance” to determine whether a result for an intervention is trustworthy. If the observed effect is so large that it is very unlikely to be due to the “play of chance,” we say it is statistically significant. Traditionally, we are willing to call a result statistically significant if it would be expected by the “play of chance” 5 times or fewer in 100 (the .05 criterion). But statistical significance is heavily influenced by sample size; large samples can make rather trivial differences statistically significant, and very large effects may not be significant if the sample sizes are modest. Research has found that relying solely on statistical significance as a criterion for determining the success of a program can bias even the well-intentioned and nonpartisan reviewer toward concluding a program is ineffective when it may very well have positive and important effects.
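The sample-size point can be made concrete with a short sketch. Assume a hypothetical program that cuts reoffending from 50 to 45 percent (these numbers are illustrative, not drawn from any study discussed here); a standard two-proportion z-test shows the same effect crossing the .05 threshold only when the samples are large:

```python
import math

def two_prop_z(p1, p2, n):
    """Two-sided z-test for two independent proportions, n per group.
    Returns (z statistic, two-sided p-value)."""
    p_pool = (p1 + p2) / 2                      # pooled rate (equal groups)
    se = math.sqrt(2 * p_pool * (1 - p_pool) / n)
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF via math.erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Identical 5-point drop in reoffending (50% -> 45%), different sample sizes:
for n in (100, 2000):
    z, p = two_prop_z(0.50, 0.45, n)
    print(f"n per group = {n:4d}:  z = {z:.2f},  p = {p:.3f}")
```

With 100 cases per group the difference is far from significant; with 2,000 per group the same difference easily clears the .05 criterion, which is exactly why significance alone is a poor yardstick for program success.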
Issues about where results are reported and how success is determined are but a few of the issues that can challenge evidence integration efforts. What is the conscientious person to do? Fortunately, in the past half-century or so, there has been considerable attention to the way reviews of evidence are done. Under the label of meta-analysis, research synthesis, and more recently, systematic reviews, a “science of reviewing” has emerged that essentially holds reviews of evidence to the same standards for scientific rigor and explicitness that we demand of survey studies and experimental studies. In some sense, we have moved from experts doing traditional reviews and saying “trust me” to researchers doing systematic reviews and saying “test me.”
Systematic reviews can be done in several ways, but most follow a similar set of procedures. An example of a very timely systematic review in the violence prevention area may illustrate the point. Koper and Mayo-Wilson (2012) conducted a systematic review for the Campbell Collaboration on the effects of police strategies to reduce illegal possession and carrying of firearms. Following the mass shootings in the United States in recent years, and particularly the massacre of elementary schoolchildren in Connecticut in December 2012, there is much attention on whether these strategies work. The procedures Koper and Mayo-Wilson (2012) followed were as follows:
- Like any study, a systematic review needs a clear objective or research question that it can answer. In this review, the authors wanted to identify the impacts, if any, of police strategies to reduce illegal possession and carrying of firearms on gun crime.
- Once the question of interest is settled, the reviewers need to set out explicit criteria to determine which studies will be included in the review and which will be excluded. Koper and Mayo-Wilson (2012) included only those studies that used a randomized or quasi-experimental design. The studies had to include measures of gun crime (e.g., gun murders, shootings, gun robberies, gun assaults) before and after intervention.
- The review team needs to conduct and document a search for the eligible studies. The search must be comprehensive and designed to reduce the potential for bias described above by including those published in peer-reviewed journals and those reported in other sources (e.g., government reports, dissertations). The authors searched 11 abstracting databases for published and unpublished
literature; examined reviews and compilations of relevant research; and searched key websites. They found four studies that included seven outcome analyses.
- A structured instrument is designed and then used to carefully code or extract information from each study to form a dataset. David Wilson of George Mason University has a wonderful phrase for this: “interviewing the studies.” Koper and Mayo-Wilson (2012) interviewed the studies to collect information on the research design used, the participants included in the study, the exact nature of the treatment, and the outcomes used in the evaluation.
- If a quantitative review or meta-analysis is possible, the outcomes of interest are converted into a common metric known as “effect size.” In this particular review, no meta-analysis (quantitative synthesis) was attempted because there were few studies, and they varied so extensively that a statistical synthesis made little sense.
- Results are reported. If quantitative or statistical analyses are done, this reporting takes a number of forms. It usually includes a description of the included studies, an estimate of the overall impact of the treatment(s) under investigation (the average effect size across all studies), and how that overall impact varies based on characteristics of the treatment(s), the populations, the methods, and so on. But if no quantitative synthesis is done, the results are reported qualitatively. Koper and Mayo-Wilson (2012) produced the latter. Six of the seven tests indicated that directed patrols reduced gun crime in high-crime places at high-risk times, with reductions ranging from 10 to 71 percent. The authors concluded that although the evidence base is weak, the studies do suggest that directed patrols focused on illegal gun carrying prevent gun crime.
- A structured and detailed report is produced, explicitly detailing every step in the review. Koper and Mayo-Wilson (2012) conducted their study with the Campbell Collaboration, an international organization that prepares, updates, and disseminates high-quality reviews of evidence on topics such as violence prevention. Campbell Collaboration reports are structured to uniformly present necessary details on every step in the review process.
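When the quantitative synthesis step above is feasible, the usual approach is an inverse-variance weighted average of the study effect sizes: more precise studies count for more. A minimal sketch, using hypothetical effect sizes (these numbers are invented for illustration; Koper and Mayo-Wilson did not attempt a meta-analysis):

```python
import math

def pooled_average(effects, variances):
    """Fixed-effect (inverse-variance weighted) average effect size.
    effects: per-study effect sizes (e.g., standardized mean differences)
    variances: the corresponding sampling variances."""
    weights = [1 / v for v in variances]        # precise studies weigh more
    mean = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))            # standard error of the mean
    return mean, se

# Four hypothetical studies of one intervention:
effects = [0.30, 0.10, 0.45, 0.20]
variances = [0.02, 0.01, 0.05, 0.015]
mean, se = pooled_average(effects, variances)
print(f"average effect size = {mean:.2f} "
      f"(95% CI {mean - 1.96 * se:.2f} to {mean + 1.96 * se:.2f})")
```

The same machinery also supports the moderator analyses described above: computing the weighted average separately for subgroups of studies (by population, method, or treatment variant) shows how the overall impact varies.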
Many public agencies do not have staff that can spend the time necessary to do a systematic review, and they generally rely on external and trusted sources for evidence. The advent of electronic technology has meant that summaries of evidence from systematic reviews can be provided quickly so long as the intended user has Internet access and can download documents. Groups such as the Campbell Collaboration’s Crime and Justice
Group not only prepare and update reviews of evidence, but make them freely available to any intended user around the world. The rigor and transparency of such reviews have made them a trusted source of evidence, particularly in the politicized and contentious environment that surrounds government response to violence.
Campbell Collaboration and other systematic reviews tend to be broad summaries of “what works” for a particular problem (e.g., gun violence) and classes of interventions (police-led strategies for policing illegal guns). They are not usually focused on brand name programs or very specific, fine-grained definitions of an intervention. Because decision makers often need evidence on particular interventions, other approaches to providing evidence that is more fine grained have been developed.
During the past 10 to 15 years, a common approach across a variety of public policy fields can be classified under the heading of “evidence-based registries.” They are also referred to as “best practice registries” and “best practice lists.” In the violence prevention area, quite a few are relevant, including the University of Colorado’s Blueprints for Violence and Substance Abuse Prevention, DOJ’s Crime Solutions effort, the Coalition for Evidence-based Policy’s “Social Programs That Work,” and the U.S. Substance Abuse and Mental Health Services Administration’s National Registry of Evidence-based Programs and Practices. Table II-3 provides a list of some important registries across different fields.
These registries differ in terms of scope and focus, but they all have a similar framework: An external group of scientists examines the evidence for a very specific intervention or policy, such as Life Skills Training or Gang Resistance Education and Awareness Training (G.R.E.A.T.). The external group gathers the evidence on that specific program. Generally, though the standards differ for each registry, evidence is only included if it is based on randomized or quasi-experimental designs. Whatever evidence exists on the intervention is then screened to determine if it meets minimum evidentiary standards, and those studies passing the screen are used to assess its effectiveness. Most registries attempt to distinguish between (1) model or exemplary programs that have two or more studies demonstrating positive impacts and (2) promising interventions that have only one study indicating positive impacts. Many of the registries include a stunning amount of material on the intervention so that those interested in adopting it can do so. Each registry is made available electronically, at no charge, so that the busy professionals who need it can access it instantly.
TABLE II-3 Evidence-Based Registries Across Different Areas
| Evidence-Based Registry | Area | Evidence Standards |
| --- | --- | --- |
| What Works Clearinghouse | Education | Randomized experiments; quasi-experiments with evidence of equating |
| CrimeSolutions.gov | Criminal justice | Randomized experiments; quasi-experiments (those with evidence of equating are rated highest) |
| Coalition for Evidence-based Policy Top-Tier Evidence | Federal policy (Office of Management and Budget/Congress) | Randomized experiments |
| What Works in Reentry Clearinghouse | Offender reentry/reintegration programs/policies | Randomized experiments; quasi-experiments with evidence of equating |
| HHS Evidence-based Teen Pregnancy Prevention Models | Teen pregnancy prevention | Randomized experiments; “strong” quasi-experiments |
| SAMHSA National Registry of Evidence-based Programs and Practices (NREPP) | Prevention, broadly | Randomized experiments; quasi-experiments |
NOTE: HHS = Department of Health and Human Services; SAMHSA = Substance Abuse and Mental Health Services Administration.
SOURCE: Anthony Petrosino.
An example may also serve to illustrate the evidence-based registry. The Coalition for Evidence-Based Policy is a not-for-profit group based in Washington, DC, that advocates for the use of evidence in policy decision making, particularly at the U.S. federal level. It has been very influential with Congress, the Office of Management and Budget (OMB), and federal agencies such as the U.S. Department of Education’s Institute of Education Sciences. The Coalition’s registry identifies Top-Tier and Near-Tier Evidence; the difference between them is based on whether a high-quality replication of a program has been conducted. A good example is the “Nurse–Family Partnership” championed by David Olds of the University of Colorado, which has been identified as a Top-Tier program by the Coalition.
First, the Coalition solicits or seeks out candidates for Top-Tier or Near-Tier programs. For those candidates, the Coalition then undertakes a careful search to find the evidence on the effects of the program. The Coalition only considers evidence from randomized experiments to designate programs as Top-Tier or Near-Tier. This is a rather strict standard that most of the other registries have not adopted, but the Coalition
stresses that only randomized experiments—when implemented with good fidelity—produce statistically unbiased estimates of impact.
To be designated as Top-Tier, a program must have sizable and sustained effects, established through multiple experiments testing the program. The Coalition located three randomized experiments of the Nurse–Family Partnership with different populations, all of which reported positive effects on a variety of outcomes. Two studies reported a reduction in child abuse and neglect, the outcome most relevant to violence prevention. After the Coalition finishes summarizing the evidence, it asks the evaluators who produced the experiments to review the summary so that any inaccuracies can be corrected.
Each summary of Top-Tier interventions in the Coalition’s registry includes details on the program and how it differed from what the control group received; the populations and settings in which the intervention was evaluated; the quality of the randomized experiments; and the results on the main outcomes of interest. Because the Nurse–Family Partnership is Top-Tier, the Coalition argues that it should be implemented more widely, and it has been pushing Congress and OMB to facilitate wider adoption of such programs. Most registries contain very detailed information on the intervention and population because one goal is to facilitate adoption and implementation of these Top-Tier programs.
The move toward systematic reviews and evidence-based registries resonates with me as a former state government researcher in the justice area in two states (Massachusetts and New Jersey) over my professional career. Our units would, on occasion, receive an urgent request from the state’s Attorney General (AG), the Governor’s Office, a state legislator, or the head of the Office of Public Safety. These requests came in the days when the Internet was just beginning and offered skimpy sites compared to today. The request would go something like this: “We want to know what works and we want to know by five o’clock.” Generally, this meant there was money to be appropriated and they wanted to make sure those funds were allocated toward effective strategies. Or there might be some controversy over a program like G.R.E.A.T. and they wanted to know what the evidence on the program’s impact was. (In the interests of full disclosure, sometimes those requests were something like “here’s what we’re going to do, now get us the evidence to support it.”)
Little did I know, electronically accessible systematic reviews and evidence-based registries would spring up all over the Internet a few years after I left state government service. These allow the busy government researcher to respond quickly to urgent policy requests. If I were employed