Rebecca Maynard (member, steering committee) opened the session by noting that although there are several stakeholder groups that have a vested interest in developing a principles and practices document for evaluation, there are others who would not consider it such a good idea. She referred back to earlier discussion about developing a clear strategy for the evaluation, considering who may be threatened by the outcome, and the need to incorporate evaluations into policy. Maynard said she believes strongly that there should be a push to design evaluations with the expectation that the results may be positive, neutral, inconclusive, or negative and to plan for any of these outcomes. She also does not think every evaluation should be treated as a fixed, start-to-finish process: having fluidity and built-in decision points can be beneficial.
Jon Baron (Arnold Foundation) said he believes that any guidance document being developed should take a “less is more” approach, highlighting a few key principles and making a persuasive case for each of them. He does not think it would be beneficial to try to cover the whole landscape of evaluation in a single document, nor does he think that a long document would be read thoroughly and carefully. In addition, Baron said he believes that a technical document would be preferable to a consensus document, which might not contain the needed clarity or specificity. He gave the example of the Common Evidence Guidelines, a joint publication of the Institute of Education Sciences and the National Science Foundation (NSF) (2013), which states that “[g]enerally and when feasible, [studies] should use designs in which the treatment and comparison groups are randomly assigned.”
Baron said he thinks these guidelines could potentially serve as a starting point for an evaluation policy document. He also said that a central goal of evaluation efforts should be to grow the body of interventions that are backed by credible evidence of effectiveness. The specific goal would be to build the number of interventions shown in high-quality randomized experiments, replicated across different studies or sites, that produce sizable impacts on important life outcomes. He gave examples of health research and social policy studies that follow this paradigm, and he asserted that this approach helps nonscientific stakeholders recognize and accept the value of evaluation studies. Baron said he wants to see this type of visceral demonstration of the value of research in social policy and believes that it is the key to making evaluations politically sustainable. Howard Rolston (member, steering committee) added that, because the general public is often wary that facts and figures coming from government reports appear to support specific political agendas, it is important to consider evaluation and dissemination strategies that are free of bias and preserve the facts.
Baron noted that he has also been on the other side, in a way, when highly credible evaluations produced disappointing results. He quoted Manzi’s Uncontrolled (2012, Ch. 11), saying that “innovative ideas rarely work,” and mentioned two ideas on how to increase the yield of positive findings in larger evaluations. First, he suggested that, prior to funding a large randomized experiment, funders look for a very strong signal from prior research or evaluation literature that the intervention being evaluated could produce meaningful positive effects—promising evidence that it could be the exception, so to speak. The second tactic Baron suggested was to make a small investment up front to discover the mechanisms and look for a large effect on proximal outcomes before going forward with a major evaluation. This small step could take the form of an initial low-cost randomized controlled trial or quasi-experiment. He also echoed Maynard’s suggestion about incorporating an interim decision point for short-term follow-up in a study in which one could expect early indication of long-term outcomes.
Mark Shroder (Department of Housing and Urban Development) raised the concern that the funding for the studies and the information requests are still closely controlled by Congress and the Office of Management and Budget’s (OMB) Office of Information and Regulatory Affairs, respectively. Russ Whitehurst (chair, steering committee) said he believes that, because of that control, an evaluation document of the kind being discussed should not be produced by a direct stakeholder; instead, it should be written by a foundation or similar nonstakeholder, nonpolitical organization, be addressed directly to Congress, and take the form of proposed legislation. He added that he has seen growing interest in evidence-based policy from both sides of the aisle, and he believes that evaluation has the potential to gain similar bipartisan appeal.
Judith Gueron (member, steering committee) said that in her experience foundations are sometimes less concerned with exploring new learning, often taking the position that they “already know enough”; their focus is on proving the desired outcome instead of learning whether it was worthwhile. She said she believes that foundations can be useful partners to the federal government, as they can fund essential activities that the government is less likely to fund—communication and dissemination, for example—and asked Baron how to better engage them. Baron answered that some foundations believe they are helping a program simply by making a contribution but that highlighting the importance of rigorous evaluation could go a long way in terms of measuring actual progress. He also noted that, when engaging with foundations, it is essential to learn what matters to them. Rolston added that an “inside” effort by a key player, such as OMB, could bolster the acceptance of the principles.
Sherry Glied (New York University) asked about the issue of magnitude and power for some of the smaller evaluation agencies and what to do when the program or budget is not big enough to support a desired study. Would smaller experiments be accepted in these cases? Should quasi-experimental analysis be used routinely and be supplemented by randomized controlled trials once evidence accumulates? Maynard said she believes there is a benefit to accumulating small experiments, either through sequential replications or more formalized networked studies.
Demetra Nightingale (Urban Institute) cautioned the workshop participants that any document that might be created needs to go beyond simply covering impact evaluations, social programs, and experiments in more established programs: it also needs to be applicable to the variety of agencies trying to build evaluation offices. In response to a query from Maynard about the smaller agencies that often may not have a voice in these conversations, Nightingale explained that they are represented in cross-agency evaluation groups that OMB convenes and are actively involved in discussions about funding, strategy, design, and other concepts around evaluation. Jeff Dowd (Department of Energy) echoed Maynard’s concern, cautioning the participants not to forget about smaller agencies with decentralized evaluation offices and to take the time to learn about their specific challenges.
Mark Schroeder (NSF) commented on the relationship among evidence, law, and legal writing and asked to what extent lawyers can contribute to making an effective synergy between different types of evidence. He mentioned that patent lawyers in particular could prove valuable because of their knowledge of science, in addition to law. Baron reminded Schroeder that the term “evidence” has a different meaning in law, but said that he is aware of rigorous evaluation having been introduced recently into legal contexts and can see how lawyers could use it to test different approaches in the criminal justice system.
Thomas Feucht (National Institute of Justice) identified three groups that might be opposed to a principles document: those from program agencies who may challenge the notion of rigor and subscribe more to the “I tried it and it works” philosophy; practitioners who may see an investment in evaluation as detracting from direct services; and smaller agencies whose programs or target populations may be underrepresented in a push toward randomized controlled trials. Baron replied that a response to these arguments would be to focus on evaluating components of a program rather than the entire program—e.g., looking at preschool interventions as opposed to the entire Head Start program. Naomi Goldstein (Administration for Children and Families) added that some political or high-level appointees may be resistant to strengthening the independence and transparency of evaluation activities; conversely, however, she said that private-sector organizations that routinely do this type of evaluation could be supportive.