methods have been extended to nonmedical applications, greater acceptability of other types of evidence has been granted, but reluctantly (see below). More recently, the Campbell Collaboration (see Sweet and Moynihan, 2007) attempted to take a related but necessarily distinctive approach to systematic reviews of more complex interventions addressing social problems beyond health, in the arenas of education, crime and justice, and social welfare. The focus was on improving the usefulness of systematic reviews for researchers, policy makers, the media, interest groups, and the broader community of decision makers. The Society for Prevention Research has extended efforts to establish standards for identifying effective prevention programs and policies by issuing standards for efficacy (level of certainty), effectiveness (generalizability), and dissemination (Flay et al., 2005).

The criteria of the USPSTF mentioned above were adapted by the Community Preventive Services Task Force, with greater concern for generalizability in recognition of the more varied public health circumstances of practice beyond clinical settings (Briss et al., 2000, 2004; Green and Kreuter, 2000). The Community Preventive Services Task Force, which is overseeing systematic reviews of interventions designed to promote population health, is giving increasing attention to generalizability in a standardized section on “applicability.” Numerous textbooks on research quality have tended to concern themselves primarily with designs for efficacy rather than effectiveness studies, although the growing field of evaluation has increasingly focused on issues of practice-based, real-time, ordinary settings (Glasgow et al., 2006b; Green and Lewis, 1986, 1987; Green et al., 1980). Finally, in the field of epidemiology, Rothman and Greenland (2005) offer a widely cited model that describes causality in terms of sufficient causes and their component causes. This model illuminates important principles such as multicausality, the dependence of the strength of component causes on the prevalence of other component causes, and the interactions among component causes.
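The dependence of a component cause's apparent strength on the prevalence of its complementary causes can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions, not a rendering of Rothman and Greenland's own notation: it assumes just two sufficient causes, one completed when component causes A and B occur together and one completed by a background cause U alone, and it assumes B and U are independent of each other and of A. The function names and all prevalence values are hypothetical.

```python
"""Toy sufficient-component cause model (all names and numbers are hypothetical)."""


def disease_risk(has_a: bool, prev_b: float, prev_u: float) -> float:
    """Risk of disease when disease occurs if (A and B) or U.

    Assumes B and U are independent of each other and of A.
    """
    # Probability that the first sufficient cause (A together with B) is completed.
    p_cause_1 = prev_b if has_a else 0.0
    # Disease is avoided only if neither sufficient cause is completed.
    return 1.0 - (1.0 - p_cause_1) * (1.0 - prev_u)


def risk_ratio(prev_b: float, prev_u: float) -> float:
    """Risk among those with A divided by risk among those without A.

    Requires prev_u > 0; otherwise the risk without A is zero.
    """
    return disease_risk(True, prev_b, prev_u) / disease_risk(False, prev_b, prev_u)


def risk_difference(prev_b: float, prev_u: float) -> float:
    """Risk among those with A minus risk among those without A."""
    return disease_risk(True, prev_b, prev_u) - disease_risk(False, prev_b, prev_u)


if __name__ == "__main__":
    background = 0.05  # hypothetical prevalence of the background cause U
    for prev_b in (0.1, 0.5, 0.9):
        rr = risk_ratio(prev_b, background)
        rd = risk_difference(prev_b, background)
        print(f"prevalence of B = {prev_b:.1f}: RR = {rr:.2f}, RD = {rd:.3f}")
```

Under these assumptions, with a background prevalence of 0.05, raising the prevalence of B from 0.1 to 0.5 raises the risk ratio associated with A from about 2.9 to about 10.5, even though nothing about A itself has changed. That is the sense in which the strength of one component cause depends on the prevalence of the others.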

The foregoing rules or frameworks for evaluating evidence have increasingly been taken up by the social service professions, building not just on biomedical traditions but also on agricultural and educational research in which experimentation predated much of the action research in the social and behavioral sciences. The social service and education fields have increasingly utilized RCTs, but have met growing objections to the limitations of RCTs and to the “simplistic distinction between strong and weak evidence [that] hinged on the use of randomized controlled trials …” (Chatterji, 2007, p. 239; see also Hawkins et al., 2007; Mercer et al., 2007; Sanson-Fisher et al., 2007), especially when RCTs are applied to complex community interventions.

Campbell and Stanley’s (1963) widely used set of “threats to internal validity (level of certainty)” for experimental and quasi-experimental designs was accompanied by their seldom referenced “threats to external validity (generalizability).” “The focus on internal validity (level of certainty) was justified on the grounds that without internal validity, external validity or generalizability would be irrelevant or misleading, if not impossible” (Green and Glasgow, 2006, p. 128). These and other issues
