10 Advances in Prevention Methodology

Since the 1994 Institute of Medicine (IOM) report Reducing Risks for Mental Disorders: Frontiers for Preventive Intervention Research, substantial progress has been made in the development of methodologies for the measurement, design, and analysis of the effects of preventive interventions, as well as in the identification of antecedent risk and protective factors and their effects. These new methodological tools are necessary to assess whether an intervention works as intended, for whom, under what conditions, at what cost, and for how long. Although not unique to prevention, answers to these fundamental research questions are needed to help a policy maker determine whether to recommend an intervention and to help a community know whether it can reasonably expect that a newly implemented program is likely to lead to benefit.

Methodological advances are due in part to technical developments in biostatistical methods, causal inference, epidemiology, and other related quantitative disciplines. However, many of the new approaches have been developed by federally funded methodology centers (see Box 10-1) to respond to specific scientific and practical questions being raised in ongoing evaluations of prevention programs. In particular, evaluations of preventive interventions that have been conducted as randomized field trials (Brown and Liao, 1999; Brown, Wang, et al., 2008) have contributed not only to the development of alternative study designs and statistical models to examine intervention impact, but also to dramatic improvements in statistical computing. This has led to more insightful statistical modeling of intervention effects that takes into account the longitudinal and multilevel nature of prevention data.
PREVENTING MENTAL, EMOTIONAL, AND BEHAVIORAL DISORDERS

BOX 10-1
Centers for Research on Prevention Science and Methodology

The Prevention Science and Methodology Group (PSMG) is an interdisciplinary network that has been supported by the National Institute of Mental Health (NIMH) and the National Institute on Drug Abuse (NIDA) for the past 20 years. It brings together prevention scientists conducting cutting-edge randomized trials and expert methodologists who are committed to addressing the key design and analytic problems in prevention research. PSMG has attempted to anticipate needs for methodological development and to have new methods ready when the trials demand them (Albert and Brown, 1990; Brown, Costigan, and Kendziora, 2008). As the field of prevention science has matured over the past 15 years, PSMG has worked on such problems as generalized estimating equations as a way to account for uncertainty in longitudinal and multilevel inferences (Zeger, Liang, and Albert, 1988; Brown, 1993b); methods to assess intervention impact with growth models (Muthén, 1997, 2007; Muthén, Jo, and Brown, 2003; Muthén and Curran, 1997; Curran and Muthén, 1999; Muthén and Shedden, 1999; Carlin, Wolfe, et al., 2001; Muthén, Brown, et al., 2002; Wang, Brown, and Bandeen-Roche, 2005; Muthén and Asparouhov, 2006; Asparouhov and Muthén, 2007); variation in impact by baseline characteristics (Brown, 1993a, 1993b; Ialongo, Werthamer, et al., 1999; Brown, Costigan, and Kendziora, 2008); mediation analysis (MacKinnon, 2008); multilevel models for behavior observations (Dagne, Howe, et al., 2002; Dagne, Brown, and Howe, 2003, 2007; Howe, Dagne, and Brown, 2005; Snyder, Reid, et al., 2006); modeling of self-selection factors (Jo, 2002; Jo and Muthén, 2001; Jo, Asparouhov, et al., in press); and randomized trial designs specifically for prevention studies (Brown and Liao, 1999; Brown, Wyman, et al., 2006; Brown, Wang, et al., 2008). Besides its close collaboration with ongoing trials (Brown, Costigan, and Kendziora, 2008), PSMG has continued to maintain close ties to the developers of the Mplus statistical package, allowing for a seamless integration of new statistical models, broad application of these models in existing software, and application of these new methods in existing trials.

A similar interdisciplinary methodological group, the Methodology Center, is located at Pennsylvania State University and is funded by NIDA and the National Science Foundation. The Methodology Center works in collaboration with prevention and treatment researchers to advance and disseminate statistical methodology related to research on the prevention and treatment of problem behavior, particularly drug abuse. This group has developed longitudinal models that address the unique aspects of changes in drug use over time, including latent transition analyses (Collins, Hyatt, and Graham, 2000; Chung, Park, and Lanza, 2005; Chung, Walls, and Park, 2007; Lanza, Collins, et al., 2005) and two-part growth models (Olsen and Schafer, 2001); missing data routines for large, longitudinal data sets (Schafer, 1997; Schafer and Graham, 2002; Demirtas and Schafer, 2003; Graham, 2003; Graham, Cumsille, and Elek-Fisk, 2003); designs and inferences that take into account varying dosages or levels of exposure to an intervention or adaptive interventions (Bierman, Nix, et al., 2006; Collins, Murphy, and Bierman, 2004; Collins, Murphy, and Strecher, 2007; Murphy, 2005; Murphy, Collins, and Rush, 2007; Murphy, Lynch, et al., 2007); and cost-effectiveness (Foster, Porter, et al., 2007; Foster, Johnson-Shelton, and Taylor, 2007; Olchowski, Foster, and Webster-Stratton, 2007).

Prevention methodology, or the use of statistical methodology and statistical computing, is a core discipline in the field of prevention science (Eddy, Smith, et al., 2005) and is one of the new interdisciplinary fields embodied in the NIH Roadmap.* It aims to invent new techniques or apply existing ones to address the fundamental questions that prevention science seeks to answer and to develop ways to present these findings not only to the scientific community but also to policy makers, to advocates and community and institutional leaders, and to families, the ultimate potential beneficiaries of prevention programs and, often, their potential consumers.

*See http://nihroadmap.nih.gov/.

Methodologists make inferences about program effects by relying on three things: (1) measures of key constructs, such as risk and protective factors or processes, symptoms, disorders, or other outcomes, and program implementation, fidelity, or participation; (2) a study design that determines which participants are being examined, how and when they will be assessed, and what interventions they will receive; and (3) statistical analyses that model how those given an intervention differ on outcomes compared with those in a comparison condition. This chapter discusses statistical designs and analyses, as well as offering comments about measures and measurement systems. While there are important technical issues to consider for measurement, design, and analysis, the community and institutional partnerships that are necessary to create and carry out a mutually agreed-on agenda are critical to the development of quality prevention science (Kellam, 2000).

We discuss first the uses of randomized preventive trials, which have led to an extraordinary increase in knowledge about prevention programs (see Chapters 4 and 6). Because well-conducted randomized preventive trials produce high-quality conclusions about intervention effects, they have achieved a prominent place in the field of prevention research. Despite
their clear scientific value, randomized experiments of prevention programs are often viewed warily by communities and institutions, and their place in community prevention studies is often questioned. Since trials can be conducted only under the aegis of communities and their organizations, this chapter presents information about these trials so community leaders and policy makers can make informed decisions about whether such trials match their own community values and meet their needs, or if alternative designs are needed. The chapter also reviews the use of other designs, including natural experimental designs and nonexperimental designs, to examine a program's effects, whether a training model works, and whether a program can be implemented with sufficient strength or fidelity in different communities. Next comes an overview of statistical analysis methods that incorporate longitudinal and multilevel data from prevention studies to model how interventions affect young people's development in different contexts. We discuss the unique strengths of qualitative data in prevention research and ways that qualitative and quantitative data can be used alongside one another. Finally, the chapter identifies challenges that have not yet been met in addressing the fundamental research questions in the prevention field.

Evaluating a Preventive Intervention with a Randomized Preventive Trial

Randomized preventive trials are central in evaluating efficacy (impact under ideal conditions) or effectiveness (impact under conditions that are likely to occur in a real-world implementation) of specific intervention programs that are tested in particular contexts (Coie, Watt, et al., 1993; Kellam, Koretz, and Moscicki, 1999; Howe, Reiss, and Yuh, 2002; Kellam and Langevin, 2003).
The design for a randomized trial divides participants into equivalent groups that are exposed to different interventions, and analysis that appropriately compares outcomes for those exposed to different interventions leads to inferential statements about each intervention's effects. A well-conducted randomized trial is a high-precision instrument that leads to causal statements about a program's effect, so that one can be assured that any observed differences are due to the different interventions and not some other factor.

Randomization strengthens confidence in the conclusions about an intervention's impact by ensuring the equivalence of the intervention and the control groups. Because of random assignment, participants in the two intervention conditions are nearly equivalent prior to the study, both on measured characteristics, such as age, gender, and baseline risk, and on relevant characteristics that may not be measured, such as community readiness. With randomized assignment to these groups, it is possible to
test for the effect of an intervention even when a community is undergoing major, uncontrolled societal changes, such as a recession. Other designs, for example those that compare a cohort exposed to intervention with the cohort in a previous year, may be more likely to reach erroneous conclusions because of differences between the two groups (e.g., different economic circumstances) that may be undetected or difficult to account for in the analysis.

In prevention science, evaluation trials are usually conducted only after substantial preliminary data demonstrate that the intervention shows promise. Initially a theoretical model of the development of a disorder, or etiology, is used to specify risk and protective factors that can be selectively targeted in preventive interventions. For example, social learning theory posits that for many children, conduct disorder arises from the learned behavior of children exposed to repeated coercive interactions in the family. This etiological theory is then used to identify potential mediators (risk or protective factors), such as inconsistent and punitive parental responses to the child and association with deviant peers, in a causal model for outcomes of substance abuse disorders or delinquency.

A theory of change is then used to identify an existing intervention or to develop a new preventive intervention aimed at these target risk or protective factors. In a program aimed at preventing substance abuse and delinquency among children who are returning to parental care from a foster placement, a parent training intervention might be designed to reduce punitive statements, to enhance communication with the child, and to improve linkages with the child's own parents and teacher in preparation for the critical transition period of return to the family of origin. The timing of the intervention may be a consideration as well as the content.
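The claim earlier in this section, that random assignment balances both measured and unmeasured baseline characteristics, can be illustrated with a small simulation. This is a hypothetical sketch: the two covariates ("baseline_risk" and "unmeasured_readiness") and all numbers are invented for illustration and do not come from any study described in this chapter.

```python
import random

random.seed(42)

# Simulated participant pool. "baseline_risk" stands in for a measured
# characteristic; "unmeasured_readiness" for a relevant characteristic
# that no one measured.
pool = [{"baseline_risk": random.gauss(0, 1),
         "unmeasured_readiness": random.gauss(0, 1)}
        for _ in range(10_000)]

random.shuffle(pool)                              # random assignment
intervention, control = pool[:5_000], pool[5_000:]

def mean(group, key):
    return sum(p[key] for p in group) / len(group)

for key in ("baseline_risk", "unmeasured_readiness"):
    diff = mean(intervention, key) - mean(control, key)
    print(f"{key}: difference between groups = {diff:+.3f}")  # both close to zero
```

The group difference is near zero for the unmeasured covariate just as for the measured one, which is exactly what a comparison of a cohort with last year's cohort cannot guarantee.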
Key transition periods may occur when a stage of life begins, such as entry into elementary or middle school, or during times of stress, such as a parental divorce or separation.

Measures are developed to assess these risk (e.g., punitive and inconsistent parenting) and protective factors (e.g., communication and monitoring of the child over time), to assess the effect of the intervention on parental behavior, and to determine whether changes in these hypothesized mediators actually lead to reductions in deviant behavior among young people. In a pilot study with a few dozen families, data can be collected to check whether the trainers are delivering the program as designed to the original custodial parents, whether the parents are changing their interactions with their children appropriately, and whether the predicted immediate behavior changes are seen among the children. After successful completion of this initial work, a randomized trial with a larger number of families can then be used to test this preventive intervention on a defined population of foster children (e.g., those in group care) and at a set time preceding their
return to their families. Upon the trial's completion, intent-to-treat analyses are typically used to assess overall effects as well as to examine the conditions under which the intervention effect varies by child, family, or service provider characteristics. To understand how behavior is modified over the longer term by this intervention, the children are typically followed for a year or more beyond the end of the intervention services. Finally, mediation analyses are used to understand how the effects of an intervention actually take place. Both efficacy and effectiveness trials require appropriate analytical models to produce valid statements about intervention effects (Brown, Wang, et al., 2008).

Substantial investment in both time and money is required to conduct a randomized preventive trial. This process begins with framing the theoretical basis for a preventive intervention; then moves on to partnering with communities around an appropriate design, selection and recruitment of the sample, random assignment to intervention conditions, and collection of data while adhering to the protocols specified by the design; and finally analysis of data and reporting of the results. The payoff for this work is described in three sections below.

Evaluating the Effects of Preventive Interventions

Some randomized preventive trials examine questions of program efficacy, or impact under ideal conditions, and can also help determine whether the intervention affects hypothesized mediators and proximal targets in expected ways. These efficacy trials are conducted in settings in which the intervention fidelity is maintained at a high level, usually by having trained researchers deliver the intervention rather than by individuals from the community.
The intervention itself can be delivered in research laboratory settings outside the community (Wolchik, Sandler, et al., 2002) or in schools or other settings that serve as the units that are randomized to the intervention or control conditions (Conduct Problems Prevention Research Group, 1992, 1999a, 1999b; Reid, Eddy, et al., 1999; Prado, Schwartz, et al., 2009). Efficacy trials require randomization of youth either to the new intervention or to standard settings so that a comparison of outcomes can be made. Some communities have a concern that youth assigned to the control or standard setting do not receive the intervention and thereby do not receive its potential benefit. These concerns can at times be mitigated, as discussed below.

Other randomized trials address questions of effectiveness, or impact under settings that are likely to occur in a real-world implementation of a preventive intervention (Flay, 1986). An effectiveness trial tests a defined intervention that is delivered by intervention agents in the institutions and communities in a manner that would ultimately be used for large-scale
implementation. This typically requires a stronger community partnership and involvement in all aspects of the study design and conduct. Any community concerns about withholding a new intervention from youth who are randomly assigned to the control or standard condition need to be addressed directly, because of ethical and human subject concerns, as well as from the practical side of maintaining the study design in a field setting. Often, communities come to consider randomization as a fair way to assign a novel intervention program to its community, given insufficient resources to deliver to everyone at once. Communities may want to test one intervention that they have already adopted but not fully implemented; it may be acceptable to compare an enhanced version of this intervention to that already being used (Dolan, Kellam, et al., 1993). Also, for some studies, it may be possible to provide the new intervention later to those who were initially assigned to the control setting (Wyman, Brown, et al., 2008); such wait-list designs, however, allow for only short-term, not long-term, evaluations of impact.

Using Preventive Trials to Improve an Intervention

An equally important goal of randomized preventive trials is to search for ways to improve an intervention. A specific intervention that targets a single risk factor, such as early aggressive behavior, can be used in a randomized trial to test a causative link between this risk factor and later behavioral or emotional disorders (Kellam, Brown, et al., 2008). Specifically, if one found that the intervention did change the target risk factor, and this led to reduced disorders, it would provide support for the underlying etiological theory. For example, elaborated statistical analyses of intervention impact can show who benefits from or is harmed by an intervention, how long the effects last, and under what environmental circumstances these effects occur.
Interventions may deliver different levels of benefit or harm to different kinds of participants or in different environments (Brown, Wang, et al., 2008), and information about these differences can extend the causal theory as well as guide decisions on whether to adopt or expand a prevention program or to attempt to improve outcomes through program modification. For example, one first-grade intervention was found in a randomized trial to produce improvement in mathematics achievement, but all of this gain occurred among children who began school with better-than-average mathematics achievement; those who were below average gained nothing compared with children in the control group (Ialongo, Werthamer, et al., 1999). However, a behavioral component of this intervention was found to have a beneficial impact on precursors to adolescent drug use (Ialongo, Werthamer, et al., 1999). In follow-up research studies, the mathematics
curriculum has been discontinued but the behavioral program has been continued. For the school district, the benefits of this trial were more immediate.

In another example, a study of young adolescents at risk for delinquency tested three active preventive intervention conditions against a control: a parent intervention alone, a peer-based intervention, and a combined peer and parent intervention. The parent condition alone produced a beneficial outcome; the combined peer-parent intervention produced results similar to the control; and the peer-based intervention produced more delinquency than did the other conditions (Dishion, Spracklen, et al., 1996; Dishion, Burraston, and Poulin, 2001; Dishion, McCord, and Poulin, 1999). Detailed examination revealed that the at-risk adolescents were learning deviant behavior from the more deviant peers in their group before, during, and after the program. This adverse, or iatrogenic, effect when a peer group includes a high proportion of delinquent youth is thought to be a major factor in explaining why boot camps and other similar programs often show a negative impact (Welsh and Farrington, 2001). In this way, analysis of intervention failures can be highly informative in guiding new prevention programs.

Testing Whether a Program's Population Effect Can Be Improved by Increasing the Proportion Who Participate

In randomized trials with individual- or family-level assignment, often a large fraction of those randomly assigned to a particular intervention never participates in that intervention, even after consenting (Braver and Smith, 1996). This minimal exposure from not coming to intervention sessions means that they cannot benefit from the intervention. Would the intervention be more effective if one could increase participation?
Or would outreach to a more difficult-to-engage portion of the population be counterproductive, because they already have the skills or resources that the intervention develops, or because the intervention does not meet their needs? Given the generally low level of participation in many effective interventions, it has become increasingly important to identify ways to increase a program's reach into a community to those who could benefit (Glasgow, Vogt, and Boles, 1999).

Some designs help evaluate these self-selection effects. One option is to use "encouragement designs," under which individuals are randomly selected to receive different invitation strategies, reinforcers, or messages to encourage acceptance of an intervention. This approach can be seen in an evaluation of the impact of Head Start programs by the Administration for Children and Families (2005). Because these programs were already available in most counties in the United States, and the program is viewed as a valuable resource, especially for poor families, it was considered unethical
to use a design that withheld a child's access to this program. Instead, in selected Head Start sites around the country, 3-year-old children and their families were randomized to one of two conditions: enrolling in a Head Start center at age 3 (early Head Start) or enrolling in the same center at age 4 (later Head Start). Those entering at age 3 were also accepted for enrollment at age 4. About 75 percent of the families enrolled their children in Head Start at the assigned age. Among the remaining 25 percent, some 3-year-olds randomized to early Head Start enrolled at age 4, some randomized to later Head Start enrolled at age 3, and some did not enroll in Head Start at all.

This encouragement trial attempts to modify the time of enrollment in Head Start. If all enrollments matched the assigned condition, standard or intent-to-treat analyses would provide legitimate causal inferences about the effects of the timing of enrollment. Because one-quarter of the parents made enrollment decisions contrary to the assigned condition, the intent-to-treat analysis, which makes no allowance for deviations from the assigned condition, provides a biased estimate of the causal effect of the intervention.

Use of Preventive Interventions to Test and Elaborate Theories of Change and Development

Although using preventive interventions to test and elaborate theories of change and development is the least practical reason for conducting trials, it may be the most important for generating new knowledge. The empirical findings from prevention science experiments can also be used to refine and modify the etiological theories that were used to guide the development of the intervention.
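One standard way to address the noncompliance problem raised by the Head Start encouragement trial described above is a complier average causal effect (CACE), or instrumental-variable, analysis. The sketch below simulates a trial in which only 75 percent of families follow their assignment; the effect size and compliance structure are invented for illustration and are not the Head Start results. The intent-to-treat contrast is diluted by noncompliance, and dividing it by the difference in actual enrollment rates (the Wald estimator) recovers the effect among compliers.

```python
import random

random.seed(1)

TRUE_EFFECT = 2.0   # assumed benefit of enrollment for compliers (invented)
n = 20_000
rows = []
for _ in range(n):
    z = random.randint(0, 1)                  # randomized offer: 1 = early enrollment
    u = random.random()                       # latent enrollment type
    if u < 0.75:
        d = z                                 # complier: enrolls as assigned
    elif u < 0.875:
        d = 1                                 # enrolls early regardless of assignment
    else:
        d = 0                                 # enrolls late regardless of assignment
    y = random.gauss(0, 1) + TRUE_EFFECT * d  # outcome depends on actual enrollment
    rows.append((z, d, y))

def mean(vals):
    return sum(vals) / len(vals)

y1 = [y for z, d, y in rows if z == 1]
y0 = [y for z, d, y in rows if z == 0]
d1 = [d for z, d, y in rows if z == 1]
d0 = [d for z, d, y in rows if z == 0]

itt = mean(y1) - mean(y0)            # intent-to-treat: diluted by noncompliance
cace = itt / (mean(d1) - mean(d0))   # Wald / instrumental-variable estimate
print(f"ITT = {itt:.2f}, CACE = {cace:.2f}")
```

Here the ITT estimate lands near 75 percent of the true complier effect, while the CACE estimate recovers it, under the usual assumptions (random assignment, no defiers, and assignment affecting outcomes only through enrollment).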
Indeed, this bootstrap process, using an incomplete theory to guide the development of an intervention (Sandler, Gersten, et al., 1988) and then using the empirical results to advance the theory and fill in critical gaps, is a hallmark of the current prevention science model. It is also an atypical model in the experimental sciences.

A traditional epidemiological approach to treatment of an existing disorder, such as schizophrenia, generally uses a randomized trial to test a specific treatment at a certain dosage and length, with the analyses showing whether the treatment had a positive effect. Before conducting a treatment trial using this traditional approach, the hypothesized etiological model is often highly developed, and only when the pharmacokinetics and other factors are well understood is the treatment tested in a rigorous randomized trial.

With modern preventive trials, in contrast, the experimental trial is often used to inform etiological theory at the same time. An etiological model of drug use, for example, is based on malleable risk and protective factors that can then be targeted by an intervention (Kraemer, Kazdin, et
al., 1997; Botvin, Baker, et al., 1990; Botvin, 2004). The preventive intervention tests both the malleability of identified risk factors and the causal chain leading from these risk factors to distal outcomes (Snyder, Reid, et al., 2006). These causal chains can be tested with mediation modeling (MacKinnon, 2008), which decomposes the overall effects into those that follow hypothesized pathways and those whose pathways are not identified. A mediation model that explains most of an intervention's impact through the hypothesized pathways confirms the underlying theoretical model of change, whereas if the hypothesized pathways contribute little explanatory power, a new theory (or better mediating measures) needs to be developed to explain an intervention's effects.

More detailed models of etiology can be developed with analyses that examine the variations across subgroups and environments in the impact of an intervention on both mediators and distal outcomes (Kellam, Koretz, and Moscicki, 1999; Howe, Reiss, and Yuh, 2002; MacKinnon, 2008). For prevention of drug use, for example, a universal intervention that (1) builds social skills to resist the use of drugs, (2) gives feedback to young people about the true rate of peers' drug use, and (3) enhances coping skills could well have very different effects on young people who are current drug users and those who are nonusers. Understanding such differences can lead to an elaboration of knowledge of how peer messages and media images influence initiation and escalation behavior, as well as the roles played by personal and social skills (Botvin and Griffin, 2004). Griffin, Scheier, and colleagues (2001), for example, identified psychological well-being and lower positive expectancy toward drug use as key mediators between competence skills and later substance use.
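The decomposition that mediation modeling performs can be sketched with the classic product-of-coefficients approach (in the spirit of MacKinnon, 2008). In this hypothetical example, all data are simulated and the path coefficients A, B, and C are invented: the intervention T shifts a mediator M (path a), the mediator shifts the outcome Y (path b), and a direct path c' remains. The indirect effect is the product of the a and b estimates.

```python
import random
from statistics import mean

random.seed(7)

A, B, C = 1.5, 0.8, 0.5   # assumed path coefficients (invented for the sketch)
n = 20_000
T = [random.randint(0, 1) for _ in range(n)]             # randomized condition
M = [A * t + random.gauss(0, 1) for t in T]              # mediator (e.g., parenting)
Y = [B * m + C * t + random.gauss(0, 1)                  # distal outcome
     for t, m in zip(T, M)]

def cov(x, y):
    """Sample covariance (cov(x, x) is the sample variance)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Path a: effect of the intervention on the mediator (difference in means,
# since T is binary and randomized).
a_hat = (mean([m for m, t in zip(M, T) if t == 1])
         - mean([m for m, t in zip(M, T) if t == 0]))

# Paths b and c': ordinary least squares of Y on M and T jointly
# (closed form for two regressors plus an intercept).
det = cov(M, M) * cov(T, T) - cov(M, T) ** 2
b_hat = (cov(Y, M) * cov(T, T) - cov(Y, T) * cov(M, T)) / det
c_hat = (cov(Y, T) * cov(M, M) - cov(Y, M) * cov(M, T)) / det

indirect = a_hat * b_hat   # effect transmitted through the hypothesized pathway
print(f"indirect ~ {indirect:.2f}, direct ~ {c_hat:.2f}, "
      f"total ~ {indirect + c_hat:.2f}")
```

A large indirect effect relative to the direct effect, as in this simulation, is the pattern that supports the hypothesized theory of change; a large unexplained direct effect signals the need for a new theory or better mediating measures.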
Preventive trials can also examine the causal role of a particular risk factor when it is targeted by an intervention. For example, continuing aggressive or disruptive behavior early in life is a strong antecedent to a wide range of externalizing behaviors for both boys and girls (Ensminger, Kellam, and Rubin, 1983; Harachi, Fleming, et al., 2006). While these behaviors are much less frequent for girls than for boys, the long-term risk of any problem behavior is high for both sexes (Bierman, Bruschi, et al., 2004). Nevertheless, there are important differences in the specific risks and mediation pathways (Moffitt, Caspi, et al., 2001; Ensminger, Brown, and Kellam, 1984). The long-term link between individual-level aggression in first grade and adult antisocial personality disorder has been found to be both stronger and more malleable by the Good Behavior Game (see Box 6-8) for boys compared with girls (Kellam, Brown, et al., 2008), which points to differences in the causal role of this risk factor for boys and girls.
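Subgroup variation of the kind described here, and in the first-grade mathematics example earlier, can be examined by estimating the intervention effect separately within baseline-defined subgroups. The sketch below is hypothetical: it simulates a trial in which, by construction, only participants starting above the baseline mean benefit, and then recovers that pattern from the data. All numbers are invented.

```python
import random

random.seed(3)

n = 20_000
rows = []
for _ in range(n):
    x = random.gauss(0, 1)             # baseline achievement (simulated)
    t = random.randint(0, 1)           # randomized intervention indicator
    benefit = 1.0 if x > 0 else 0.0    # assumed: gains only above the baseline mean
    y = x + benefit * t + random.gauss(0, 1)   # outcome after the program
    rows.append((x, t, y))

def subgroup_effect(in_subgroup):
    """Difference in mean outcome between arms, within one baseline subgroup."""
    y1 = [y for x, t, y in rows if in_subgroup(x) and t == 1]
    y0 = [y for x, t, y in rows if in_subgroup(x) and t == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

above = subgroup_effect(lambda x: x > 0)
below = subgroup_effect(lambda x: x <= 0)
print(f"effect, above-average baseline: {above:+.2f}")   # near +1.0
print(f"effect, below-average baseline: {below:+.2f}")   # near  0.0
```

Because the subgroups are defined by a baseline (pre-randomization) characteristic, randomization still guarantees comparable arms within each subgroup, so these contrasts remain valid causal estimates.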
Using Randomized Trials to Address Other Questions

Randomization can be used in highly flexible ways in studies of preventive interventions (Brown, Wyman, et al., 2007; Brown, Wang, et al., 2008), often answering different questions from the traditional randomized trial that focuses on efficacy or effectiveness alone (West, Biesanz, and Pitts, 2000). For example:

• Head-to-Head Impact. How beneficial is a preventive intervention program compared with another type of intervention? Preventive interventions can be compared not only with one another, but also with a service-based or treatment approach. In elementary school systems in the United States, for example, many incoming first-grade children do not do well in the first couple of years of school; nevertheless, most of these failing children are not provided remedial educational services until the third grade. It is feasible to compare the impact of a universal classroom-based preventive intervention aimed at improving children's ability to master the social and educational demands at entry into first grade with a more traditional model that provides similar services at a later stage for children in serious need.

• Implementability. What effects come from alternative modes of training or delivery of a defined intervention? After demonstrating that an intervention is effective, one can examine different means of implementing that intervention, holding its content fixed. Webster-Stratton (1984, 2000) has used such trials to demonstrate that self-administered videotapes are an effective and cost-effective way of delivering the Incredible Years Program (see Box 6-2) outside the clinic.

• Adaptability. How does a planned variation in the content and delivery of a tested intervention affect its impact?
For example, the third-generation Home Visitor Trial, conducted by Olds, Robinson, and colleagues (2004) in Denver, compared the delivery of a home-based intervention by a paraprofessional with the standard intervention delivered by nurse home visitors.

• Extensibility. What impact is achieved when an intervention is delivered to persons or in settings different from those in the original trial? One question being addressed is whether Olds's work on nurse home visitors, which originally focused on high-risk new mothers, would work as well for all pregnancies. Encouragement designs (described above) are also extensibility trials, since they can be used to expand the population that would normally participate in these interventions.
• Sustainability. Does impact continue as the time since completion of training increases? A sustainability trial compares the outcomes achieved by those who completed training earlier with outcomes for those who have just completed training or have not yet been trained (controls). For example, teachers can be randomized to start training at one of three times. At the end of the second training period, a sustainability trial would compare the outcomes achieved by the teachers who were trained first with those of the newly trained teachers and the teachers in the third training group.

• Scalability. What impact is achieved when an intervention is expanded to more settings? Using the same rolling system of teacher training as an example, a scalability trial would assess whether such an intervention maintains its effect as it is expanded system-wide. As it expands, the number of teachers requiring training and supervision increases, and therefore a scalability trial tests the system-level responses to these demands.

Using Randomized Preventive Trials to Meet the Needs of the Community

Field experiments of prevention programs are guided by federal requirements to maintain protection of human subjects. But they also require additional active community support and oversight in the design and conduct of the trial. Through partnerships with researchers, communities and institutions can play a major role in all aspects of the trial, including framing the research around community goals, norms, and values; shaping the questions that are asked during the research; granting access to people, data, and intervention and evaluation sites; and holding researchers accountable for the study and reporting back to the community.
These community and institutional partnerships provide an added level of commitment and assurance of ethical conduct of research beyond those regulations required by universities and research institutions for human subjects' protection. Most often these partnerships are facilitated by setting up community and institutional advisory boards that provide direction to researchers and memoranda of understanding between all parties.

As mentioned above, communities often have major concerns about random assignment itself, which can be seen by parents, service providers, and administrators as manipulative, as providing fewer opportunities for children, or as interference by outside researchers in the ways that children interact with schools, communities, or health systems. Also, service providers are concerned that the assessments made by researchers could be used to evaluate their performance. By active engagement of broadly representative community leaders, institutional leaders, and researchers
around the issues of randomization, issues of trust and the social contract with researchers arise and need to be worked through to provide a base for conducting research in the community. For example, randomization can be seen as providing an equal chance for every child to receive a new intervention that cannot immediately be given to everyone. This process of "flipping a fair coin" can be seen as an equitable way of distributing limited resources. From this process can come a study design that is acceptable from the community's and institution's perspectives as well as that of the researchers.

Community-based participatory research, an intensive approach that involves the community in all phases of the research process, including specification of research questions and approaches (see Israel, Schulz, et al., 2003), is another potential approach to ensuring that trials meet the needs of the community. Similarly, partnerships that involve the systematic evaluation of interventions developed by community organizations in response to community priorities and values can increase their value to the community (see also Chapter 11).

Scientific Logic Behind the Use of Randomized Preventive Trials

Some in the scientific community believe that it is not possible to conduct field trials of prevention programs that produce sound causal inferences about these programs. However, good randomized preventive trials share many of the qualities that scientists have come to expect from controlled clinical trials, including random assignment to intervention and procedures to limit attrition and selective dropout or bias in measurement (Brown, 2003; Brown, Costigan, and Kendziora, 2008). Preventive trials, however, have some unique aspects.
First, it is virtually impossible to conduct a completely masked (or blind) psychosocial field trial the way double-blind clinical trials are conducted, in which neither the treating physician nor the patient knows whether an active drug or a placebo is used. A double-blind protocol provides built-in protection against outcomes being influenced by patient or physician preferences, expectations, or beliefs. In psychosocial preventive interventions, this type of blinding does not happen. The intervention agents, often teachers or parents, must receive training in the intervention and participate in its delivery. Furthermore, the participants are generally aware that they are receiving the intervention, if for no other reason than that they experience a different environment determined by the intervention.

The fact that randomized field trials cannot blind either the intervention agents or the study participants has important implications for the assessment of outcomes. It is important that these assessments be conducted by staff who do not know the participant's intervention condition. This is
much easier to manage when participants are assigned individually to intervention or control conditions. It is more challenging in settings in which the intervention is applied to a whole group, such as a school, a classroom, or a medical or social service setting. Steps to reduce the chance that assessment biases influence conclusions about the intervention's effect include revealing as little of the actual study design to the assessment staff as possible, conducting follow-up assessments in a random order of individuals or groups (e.g., schools), and incorporating direct observations of behaviors whenever possible (Brown and Liao, 1999; Brown, 2003; Snyder, Reid, et al., 2006; Brown, Wang, et al., 2008).

Second, preventive field trials often require long evaluation periods and repeated measures that extend over different stages of life. By contrast, typical clinical trials often have relatively brief follow-up periods. The long follow-up periods for randomized field trials increase the potential for missing observations ("missingness") and loss of study participants, which creates major challenges in both design and analysis. Often, multistage designs or designs with planned missingness can increase the efficiency of follow-up (Brown, Costigan, and Kendziora, 2008) as well as protect against potential sources of attrition bias (Brown, Indurkhya, and Kellam, 2000). Furthermore, effective fieldwork procedures now exist that help maintain low attrition (Murry and Brody, 2004). Advanced analytical techniques are also available for handling missing data, even in the face of high levels of missingness (Schafer, 1997).

Another aspect of psychosocial preventive interventions is that they are often delivered in existing group settings, such as the classroom, school, family, or community.
These group settings are "social fields" that are strongly linked to many of the predictive risk or protective factors that affect mental health and drug abuse. They also establish norms, determine the relevant set of task demands for the child, and provide formal or informal evaluations by natural raters that shape and mold children's response to the demands in that particular social field (Kellam, Branch, et al., 1975; Kellam and Rebok, 1992; Kellam, 1990).

Because many preventive interventions are carried out in these existing social fields, they are tested in preventive trials that often randomize whole groups rather than randomize at the level of individuals in the groups (Raudenbush, 1997; Murray, 1998; Brown and Liao, 1999). A major consequence is that the statistical power of such a design depends most heavily on the number of groups in the study rather than the total number of participants. Thus a trial involving 500 students in each of four schools with the schools randomly assigned to two interventions has statistical power similar to a traditional one-level design with four individuals assigned to two interventions. The large number of students in this design contributes
relatively little precision to inferences about impact because of the small number of schools in the design.

The requirement for sufficient statistical power in group-based designs has led some researchers to conduct trials in a large number of schools or other group settings. Life Skills Training, for example, was carefully tested in 56 middle schools with approximately 70 children per school (Botvin and Griffin, 2004). Although a modest number of children per school is often sufficient to evaluate the overall strength of a group-based intervention compared with a control setting, additional participants may be required for more complex analyses. An examination of theory-driven hypotheses about how the intervention may vary as a function of baseline risk requires substantially more participants than would be required for examining overall impact (Brown, Costigan, and Kendziora, 2008).

Ways to Reduce Trial Size in Group-Based Randomized Trials

In some circumstances, group-based trials are prohibitively expensive unless special designs and strategies are used to make them cost-effective. One approach is the statistical technique of blocking. Blocks refer to higher-level units, such as a school, in which both the intervention and the control conditions are included. For example, assigning classes in the same school to different interventions would be a classroom-based design with the school used as a blocking factor, whereas assigning all classes from the same school to the same intervention would be a school-based trial without blocking.

In deciding whether to randomize at the individual, classroom, or school level, one needs to take into consideration both the most efficient way to deliver the intervention and the possibility of contamination, that is, when controls are inadvertently exposed to the intervention.
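The dependence of statistical power on the number of groups rather than the number of individuals can be made concrete with the standard design-effect calculation. The following is a minimal sketch; the intraclass correlation (ICC) values are assumptions chosen for illustration, not estimates from any trial:

```python
def design_effect(m, icc):
    """Variance inflation when clusters of size m are randomized
    instead of individuals; icc is the intraclass correlation."""
    return 1 + (m - 1) * icc

def effective_n(n_total, m, icc):
    """Sample size of an individually randomized trial with equivalent precision."""
    return n_total / design_effect(m, icc)

# Hypothetical trial: 4 schools x 500 students each.
n_total = 4 * 500

# With a modest school-level ICC, 2,000 clustered students carry far less
# information than 2,000 independently randomized individuals.
print(round(effective_n(n_total, 500, 0.10), 1))  # ~39.3 effective participants

# In the limit where schoolmates respond identically (ICC = 1), the trial
# is equivalent to randomizing just 4 individuals, as noted above.
print(effective_n(n_total, 500, 1.0))  # 4.0
```

Adding one more school to such a design buys far more precision than adding hundreds of students to existing schools, which is why trials such as Life Skills Training recruit many group units.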
In general, randomizing units that are at the same level as the unit of intervention (e.g., randomizing classes within a school for a classroom-based intervention) will provide the highest level of statistical power, provided contamination is limited (Brown and Liao, 1999).

Other approaches can also be followed to increase statistical power in group-based randomized trials. Designs that force balance on group-level characteristics and then randomize, or that form matched pairs of these groups followed by random assignment of one in each pair to each condition, can sometimes lead to increases in statistical power. With small numbers of schools or other units, however, matching can sometimes reduce power by decreasing the degrees of freedom that are available for testing intervention effects.

Analytical methods can increase power as well. For example, a group-level covariate, such as the level of positive norms toward drug use in a school at baseline, can be used to adjust for differences by intervention
condition that remain after randomization in a school-based drug prevention trial. Indeed, the inclusion of a baseline variable measured at the level at which randomization occurs can often increase statistical power more than the inclusion of individual-level baseline variables.

Even when there are no natural settings (e.g., schools) to use in implementing a prevention program (e.g., for families experiencing divorce), the intervention may still be designed and delivered in a group setting in the community (Wolchik, Sandler, et al., 2002; Sandler, Ayers, et al., 2003).

BUILDING RIGOROUS CAUSAL INFERENCES FROM RANDOMIZED FIELD TRIALS

At the time of the 1994 IOM report, prevention scientists generally had a limited understanding of the underlying framework for drawing causal conclusions about their interventions from randomized and nonrandomized experiments. There is now a greater understanding and appreciation of the design requirements that must be met for a trial to provide an adequate basis for making clear statements about the causal effect of an intervention.

The most commonly used model for making causal inferences about the effects of an intervention is based on the Neyman-Rubin-Holland (NRH) approach of counterfactuals (Neyman, 1990; Rubin, 1974, 1978; Holland, 1986). Although these key publications were available before the 1994 IOM report was written, understanding of the significance of this work and its implications for study designs has matured since then. This theoretical approach considers that each individual in a two-arm trial could potentially have two outcomes, one when assigned to the first arm of the trial and a second when assigned to the second arm. Using this "potential outcome" model, the true intervention impact for that individual is then defined to be the difference in these two outcomes.
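A small simulation can make the potential-outcome logic concrete. Each simulated person carries both outcomes, the analyst ever sees only one of them, and yet the randomized difference in group means recovers the average causal effect. All numbers here are invented for illustration:

```python
import random

random.seed(1)

# Each hypothetical person has two potential outcomes: y0 if assigned to
# the control arm and y1 if assigned to the intervention arm.
TRUE_EFFECT = 0.5
people = []
for _ in range(10_000):
    y0 = random.gauss(0, 1)
    people.append({"y0": y0, "y1": y0 + TRUE_EFFECT})

# Randomization reveals exactly one potential outcome per person;
# the other remains the unobservable counterfactual.
for p in people:
    p["z"] = random.random() < 0.5
    p["observed"] = p["y1"] if p["z"] else p["y0"]

treated = [p["observed"] for p in people if p["z"]]
control = [p["observed"] for p in people if not p["z"]]
ate_hat = sum(treated) / len(treated) - sum(control) / len(control)

# No individual effect is ever observed, but the difference in arm means
# is an unbiased estimate of the average causal effect (0.5 here).
print(round(ate_hat, 2))
```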
However, it is impossible to observe both outcomes for a single participant; the trial makes only one outcome available to measure. The remaining unobserved outcome for each individual is a counterfactual: what this person would have been observed to do if he or she had received the other intervention. Because it is not possible to observe the outcome under both the assigned intervention and the counterfactual, it is not possible to assess this causal impact for a single individual.

With a randomized experiment, however, it is possible to compare the average response for those assigned to one intervention with the average response of those assigned to the other condition. The difference in average responses for those assigned to the two conditions (often adjusted for covariates) is generally interpreted as a causal effect of the intervention. The NRH approach provides conditions under which the difference in the average responses to the treatments is, in fact, an unbiased estimate of the average causal effect in the population. In nontechnical terms, the
assumption that the estimate is unbiased depends on the following conditions being met (Rubin, 1974):

• The sample selected for study is representative of the population.
• As a whole, the participants assigned to the two intervention conditions are equivalent to one another.
• The intervention received is the same as the one randomly assigned.
• Any differences in assessment are unrelated to the intervention condition.
• Attrition or loss to follow-up is unrelated to the intervention condition.
• Each individual's response under the assigned intervention is unaffected by the intervention conditions assigned to all others in the sample.

Adhering to a specified study protocol for maintaining equivalence will go a long way toward satisfying many of these criteria. For example, when the assignment to an intervention is in fact random or a stratified random process, the second condition of equivalent intervention groups is satisfied. Likewise, attrition bias and assessment bias can both be minimized if the procedures for recontacting and reassessing participants in the follow-up period are performed blind to intervention condition (Brown and Liao, 1999; Brown, Indurkhya, and Kellam, 2000) or corrections are made for missing data at baseline.

Possible Inferences in Response to Self-Selection

One innovative change in the way prevention trials are now analyzed is to account for self-selection factors that differentiate those who choose to participate in the prevention program from those who do not. Consideration of self-selection factors is critical in examining the effects of prevention programs aimed at individual young people or families. Some decline to participate at all, others may participate in the intervention initially but drop out before the study is completed, and others may continue to participate throughout the intervention period.
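The hazard of comparing outcomes by participation status, and the protection that the randomized comparison provides, can be illustrated with a simulation in which the higher-risk families are the ones who take up an offered program. All quantities are invented; higher outcome scores are worse:

```python
import random

random.seed(2)

results = []  # (offered, participated, outcome) for each simulated family
for _ in range(20_000):
    risk = random.random()            # baseline risk, uniform on 0..1
    offered = random.random() < 0.5   # randomized offer of the program
    # Self-selection: only higher-risk families take up the offer.
    participated = offered and risk > 0.5
    # Outcomes worsen with risk; the program helps those who take it up.
    outcome = 2 * risk - (0.5 if participated else 0.0) + random.gauss(0, 0.1)
    results.append((offered, participated, outcome))

mean = lambda xs: sum(xs) / len(xs)
participants    = mean([y for o, p, y in results if p])
nonparticipants = mean([y for o, p, y in results if o and not p])
offered_mean    = mean([y for o, p, y in results if o])
control_mean    = mean([y for o, p, y in results if not o])

# Naive comparison: participants look WORSE, even though the program helps.
print(round(participants - nonparticipants, 2))   # positive difference

# Intent-to-treat comparison preserved by randomization: the offer helps.
print(round(offered_mean - control_mean, 2))      # negative difference

# Bloom (1984): dividing the intent-to-treat effect by the participation
# rate recovers the effect among would-be participants (about -0.5 here).
take_up = mean([1.0 if p else 0.0 for o, p, y in results if o])
print(round((offered_mean - control_mean) / take_up, 2))
```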
It is tempting to compare the outcomes by level of participation and interpret any differences as being due to the effects of the intervention. For example, one might find that, on average, those exposed to the full intervention had poorer outcomes overall compared with those who did not participate. This might suggest that the intervention was harmful. However, these observed differences alone are not a sufficient basis for statements about program effect or causality, and indeed such an intervention could well be beneficial for those who participate, despite the
finding above. The problem with drawing conclusions based on level of participation is that the participants with greater involvement may have a higher baseline risk than those with more limited or no participation, and therefore those who self-select into the intervention could end up having worse outcomes than those who do not participate, regardless of intervention effect.

The design and analysis of studies can aid in distinguishing the effect of the intervention from the effects of self-selection. Individual participation can be measured only in those randomized to the intervention group, because those in the control group are not offered the opportunity to participate. Nevertheless, a randomized trial design makes it possible to treat the control group as a mixture of would-be participants and would-be nonparticipants. Thus, with appropriate assumptions, it is possible to arrive at causal inferences about the intervention effect on those who would be participants in an intervention. This is an example of the general approach called "principal stratification" (Bloom, 1984; Angrist, Imbens, and Rubin, 1996; Frangakis and Rubin, 1999, 2002; Jo and Muthén, 2001; Jo, 2002; Jo, Asparouhov, et al., in press). Such analyses are extremely valuable in that they characterize not only the effects of an intervention on participants, but also who chooses to participate in an intervention.

Distinct Ethical Issues for Conducting Preventive Trials

In treatment studies, the existing standards for ethical conduct of research dictate that it is improper to withhold an effective, safe treatment from participants. Thus, because there are successful treatments for schizophrenia, it would be inappropriate and unethical to evaluate a new antipsychotic drug in a randomized trial that assigned some psychotic individuals to receive a placebo.
The ethical considerations are different, however, in testing an antipsychotic drug for its ability to prevent schizophrenia or psychotic episodes in individuals exhibiting prodromal or preclinical signs or symptoms of schizophrenia. Although a few small randomized trials suggest that low-dose risperidone along with family therapy may provide some preventive value for adolescents who are at high risk for developing schizophrenia (McGorry, Yung, et al., 2002), the potential for causing side effects or otherwise harming individuals with these powerful drugs must be considered. In the case of a disorder that is not yet manifest and an intervention that is known to have significant side effects, "doing no harm" has to be weighed in deciding whether it is ethical to conduct this kind of trial.

One potential way to deal with some of these ethical concerns, when there is a very real possibility of doing harm, is to use a mediational model
to predict who is likely to benefit most from this type of antipsychotic drug. This type of mediation design (Pillow, Sandler, et al., 1991) uses the trial's inclusion/exclusion criteria to limit the trial to those whose signs or symptoms most closely match those targeted by the intervention. Limiting participants in the trial to those with prodromal symptoms as well as brain abnormalities associated with schizophrenia identifiable by magnetic resonance imaging, for example, may tip the benefit-cost ratio sufficiently to justify a trial (with appropriate consent) of a potentially risky pharmacological intervention. The burgeoning availability of genetic and other biological information with tenuous links to specific disorders also elevates ethical considerations (see Chapter 5).

Sometimes a design that would clearly be unethical or impractical with individual-level random assignment can be appropriate if conducted with group-level random assignment. This approach was used for practical reasons in a large trial aimed at preventing the spread of HIV among Thai military conscripts through changes in sexual practices. Rather than randomly assign individuals in the same company to two different conditions, companies were matched within battalions and then randomly assigned to an active behavioral intervention or a passive diffusion model (Celentano, Bond, et al., 2000). Part of the rationale in such studies is that a community-wide preventive intervention cannot be implemented across a country at the same time; thus randomly assigning some of the communities to the intervention deviates from what would normally happen only by using a fair method to decide which communities receive the intervention first.
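A matched-pair group randomization of this kind can be sketched in a few lines. The unit names and baseline scores below are invented; the point is that similar units are paired within a higher-level unit before the coin flip:

```python
import random

random.seed(3)

# Hypothetical companies nested in battalions, each with a baseline
# risk-behavior score used for matching (all values invented).
battalions = {
    "battalion-1": [("company-A", 0.40), ("company-B", 0.42),
                    ("company-C", 0.60), ("company-D", 0.58)],
    "battalion-2": [("company-E", 0.30), ("company-F", 0.33)],
}

assignments = {}
for battalion, companies in battalions.items():
    # Sort on the baseline score so adjacent companies form matched pairs.
    ranked = sorted(companies, key=lambda c: c[1])
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i][0], ranked[i + 1][0]]
        random.shuffle(pair)  # fair coin flip within each matched pair
        assignments[pair[0]] = "active behavioral intervention"
        assignments[pair[1]] = "passive diffusion control"

for company in sorted(assignments):
    print(company, "->", assignments[company])
```

Every matched pair contributes one unit to each condition, so pair members with similar baseline behavior serve as each other's comparison.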
Using Wait Lists to Randomly Assign When an Intervention Is Delivered

In many situations, a community or government agency decides that all its young people should receive a new preventive intervention, even though the intervention itself has not yet been well evaluated. Indeed, in suicide prevention, for which few programs have been evaluated rigorously, communities frequently decide to saturate the community with a program. Under certain circumstances it is still possible to evaluate the effectiveness of such an intervention using a randomized design. For example, a standard wait-list design can be used to randomly assign half of the participants or groups to receive the intervention immediately and half to receive it later. Communities are often accepting of a standard wait-list design because there are benefits to both conditions: a community that initially receives the intervention has an opportunity to benefit immediately; the community with a delayed start has the opportunity to benefit from any enhancements of the intervention made on the basis of the initial experience. A disadvantage of this design is that, because everyone receives the intervention within a short time frame, only short-term effects can be examined. However, if groups such as schools are randomized and the wait list is delayed until the following cohort in the delayed schools, evaluation of longer term effects is still possible, because the first cohort contains participants who never receive the intervention.

A type of randomized design that has only recently been used in prevention studies is the dynamic wait-list design (Brown, Wyman, et al., 2006). In contrast to the standard wait-list design, in which an intervention is delivered either immediately or after a specified delay, the dynamic wait-list design randomly assigns participants to one of three or more times to start the intervention. For time-to-event outcomes, the dynamic wait-list design has more statistical power because it increases the number of time periods, with most of the statistical gain occurring in moving from two to four or six time periods (Brown, Wyman, et al., 2006). This design was used in the school-based Georgia Gatekeeper Training Trial (Wyman, Brown, et al., 2008), in which 32 schools were randomly assigned to one of five start times for the training program, and the primary outcome was the rate at which suicidal youth were identified by the school.

Ethical Issues for Prevention When Variation in Intervention Impact Is Found

Researchers are beginning to identify different degrees of benefit or harm from an intervention across different subgroups on the basis of baseline characteristics and contexts. If one subgroup shows consistent benefits and another is made worse off by the same intervention, then both the use and the nonuse of this program will cause some harm to a segment of the population.
Another situation that may arise is a finding of benefit on some outcomes but compensatory harm on others. There is reason to believe that genetic variations, whose prevalences are due to evolutionary pressures, provide either advantages or disadvantages in adaptive response to specific environments (see Chapter 5). As one begins to look at how a complex preventive intervention affects individuals with specific genetic characteristics, it would not be surprising to find allelomorphic variation in outcomes, or that positive as well as negative outcomes can occur for those with a single allele. Any of these occurrences raises questions about the use of an intervention and should suggest continued work to adapt the intervention to specific individual and environmental situations.
EMERGING OPPORTUNITIES FOR PREVENTION TRIALS

Preventive Trials for Disorders with Low Prevalence

The prevention field still has relatively little information about effective interventions for conditions that occur infrequently. In designing prevention trials for low-base-rate disorders and outcomes, such as schizophrenia (Faraone, Brown, et al., 2002; Brown and Faraone, 2004) and suicide (Brown, Wyman, et al., 2007), the sample sizes necessary to obtain sufficient statistical power often seem prohibitively large. For example, a universal preventive trial aimed at a 50 percent reduction in youth suicide in the general population would require more than 1,000,000 person-years of observation. Although a study this large is often considered impractical, some novel alternatives exist. One approach is to combine data across a cluster of similar trials by using a common outcome, such as death from suicide or unintentional causes, for a long-term follow-up assessment. Data on mortality outcomes can be collected relatively cheaply using the National Death Index. An approach that aggregates data across studies will have to take into account variation in impact across studies with random effects, just as in meta-analysis (Brown, Wang, and Sandler, 2008).

An important strategy that other health fields use to test interventions on low-base-rate outcomes is to assess the impact of the intervention on a more common surrogate endpoint that has been identified as an antecedent risk factor for the outcome of interest. The rate of HIV seroconversion, for example, is sufficiently low in the general U.S. population that most HIV prevention trials use a reduction in HIV risk behavior as their primary outcome. Likewise, suicide attempts can serve as a surrogate for suicide itself, because there are roughly 100 times more suicide attempters than suicide completers, and attempt is a strong predictor of future suicide.
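A back-of-envelope power calculation for comparing two Poisson rates shows why the person-time requirement is so large and how much a surrogate endpoint buys. The base rates below are our assumptions for illustration (on the order of 7 youth suicides per 100,000 person-years, with attempts taken to be about 100 times as common and the same proportional reduction), not figures from the report:

```python
# Approximate person-years per arm needed to detect a drop from rate r1 to r2
# (events per person-year), using the normal approximation for comparing two
# Poisson counts with a two-sided 5% test at 80% power.
Z_ALPHA, Z_BETA = 1.96, 0.84

def person_years_per_arm(r1, r2):
    return (Z_ALPHA + Z_BETA) ** 2 * (r1 + r2) / (r1 - r2) ** 2

# Assumed base rate: ~7 youth suicides per 100,000 person-years, halved.
suicide_total = 2 * person_years_per_arm(7e-5, 3.5e-5)
print(round(suicide_total))   # on the order of 1.3 million person-years

# Surrogate endpoint: attempts assumed ~100x as common as deaths, with the
# same 50 percent proportional reduction.
attempt_total = 2 * person_years_per_arm(7e-3, 3.5e-3)
print(round(attempt_total))   # roughly 1/100th of the person-time
```

Under these assumptions, the required person-time scales inversely with the event rate, which is the arithmetic behind both the "more than 1,000,000 person-years" figure and the appeal of surrogate outcomes.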
The use of suicide attempts as an outcome would allow for sufficient statistical power with a much smaller study population.

Evaluating the Components of Interventions and Adaptive Interventions

Trials to examine the functioning of distinct components of an intervention may be needed, as when a comprehensive prevention program, such as Life Skills Training (see Box 6-1), has multiple components or modules that have been incorporated over the years. Although an intervention is normally tested in its entirety, the contribution of separate components can be examined through study designs that deliver selected components (Collins, Murphy, and Bierman, 2004) or by examining the strength of different mediational pathways (West, Aiken, and Todd, 1993; West and Aiken, 1997).
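Designs that deliver selected components are often laid out as factorial experiments. A sketch of a 2x2x2 layout with three invented component names (not the actual modules of any program):

```python
from itertools import product

# Three hypothetical program components, each either included (1) or not (0).
components = ["classroom skills module", "parent sessions", "booster contact"]

# A full 2x2x2 factorial crosses every on/off combination, so one trial can
# estimate each component's main effect as well as their interactions.
conditions = list(product([0, 1], repeat=len(components)))
for number, cell in enumerate(conditions, start=1):
    included = [name for name, on in zip(components, cell) if on]
    print(f"condition {number}: {', '.join(included) or 'control (no components)'}")

print(len(conditions), "experimental conditions")
```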
Testing components is also necessary in preventive interventions that are designed to be flexible, so that the program can be tailored to the specific needs of the participants. Fast Track, for example (see Box 6-9), was a randomized trial aimed at preventing the consequences of aggressive behavior from first grade through high school (Conduct Problems Prevention Research Group, 1992, 1999a, 1999b). Over the course of the 10-year study, each participant in the intervention condition received the specific program components that were deemed most appropriate based on his or her risk and protective factors at a given point in life. By the end of the study, the set of interventions and their dosages or durations differed substantially from person to person. Analytical techniques are available to disentangle some of the effects of dosage from different levels of need, but the use of designs with multiple levels of randomization may provide clearer insight into the effects of the intervention components (Murphy, van der Laan, et al., 2001; Murphy, 2003; Collins, Murphy, and Strecher, 2007).

Testing Prevention Components

There is also interest in testing whether small, relatively simple elements of a prevention program can be successfully applied in different contexts. For example, implementation of the Good Behavior Game in first and second grade, which gave teachers an extensive period of training and supervision and included the creation of a support structure in the school and the district, was found to have long-term benefits for high-risk boys (Kellam, Brown, et al., 2008; Petras, Kellam, et al., 2008). In an effort to provide this intervention at reduced cost, others have attempted to implement the Good Behavior Game using much less training and system-level support (Embry, 2004).
Because the training received as part of one intervention becomes part of a teacher's toolkit, it would be useful to evaluate the subsequent effects of the differences in teachers' training and support in conjunction with the Good Behavior Game on levels of aggressive behavior in their students. Program components can be tested by themselves by randomizing which teachers, or other such intervention agents, are to receive no training, low training, or high training.

Using the Internet for Randomized Preventive Trials

The Internet presents new opportunities to deliver preventive interventions to a diverse and expanding audience and to test the interventions in large randomized trials. With the delivery of a prevention program through the web, the opportunity exists to test new or refined components using
random assignment and to revise the program in response to these results using methods described by Collins, Murphy, and Bierman (2004) and West, Aiken, and Todd (1993).

Internet-based programs are also likely to present methodological challenges. First, a randomized trial would typically depend on data from self-reports obtained through the Internet, and uncertainty as to the validity of these data, as well as the proportion of participants willing to respond to long-term evaluations, could limit the evaluation plan. It may be necessary to use a multistage follow-up design (Brown, Indurkhya, and Kellam, 2000; Brown, Wang, et al., 2008), which would include a phone or face-to-face interview for a stratified sample of study participants.

Sequencing of Preventive Trials and Selective Long-Term Follow-Up

In most health research, trials are staged in a progression from basic to clinical investigations to broad application in target populations, allowing for an ordered and predictable expansion of knowledge in specific areas (e.g., Greenwald and Cullen, 1985). In the prevention field, rigorous evaluations of the efficacy of a preventive intervention can be lengthy, as are studies of replication and implementation. However, opportunities exist for strategic shortcuts. One approach is to combine several trials sequentially. For example, in a school-based trial, consecutive cohorts can serve different purposes. The first cohort of randomly assigned students and their teachers would comprise an effectiveness trial. In the second year, the same teachers, who continue with the same intervention condition as in the first year, along with a second cohort of new students, can be used to test sustainability. Finally, a third student cohort can be used to test scalability to a broader system, with the teachers who originally served as the intervention's controls now also trained to deliver the intervention.
A related issue involving the staging of trials is determining when there is sufficient scientific evidence for moving from a pilot trial of the intervention to a fully funded trial. In the current funding climate, researchers often design a small pilot trial to demonstrate that an intervention looks sufficiently strong to proceed with a larger trial. Reviewers of these applications for larger trials want to have confidence that the intervention is sufficiently strong before recommending expanded funding. However, as pointed out by Kraemer, Mintz, and colleagues (2006), the effect size estimate from the pilot trial is generally too variable to provide a good decision-making tool to distinguish weak from strong interventions. There is a need for alternative sequential design strategies that lead to funding of the promising interventions.

Another methodological challenge involving the review process is
286 PREVENTING MENTAL, EMOTIONAL, AND BEHAVIORAL DISORDERS

deciding when an intervention's early results are sufficiently promising to support additional funding for a long-term follow-up study. A limited number of preventive interventions have now received funding for long-term follow-up, and many of these have demonstrated effects that appear stronger over time (Olds, Henderson, et al., 1998; Wolchik, Sandler, et al., 2002; Hawkins, Kosterman, et al., 2005; Kellam, Brown, et al., 2008; Petras, Kellam, et al., 2008; Wilcox, Kellam, et al., 2008). It is difficult for reviewers to assess whether an intervention's relatively modest early effects are likely to improve over time or diminish, and therefore some of the most promising prevention programs may miss an opportunity for long-term funding.

NONRANDOMIZED EVALUATIONS OF INTERVENTION IMPACT

Conducting high-quality randomized trials is challenging, but the effort and expense are necessary to answer many important questions. However, many critical questions cannot be answered by randomized trials (Greenwald and Cullen, 1985; Institute of Medicine, 1994). For example, Skinner, Matthews, and Burton (2005) examined how existing welfare programs affected the lives of families. Their ethnographic data demonstrated that many families cannot obtain needed services because of enormous logistical constraints in reaching the service locations. In other situations, there may be no opportunity to conduct a true randomized trial to assess the effects of a defined intervention, because the community is averse to the use of a randomization scheme, because ethical considerations preclude conducting such a trial, or because funds and time are too limited. Even so, many opportunities remain to conduct careful evaluations of prevention programs, and much can be gained from such data if they are carefully collected.
Indeed, much has been written about the limits of the knowledge that a standard randomized trial can provide, and natural experiments can sometimes provide complementary information (West and Sagarin, 2000).

When a full randomized trial cannot be used to evaluate an intervention, an alternative study should be designed so that the participants in the intervention conditions differ as little as possible on characteristics other than the intervention itself. For example, it will be difficult to distinguish the effect of an intervention from other factors if a community that has high readiness is compared with a neighboring community that is not at all ready to provide the intervention. It may be necessary to work with both communities to ensure that they receive similar attention before the intervention starts as well as similar efforts for follow-up.
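When communities rather than randomization determine who receives an intervention, a first step in such a design is to check how comparable the groups actually are at baseline. One common diagnostic is the standardized mean difference on baseline characteristics. The sketch below is purely illustrative; the community data, the variable names, and the 0.25 rule of thumb are assumptions for this example, not material from this chapter.

```python
import random
import statistics

random.seed(2)

# Hypothetical baseline measures for two communities in a nonrandomized comparison
community_a = {"readiness": [random.gauss(0.7, 0.1) for _ in range(80)],
               "poverty":   [random.gauss(0.3, 0.1) for _ in range(80)]}
community_b = {"readiness": [random.gauss(0.4, 0.1) for _ in range(80)],
               "poverty":   [random.gauss(0.3, 0.1) for _ in range(80)]}

def standardized_difference(a, b):
    """Standardized mean difference, a common baseline-balance diagnostic."""
    pooled_sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

for var in community_a:
    d = standardized_difference(community_a[var], community_b[var])
    # |d| > 0.25 is one conventional flag for worrisome imbalance
    print(f"{var:10s} standardized difference = {d:+.2f}")
```

A large difference on a variable such as community readiness signals that any outcome difference may reflect that characteristic rather than the intervention itself.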
Pre-Post Designs

A pre-post design is another alternative to randomization. Such studies evaluate an intervention on the basis of the changes that occur from a baseline (the "pre" measurement) to after the intervention period (the "post" measurement). This type of design can provide valuable information, particularly when it supports a hypothesized developmental model involving known mediators that lead to expected prevention targets. However, the pre-post design suffers from confounding with developmental changes that are occurring in young people. With drug use in adolescents, for example, the sharp increases in use with age, as well as seasonal effects, could completely mask the potential benefit of an intervention; on the other hand, lower drug use after the intervention than before would suggest that the intervention has prevention potential. Also, pre-post designs can lead to erroneous conclusions if they involve selecting participants at high risk and assessing whether their risk goes down; improvement might be expected simply because of regression to the mean.

Interrupted Time-Series Designs

An important way to improve on pre-post designs is to include multiple measurements of the variables of interest. A good example is the interrupted time series (or the multiple baseline design extended to several groups), in which multiple measurements of the target behavior are made both before and after the intervention. Varying the timing of the intervention across participating individuals or groups, especially if assignment to an intervention time is randomized, can further strengthen the evaluation design. Policy changes, such as wide-scale implementation of a new program, changes in the law, or changes in the enforcement of existing laws, often provide opportunities to evaluate an intervention in this type of natural experiment.
One example is the evaluation of policies that restrict tobacco sales to minors (Stead and Lancaster, 2005). Biglan, Ary, and colleagues (1996), for example, examined the effect of positive reinforcement given to tobacco stores and sales clerks for avoiding tobacco sales to minors; they repeatedly assessed the proportion of stores making underage sales both before and after the intervention, demonstrating that the behavior of clerks is modifiable.

Regression Discontinuity Designs

Another type of natural experiment that provides an opportunity for program evaluation occurs when strict eligibility criteria, such as age or level of risk along a continuum, are imposed for entrance into a program.
In such cases, the difference in regression intercepts (the expected outcome when other variables are equal) for the outcome measure between those who were eligible and those who were not provides an estimate of the intervention effect (Cook and Campbell, 1979). Gormley, Gayer, and Phillips (2005) used this design in concluding that a universal statewide prekindergarten program had a large impact on achievement.

ADVANCES IN STATISTICAL ANALYSIS OF PREVENTION TRIALS

At the time of the 1994 IOM report, virtually all published analyses were limited to examining an intervention's impact on an outcome variable measured at a single point in time at follow-up. Analyses of impact in randomized field trials and longitudinal analyses were conducted independent of one another. Now, however, it is customary to use growth modeling techniques to examine trajectories of change using more extensive longitudinal data, with corresponding gains in statistical power (Muthén and Curran, 1997) and interpretability (Muthén, Brown, et al., 2002). Growth models can be a valuable tool in understanding the impact of interventions.

Using Growth Models

Most theories of change in prevention research posit an evolving effect on the individual that varies over time as new developmental stages are reached. Although it should be possible to detect intervention effects at a critical transition period using an outcome measured at a single time point, it is also possible to examine the impact of interventions using longitudinal data to show differences in individuals' developmental trajectories or growth patterns (e.g., repeated measures of aggression or symptoms) by intervention condition. Often the patterns of growth can be summarized with a few parameters.
By fitting individual-level data to linear growth curves, for example, an intervention's effect can be summarized as the difference in mean rates of growth between intervention and control participants. Other approaches include latent growth modeling of different aspects of growth using quadratic and higher-order polynomials, piecewise growth trajectories, and nonlinear growth models (Muthén, 1991).

The effects of interventions may vary not only as a function of time, but also across individuals. For example, a universal intervention may have a stronger effect over time on those who start with higher levels of risk compared with those with lower levels of risk, as has now been found in a number of preventive interventions (Brown and Liao, 1999; Brown, Wang, et al., 2008). Growth models that include an interaction between intervention condition and baseline levels of risk (Muthén and Curran, 1997) can capture such variation in impact over time.

Growth mixture modeling is another analytic approach that allows individuals to follow one of several different patterns of change over time (Muthén and Shedden, 1999; Carlin, Wolfe, et al., 2001; Muthén, Brown, et al., 2002; Wang, Brown, and Bandeen-Roche, 2005). Its advantage over the interaction model described in the previous paragraph is its flexibility; for example, if the intervention benefits low- and high-risk youth but harms youth with moderate risk, growth mixture models should detect these differential effects. Intervention effects can be modeled for each pattern of growth in risk behaviors over time, such as stable low levels of drug use, escalating levels, stable high levels, and decreasing levels. The results of such analyses may show, for example, that although a universal intervention reduces drug use among those who begin using drugs early, it may have the unintended effect of increasing drug use in what began as a low-risk group. A result of this type should lead to a redesign of the intervention.

Latent transition analyses (Collins, Graham, et al., 1994) are also used to examine changes in drug use trajectories over time. These methods can directly model changes in patterns of drug use over time and changes resulting from exposure to an intervention. To distinguish drug initiation from escalation, or similar qualitative versus quantitative differences in delinquency (Nagin and Land, 1993), methods that allow censoring, truncation, and so-called two-part models (Olsen and Schafer, 2001) can now be used in growth mixture modeling and other complex analyses.
For behavioral observation data, which have a prominent place in prevention research (Snyder, Reid, et al., 2006), multilevel random effects can be used to incorporate large tables of contingent responses or associations in complex mediation analyses (Dagne, Brown, and Howe, 2007). Similarly, analysis of trajectories can involve not only continuous data but also binary data (Carlin, Wolfe, et al., 2001), count data (Nagin and Land, 1993), and time-to-event or survival data (Muthén and Masyn, 2005). In addition, many analytical tools are available to examine different types of variables in the same model, so that continuous measures can be used to assess the impact of an intervention on growth trajectories through one stage of life while impact on adult diagnoses is measured as a dichotomous variable (Muthén, Brown, et al., 2002).

All these methods provide opportunities to specify and test precise questions about variation in the impact of an intervention. However, erroneous conclusions are possible if the underlying processes are not carefully modeled (Carlin, Wolfe, et al., 2001; Wang, Brown, and Bandeen-Roche, 2005).
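The core idea behind growth mixture modeling, that observed growth rates arise from a small number of latent trajectory classes, can be illustrated with a bare-bones expectation-maximization fit of a two-component normal mixture. This toy version (simulated slopes, two classes, one dimension) is only a sketch of what dedicated software such as Mplus estimates; the class sizes and means are invented for illustration.

```python
import math
import random
import statistics

random.seed(5)

# Simulated growth rates (slopes) from two latent trajectory classes:
# a large "stable low" class and a smaller "escalating" class
slopes = ([random.gauss(0.0, 0.15) for _ in range(300)] +
          [random.gauss(0.9, 0.15) for _ in range(100)])

# Bare-bones EM for a two-component normal mixture
weight = [0.5, 0.5]
mu = [min(slopes), max(slopes)]
sd = [statistics.stdev(slopes)] * 2
for _ in range(50):
    # E-step: posterior probability that each individual belongs to each class
    resp = []
    for s in slopes:
        dens = [weight[k] * math.exp(-0.5 * ((s - mu[k]) / sd[k]) ** 2) / sd[k]
                for k in (0, 1)]
        total = dens[0] + dens[1]
        resp.append([dens[0] / total, dens[1] / total])
    # M-step: re-estimate class weights, means, and spreads
    for k in (0, 1):
        r_k = [r[k] for r in resp]
        n_k = sum(r_k)
        weight[k] = n_k / len(slopes)
        mu[k] = sum(r * s for r, s in zip(r_k, slopes)) / n_k
        sd[k] = max(1e-3, (sum(r * (s - mu[k]) ** 2
                               for r, s in zip(r_k, slopes)) / n_k) ** 0.5)

print(f"estimated class weights: {weight[0]:.2f} / {weight[1]:.2f}")
print(f"estimated class mean growth rates: {mu[0]:.2f} / {mu[1]:.2f}")
```

In a full growth mixture model, intervention effects can then be estimated separately within each recovered class, which is exactly how differential benefit and harm across latent subgroups are detected.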
Multilevel Modeling of Intervention Effects

Multilevel modeling of contextual effects, such as the school, has also been well integrated into the evaluation of preventive trials. At the time of the 1994 IOM report, it was rare for published analyses of group-based randomized trials to correct for nonindependence among the participants in a group. As a result, they could erroneously report impact when it was not statistically significant. In a trial with 20 schools, half of which are randomized to a prevention program, the correct statistical test of impact is based on the number of schools, not the number of children, which may be orders of magnitude larger (Murray, 1998). It is now expected that published papers on group-based randomized experiments will use multilevel analysis (Raudenbush, 1997) or generalized estimating equations and sandwich-type estimators (Zeger, Liang, and Albert, 1988; Brown, 1993b; Flay, Biglan, et al., 2005) to account for group randomization.

Modeling That Incorporates Growth and Context in the Same Analysis

At the time of the 1994 IOM report, it was customary to report only the overall impact of an intervention in a population. Since then, statistical modeling has advanced so that longitudinal and multilevel modeling can now be handled in the same analysis. It is common to see analyses that include both individual growth and multiple levels of nesting, such as children nested within classrooms and schools (Gibbons, Hedeker, et al., 1988; Brown, Costigan, and Kendziora, 2008). Analyses can examine how change occurs across multiple levels (Raudenbush and Bryk, 2002) and can examine impact across both individuals and contextual levels with different types of growth trajectories (Muthén, Brown, et al., 2002; Muthén and Asparouhov, 2006; Asparouhov and Muthén, 2007).
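The consequence of choosing the wrong unit of analysis can be demonstrated with a small simulation in the spirit of the 20-school example above. Under a null intervention effect, a child-level test rejects far too often, while a test on school means stays close to its nominal level. All numbers here, including the size of the school-level effect, are invented for illustration.

```python
import random
import statistics
from statistics import NormalDist

random.seed(11)
Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value, about 1.96

def one_null_trial(n_schools=10, n_children=50, school_sd=0.5):
    """Simulate two arms with NO true intervention effect.

    Children within a school share a school-level effect, so their
    outcomes are correlated (a nonzero intraclass correlation)."""
    def arm():
        schools = []
        for _ in range(n_schools):
            u = random.gauss(0, school_sd)  # shared school effect
            schools.append([u + random.gauss(0, 1) for _ in range(n_children)])
        return schools
    return arm(), arm()

def rejects(a, b):
    """Crude two-sample z-test at the 5% level."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return abs(diff / se) > Z_CRIT

REPS = 500
naive = school_level = 0
for _ in range(REPS):
    tx, ct = one_null_trial()
    # Wrong: treat the 500 children per arm as independent observations
    naive += rejects([y for s in tx for y in s], [y for s in ct for y in s])
    # Right: analyze the 10 school means per arm
    school_level += rejects([statistics.mean(s) for s in tx],
                            [statistics.mean(s) for s in ct])

print(f"false-positive rate, child-level test:  {naive / REPS:.2f}")
print(f"false-positive rate, school-level test: {school_level / REPS:.2f}")
```

Multilevel models and GEE sandwich estimators achieve the same correction while retaining child-level covariates, but the school-means analysis makes the logic of "schools, not children, as the unit of inference" concrete.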
Handling of Missing or Incomplete Data

A major advance has been the treatment of missing data in statistical analyses of longitudinal data. When the previous IOM report was written, most published analyses of intervention impact simply deleted any missing cases. Now most impact analyses make use of full maximum likelihood methods (Dempster, Laird, and Rubin, 1977) or multiple imputation (Rubin, 1987; Schafer, 1997; Schafer and Graham, 2002; Demirtas and Schafer, 2003; Graham, 2003; Graham, Cumsille, and Elek-Fisk, 2003). These techniques are especially important for evaluating impact across long periods of time, because data will be incomplete for many of the participants, and differentially so across contexts.
Intent-to-Treat and Postintervention Modeling

The traditional standard of intent-to-treat analysis used in clinical trials has been extended to multilevel and growth modeling for randomized field trials. This approach overcomes the challenges of handling drop-in and dropout and other types of missing data that regularly occur in prevention trials (Brown, Wang, et al., 2008). So-called intent-to-treat analyses, or analyses based on the assigned rather than the actual intervention or treatment, are generally used as the primary set of models to examine intervention effects overall and for moderating effects involving individual-level and group-level baseline characteristics.

These traditional methods of examining the effects of an intervention can be supplemented with postintervention analyses. The postintervention approach takes into account the intervention actually received by each participant (Wyman, Brown, et al., 2008), the dosage received (Murphy, 2005; Murphy, Collins, and Rush, 2007; Murphy, Lynch, et al., 2007), and the level of adherence (Little and Yau, 1996; Hirano, Imbens, et al., 2000; Barnard, Frangakis, et al., 2003; Jo, Asparouhov, et al., in press), as well as the intervention's effect on different mediators (MacKinnon and Dwyer, 1993; MacKinnon, Weber, and Pentz, 1989; Tein, Sandler, et al., 2004; MacKinnon, Lockwood, et al., 2007; MacKinnon, 2008).

METHODOLOGICAL CHALLENGES AHEAD FOR PREVENTION RESEARCH

As the field of prevention science matures, important new developments in methodological research will be needed to meet new challenges.
Some of these challenges include (1) integrating structural and functional imaging data on the brain; (2) understanding how genetics, particularly gene–environment interactions, can best inform prevention; (3) testing and evaluating implementation strategies for prevention programs; and (4) modeling and expressing the effects of prevention to inform public policy. Incorporating imaging and genetics data into analyses will require the ability to deal with huge numbers of voxels, polymorphisms, and expressed genes. The large literature on data reduction techniques and multiple comparisons may provide a basis for methods of studying mediational pathways, expressed genes, and gene–environment interactions that may influence prevention outcomes and should be considered in intervention designs. Also, as the body of evidence for effective programs continues to grow, demand will increase for evaluations of alternative strategies for implementing such programs. Finally, the ability to model the costs as well as the effectiveness of different preventive interventions for communities
will allow policy decisions to be made on the basis of the best scientific findings. These issues are discussed in more detail in Part III.

CONCLUSIONS AND RECOMMENDATIONS

Since the 1994 IOM report, new methodological tools have been developed that enable more nuanced analysis of outcomes, more sophisticated designs that enable randomized assignment, and more reliable outcome measures. These advances in modern statistical approaches have been particularly useful in the context of field trials of preventive interventions, which face randomization challenges not usually relevant to clinical trials.

Conclusion: Significant advances in statistical evaluation designs, measures, and analyses used in prevention research have contributed to improved understanding of the etiology of emotional and behavioral disorders and related problem behaviors since 1994.

Prevention methodology has enabled refined statistical and analytical techniques to be used in an iterative manner to refine interventions, for example, by identifying components or groups for which the intervention is most successful, and to further develop theories about causal mechanisms that contribute to the development of problems or to an intervention's results.

Conclusion: Improved methodologies have also led to improved interventions, etiological theories, and theories of change.

The highest level of confidence in the results of intervention trials is provided by multiple well-conducted randomized trials. At the same time, for some areas of prevention, the types of designs that are typically used have relatively limited ability to produce unambiguous causal inferences about intervention impact because of statistical confounding or inadequate controls, low statistical power, lack of appropriate outcome measures, or attrition.
In these situations, it is important to develop additional evaluation designs that provide more rigorous testing of these interventions. Furthermore, few interventions have been tested for long-term outcomes despite the availability of appropriate methodologies. Several interventions have demonstrated effects on reducing multiple disorders and other related outcomes, such as academic performance. The value of preventive interventions would be significantly strengthened if long-term results could be demonstrated on a more consistent basis.
Recommendation 10-1: Research funders should invest in studies that (1) aim to replicate findings from earlier trials, (2) evaluate long-term outcomes of preventive interventions across multiple outcomes (e.g., disorders, academic outcomes), and (3) test the extent to which each prevention program is effective in different racial, ethnic, gender, and developmental groups.

Being able to obtain replicable results is one of the hallmarks of science, since lack of replicability raises questions about generalizability. Direct replication corresponds to a test of the same intervention under very similar conditions. Systematic replication refers to testing of the intervention under conditions that are deliberately modified (e.g., intervention agent, trainer, length of program, target population) in order to examine whether the results change with these modifications (see Chapter 11 for discussion of adaptation to different populations). Given limited funding, lack of interest by review groups in direct replication, and the current state of knowledge about the effects of preventive interventions, we consider systematic replications more appropriate than direct replications.

Funding is often limited for evaluations that assess outcomes beyond the end of an intervention or a short time after the intervention. Yet demonstrating outcomes that endure increases confidence in an intervention and provides a more comprehensive test of the impact of the intervention on children's lives and its benefit to society. Assessment of long-term outcomes would ideally include consideration of the sustainability of outcomes across developmental periods (Coie, Watt, et al., 1993).
Given that most preventive interventions are designed to mitigate developmental processes that can lead to mental, emotional, and behavioral disorders and problems over time, assessment of whether proximal outcomes at one developmental period are sustained in distal outcomes at a later developmental period is needed. Several of the programs discussed in Chapters 6 and 7, including the Nurse-Family Partnership, Life Skills Training, the Good Behavior Game, Strengthening Families 10-14, and the Family Check-Up, have met this criterion. Although the Society for Prevention Research (Flay, Biglan, et al., 2005) has suggested six months as a minimum follow-up period,* the committee considers this to be a necessary but insufficient time frame for the majority of outcomes.

*For "outcomes that may decay over time," the Society for Prevention Research (Flay, Biglan, et al., 2005, p. 2) recommends that evaluations include "at least one long-term follow-up at an interval beyond the end of the intervention (e.g., at least 6 months after the intervention)." The Society for Prevention Research standards also acknowledge that the interval may need to differ for different types of interventions.

As statistical and methodological approaches have been developed in
response to ongoing evaluations over the past 15 years, advances in this area must continue to keep pace with and respond to new knowledge that affects prevention science. The significant rise in interventions with evidence of effectiveness, the importance of implementing interventions with fidelity, and the lack of empirical evidence on how to successfully implement interventions will call for the development of new methodologies to explore various implementation and dissemination strategies (see also Chapter 11). This might include exploration of such questions as implementability, adaptability, extensibility, sustainability, and scalability.

Conclusion: Methodologies to evaluate approaches to implementation and dissemination are less well developed than methodologies related to efficacy and effectiveness.

Other recent research advances, including the results of imaging and other developmental neuroscience studies and findings related to the role of gene–environment interactions (see Chapter 5), provide new challenges and opportunities for intervention research and will require thoughtful consideration of design strategies.

Recommendation 10-2: The National Institutes of Health should be charged with developing methodologies to address major gaps in current prevention science approaches, including the study of dissemination and implementation of successful interventions.

The methodologies developed should include designs to test alternative approaches to implementation and dissemination of evidence-based and community-generated prevention programs (see Chapter 11).
Priority areas should also include approaches that link neuroscience methods and clinical research with epidemiology and prevention to understand the etiology of mental health and of disorders, as well as approaches that link theories developed through neuroscience research with preventive interventions designed to test causal mechanisms (see Chapter 5).