Methodological Issues in Research on Educational Interventions
Research on educational interventions for young children with autism should inform consumers, policy makers, and scientists about practices that produce positive outcomes for children and families. Ultimately, such research should be able to demonstrate that there is a causal relationship between an educational intervention and immediate or long-term changes that occur in development, behavior, social relationships, and normative life circumstances. A primary goal of early intervention research is to determine the types of practices that are most effective for children with specific characteristics (Guralnick, 1997).
If young children with autistic spectrum disorders were homogeneous in intelligence, behavior, and family circumstances, and if researchers and educators could apply a uniform amount of treatment in nearly identical settings and life circumstances, then a standard, randomized-group, clinical-trial research design could be employed to provide unequivocal answers to questions about treatments and outcomes. However, the characteristics of young children with autistic spectrum disorders and their life circumstances are exceedingly heterogeneous. This heterogeneity creates substantial problems when scientists attempt to use standard research methodology to address questions about the effectiveness of educational treatments for young children with autistic spectrum disorders.
In this chapter we examine a range of issues related to research designs and methodologies. We begin by discussing the different research literatures that could inform early intervention research but which currently are relatively independent. We then consider a range of methodological issues pertaining to research involving children with autistic spectrum disorders, including information useful for describing samples; the benefits and practical problems of using randomized, clinical trial research design and the movement toward treatment comparison and aptitude-by-treatment interactions; the relative benefits and limitations of single-subject research methodology; assessing fidelity of treatment; potential use of current methodologies for modeling developmental growth of children and factors affecting growth; and group size.
There are several distinct, substantial, and independent bodies of research addressing issues concerning young children with autistic spectrum disorders. One basic body of literature describes and attempts to explain the neurological (Minshew et al., 1997), behavioral (Sigman and Ruskin, 1999), and developmental (Wetherby and Prutting, 1984) characteristics of children with autistic spectrum disorders. A second body of research has addressed issues related to diagnosis, particularly early diagnosis, of autism (Lord, 1997) and the related issue of prevalence (Fombonne, 1999). A third body of literature has examined the effects of comprehensive treatment programs on the immediate and long-term outcomes for young children with autistic spectrum disorders and their families (e.g., Harris et al., 1991; McEachin et al., 1993; Rogers and DiLalla, 1991; Strain and Hoyson, 2000). A fourth body of research has addressed individual instructional or intervention approaches that focus on specific aspects of a child’s behavior, such as social skills (McConnell, 1999), language and communication (Goldstein, 1999), or problem behavior (Horner et al., 2000). These four bodies of literature have different primary purposes (and research questions), conceptual and theoretical frames of reference, and research methodologies. However, these research literatures all have the potential of informing the design, content, and evaluation of intervention procedures.
Similarly, funding for autism intervention and educational research has also come from a number of federal institutes with separate, but overlapping missions. These include the Office of Special Education Programs (OSEP) in the U.S. Department of Education and the National Institute of Child Health and Human Development (NICHD), National Institute of Mental Health (NIMH), National Institute of Neurological Disorders and Stroke (NINDS) and National Institute on Deafness and Other Communication Disorders (NIDCD), in the U.S. Department of Health and Human Services. More recently, parent-initiated, nonprofit agencies such as Autism Society of America Foundation, Cure Autism Now (CAN), and the National Alliance for Autism Research (NAAR) have had an increasing role in supporting and instigating research.
Although several of these literatures appear to be internally well integrated, there is remarkably little integration across literatures. For example, the information from the literature describing characteristics of children with autistic spectrum disorders is often not linked to treatment programs. Likewise, the developmental literature, which is descriptive in nature, has only rarely been integrated into individual intervention practice research, which tends to be behaviorally oriented (see Lifter et al., 1993 for a notable exception). Similarly, research that emphasizes the relationships among behaviors in response to treatment has been much more rare than descriptive studies of development in multiple domains (Wolery and Garfinkle, 2000).
Integration of the collective body of knowledge represented in these four literatures is important and could inform practice. It would be productive for leaders from these four research traditions to communicate regularly around the common issue of educational interventions for young children with autistic spectrum disorders. This communication could foster the research integration that appears to be missing from the literature. Communication could be enhanced by a series of meetings that bring together researchers and agencies who sponsor research, focusing on the task of reporting implications for designing programs for young children with autistic spectrum disorders.
EARLY SCREENING AND DIAGNOSIS
One assumption in early intervention research is that treatment should begin as soon as possible. However, to accomplish this, children must be identified. Early diagnosis has important implications for treatment, since the interventions appropriate for very young children (e.g., 15 months of age) differ from those appropriate for 2- or 3-year-olds.
There is a difference between screening and diagnosis. Screening, as understood in the United States, may mean two things. One is a process carried out by a primary care provider to decide whether a referral for more services is warranted: for example, a pediatrician, told by parents that their 18-month-old child has poor eye contact and has stopped speaking within the last month, must decide whether and where to refer the child for further assessment. A second type of screening is a public health process by which health care providers routinely assess for risk for autistic spectrum disorders in children whose parents have not necessarily raised concerns.
Diagnosis is a much more comprehensive process carried out by a specialized team of professionals. For autistic spectrum disorders, diagnosis involves not only identifying the disorder and any other developmental and behavioral disorders associated with it, but also helping parents to understand the meaning of the diagnostic terms and what the
parents can do to help their children. (Issues relating to diagnosis are discussed in detail in Chapter 2.)
In the early 1990s, the Checklist for Autism in Toddlers (CHAT) was developed as a creative, theoretically based attempt at a public health screening instrument (Baron-Cohen et al., 1992). With follow-up, however, it appeared that the sensitivity of the CHAT in identifying autism in nonreferred children was far too low for it to serve as an appropriate screening tool (Baird et al., 2000). Nevertheless, the instrument has made a significant contribution as a first step in this area. The techniques described in the CHAT may also be helpful in providing a primary health care professional with some behaviors on which to focus during screening (e.g., eye contact, pretending). Pilot data from a modification of this instrument, the M-CHAT, are in press.
Other screening tools, such as the Pervasive Developmental Disorders Screening Test (PDDST; Siegel, 1998) and the Screening Tool for Autism in Two Year Olds (STAT; Stone, 1998), are used to determine whether further diagnostic assessments are merited after a concern has arisen. Each of these instruments has promise: an initial empirical evaluation of the STAT has just been published (Stone et al., 2000); an evaluation of the PDDST is not yet available. The Autism Screening Questionnaire (ASQ; Berument et al., 1999) was developed for screening research participants 4 years of age and older. It has not yet been tested with younger children or with families who have not already received a diagnosis of autistic spectrum disorder. Chapter 2 provides more information about screening, as do the interdisciplinary practice parameter guidelines described by Filipek and colleagues (2000). An adequate screening instrument is not currently available either for public health screening or for a brief assessment when a concern arises. Addressing this need is a high priority for researchers. It involves determining how specifically the features of autistic spectrum disorders can be defined in toddlers and contrasting the benefits of this approach with more general identification of risk status.
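The stakes of public health screening can be made concrete with the arithmetic of sensitivity, specificity, and positive predictive value. The numbers below are invented for illustration and are not actual statistics for the CHAT or any other instrument; the sketch shows why, at low population prevalence, low sensitivity means most affected children are missed, while even high specificity still produces many false positives.

```python
# Hypothetical illustration of screening arithmetic. All figures are
# invented for the example, not actual CHAT or M-CHAT statistics.

def screening_outcomes(prevalence, sensitivity, specificity, n):
    """Expected counts when screening n children for a condition."""
    affected = prevalence * n
    unaffected = n - affected
    true_pos = sensitivity * affected            # correctly identified
    false_neg = affected - true_pos              # affected but missed
    false_pos = (1 - specificity) * unaffected   # flagged in error
    # Positive predictive value: chance a positive screen is a true case.
    ppv = true_pos / (true_pos + false_pos)
    return true_pos, false_neg, false_pos, ppv

# Screen 10,000 toddlers, assuming (hypothetically) 0.6% prevalence,
# 38% sensitivity, and 98% specificity.
tp, fn, fp, ppv = screening_outcomes(0.006, 0.38, 0.98, 10_000)
print(f"identified: {tp:.0f}, missed: {fn:.0f}, "
      f"false alarms: {fp:.0f}, PPV: {ppv:.2f}")
```

With these hypothetical figures, screening 10,000 toddlers identifies about 23 of 60 affected children while flagging roughly 199 unaffected ones, so only about one in ten positive screens is a true case.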
Research in diagnosis is at a quite different stage. Well-standardized and documented diagnostic instruments have been available for years. These include the Childhood Autism Rating Scale (CARS; Schopler et al., 1988), the Autism Diagnostic Interview-Revised (ADI-R; Lord et al., 1993), and the Autism Diagnostic Observation Schedule (ADOS; Lord et al., 2000). Although there are many ways that these instruments could be improved, their ability to document autism in a reliable and standardized way has been demonstrated. There are also numerous other instruments, including the Autism Behavior Checklist (Krug et al., 1980) and the Gilliam Autism Rating Scale (Gilliam, 1995), about which there are more questions regarding the degree to which their scores reflect accurate diagnosis.
Difficulties also remain for the most well-standardized instruments. While the CARS has been repeatedly shown to produce autism categorizations much like diagnoses, the items on the scale no longer reflect current diagnostic criteria. The ADI-R and the ADOS produce operational categories that fit with current conceptualizations of autism, but they require training and are intended to be used by experienced clinicians. The ADI-R is also quite lengthy, taking about 2 hours to administer. Standardization samples for both instruments are small, though replications of their diagnostic categorizations have been good (Yirmiya et al., 1994; Tanguay, 1998). Neither provides adequate discrimination between autism and other autistic spectrum disorders, though the ADOS makes a first attempt to do so. Thus, these instruments are important in providing standards for research, but their contributions to educational practice will require training of specialists (both in and outside educational systems) and perhaps modification of the instruments.
DESCRIPTION OF PARTICIPANTS IN STUDIES
To interpret the results of early intervention research and to conduct some of the sophisticated analyses described below, it is important to understand the characteristics of the participants in the studies. As mentioned above, heterogeneity in child characteristics is nearly as much a defining feature of autistic spectrum disorders as are the DSM-IV criteria. Children with the same diagnosis of autistic spectrum disorders, gender, chronological age, and IQ score may well have a range of other different characteristics (e.g., problem behaviors, communication skills, play skills) and may respond differently to intervention treatments. In most research on comprehensive intervention programs using group designs, a limited amount of information is provided about the children participating in the study. Individual intervention practices research often uses a single-subject design; anecdotal descriptions of participants’ behaviors are sometimes provided in addition to demographic information, but such descriptions do not follow a standard format. These limitations are reflected in the small proportion of studies that meet the highest standards for research in internal or external validity, as shown in Figures 1–1 and 1–2 (in Chapter 1), and the greater but still variable proportion that meet the second level of criteria in these areas.
Vaguely described samples pose a problem for both group and single-subject designs. One problem is related to internal validity of the study (i.e., the degree to which a researcher can rule out alternative hypotheses that account for treatment outcomes [Campbell and Stanley, 1963]). Unless specific information about participants is provided, it is impossible to know to whom the results of the study apply. For group design research, there are additional problems. When random assignment to treatment
groups occurs, the assumption is that the groups will be equivalent. However, with a relatively small sample size, which is the case for most studies of intervention effectiveness, it is essential for the researcher to confirm that participants in different groups are equivalent on major variables that might affect outcome. If participants are vaguely described, then there is limited information about the equivalence of comparison groups.
The recruitment, selection, and attrition of participants are also important issues. Standards and expectations for reporting how potential research participants were identified and persuaded to participate, how they were selected from the pool of potential participants, and how many participants completed the study have been very different within different disciplines (e.g., experimental psychology and epidemiology) and different perspectives (e.g., developmental and behavioral). With increasing attempts to integrate perspectives (see Filipek et al., 2000) to produce practical guidelines or meta-analyses, this information becomes crucial. For example, it is much more difficult to interpret results of a meta-analysis of success rates when a potentially large number of participants proposed for the research may have not been selected because they were deemed likely to be poor responders to an intervention, and another significant proportion of participants may not have completed their course of treatment. If samples are to be combined, and if interpretations are going to span fields, then there will be a need for more information about these processes.
Researchers are often interested in the interactions between child or family characteristics and treatment, sometimes referred to as aptitude-by-treatment interactions. Such analyses allow researchers to determine if the intervention was more effective for participants with certain characteristics. For example, one type of comprehensive treatment program might produce more positive outcomes for children who communicate verbally than for children who are nonverbal. The analysis requires that a reliable measure of the child characteristic or “aptitude” variable be collected. Vague participant descriptions could preclude the possibility of such analyses.
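The logic of an aptitude-by-treatment interaction can be sketched numerically. The programs, gain scores, and subgroup sizes below are entirely hypothetical; the point is only that the interaction is a difference of differences across aptitude-by-treatment cells, which is why a reliable measure of the aptitude variable must be collected for every participant.

```python
from statistics import mean

# Hypothetical gain scores for two treatment programs, broken down by a
# child characteristic measured at intake (verbal vs. nonverbal).
records = [
    # (treatment, verbal_at_intake, gain_score)
    ("program_A", True, 14), ("program_A", True, 12),
    ("program_A", False, 3), ("program_A", False, 5),
    ("program_B", True, 8),  ("program_B", True, 7),
    ("program_B", False, 9), ("program_B", False, 8),
]

def cell_mean(treatment, verbal):
    """Mean gain in one aptitude-by-treatment cell."""
    return mean(g for t, v, g in records if t == treatment and v == verbal)

# The interaction is a difference of differences: program A helps verbal
# children far more than nonverbal children, while program B is similar
# for both, so the interaction estimate is large.
interaction = ((cell_mean("program_A", True) - cell_mean("program_A", False))
               - (cell_mean("program_B", True) - cell_mean("program_B", False)))
print(f"interaction estimate: {interaction:.1f}")
```

In a real analysis this contrast would be estimated as an interaction term in a regression or analysis of variance, with an appropriate test of its reliability; the sketch shows only why vague participant descriptions preclude forming the cells at all.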
General, nonstandard participant descriptions also affect the external validity of studies (i.e., the degree to which the findings of a study can be generalized to other individuals not in the study [Campbell and Stanley, 1963]). To interpret for whom an individual intervention procedure or comprehensive intervention program might be effective, one has to have a clear understanding of who participated in the study. Both single-subject and group studies build their evidence for external validity on study replications. To compare the findings of different studies, researchers must be able to determine that children with similar characteristics participated in the study.
In many studies of children with autistic spectrum disorders, descriptions of the families’ characteristics are either limited or absent. Family and community characteristics represent potential risk and opportunity variables (Gabarino and Ganzel, 2000); yet, there has been very limited research on the effects of such family and community variables on outcomes for children with autistic spectrum disorders (Wolery and Garfinkle, 2000). For example, it is possible that a young child with autism who lives in a single-parent family and low-income neighborhood will respond differently to treatment than a child with autism from a two-parent family living in a middle-class neighborhood. In order to investigate the effect of family and community characteristics on treatment outcomes, it is necessary to provide descriptive information about families of children who participate in intervention research.
In order to further knowledge of the effects of interventions, it is critical that researchers develop and use standard procedures for describing the characteristics of participants in their studies and of their families. In addition to the information that is routinely provided (e.g., standardized diagnosis, chronological age, gender, IQ), standard information should include measures of adaptive behavior, communication, social skills, school placement, and race. Also, information about the family should include number of parents living in the family, parents’ education levels, and socioeconomic status. Although some recent studies have begun providing such information, this has not been the norm for the field.
To examine the effectiveness of comprehensive early intervention programs and individual intervention practices for children with autistic spectrum disorders, standards must be established for determining the causal relationship between treatment procedures and identified outcomes. The various experimental methodologies employed reflect the different literatures noted earlier. Studies documenting the effects of comprehensive treatment programs have employed experimental group designs, while those documenting individual practices have primarily employed single-subject designs, often replicated across several subjects.
Randomized Clinical Trials
The most rigorous approach for experimental group research design is the randomized clinical trial. In this design, study participants are randomly assigned, if possible by someone not associated with the program or knowledgeable about the participants’ characteristics, to a treatment group that receives the educational intervention or to a comparison group that receives no educational intervention or a different form of
intervention (Kasari, 2000). Measurement of potential treatment effects (e.g., developmental assessments, family measures) occurs before the educational intervention begins and again at the end of the intervention; ideally, the assessors are blind to the group to which each participant has been assigned. Assuming that the groups are equivalent on the pretest measures, differences at the end of the intervention are attributed to the treatment. As noted above, the purpose of random assignment is to control for or reduce the likelihood that confounding variables (e.g., very determined parents requesting a particular treatment) would account for differences in outcomes for the treatment and contrast groups.
Reviews of the literature to date (Rogers, 1998) and individual papers prepared for this committee (Kasari, 2000; Wolery and Garfinkle, 2000) show that the randomized clinical trial model has only rarely been used to determine treatment outcomes (see Jocelyn et al. and Smith et al. for exceptions). Other studies have attempted to address the research question of treatment effectiveness by employing quasi-experimental designs (Cook and Campbell, 1979) in which nonrandomized control or contrast groups are used as a basis for gauging treatment effects (Fenske et al., 1985). Another approach has been to use single group designs in which the changes in children’s development while they are in the program are compared with children’s rates of development before they entered the program, or to the rate of development of typically developing children (Harris et al., 1991; Hoyson et al., 1984). These designs, while providing some information about treatment outcomes, may not control for important confounding variables, such as subject selection and nonspecific or placebo effects (see Campbell and Stanley’s [1963] classic paper on group experimental methodology).
For programs providing treatment to young children with autistic spectrum disorders and their families, random assignment is often a difficult procedure. By its very nature, it requires that some children and families be assigned to an alternative treatment condition. Unless two treatments of equal potential value can be compared, such assignment creates the ethical issue of not providing the most promising treatment to children who might benefit. An argument is sometimes made (as it often is in medical treatment studies) that until a treatment is supported by a randomized clinical trial, the evidence for effectiveness of the treatment does not exist. In addition, when children are randomly assigned to two different treatment conditions, a researcher still must closely assess the experiences of the child and family, because families may seek and obtain services for their children outside of the treatment study. Ideally, children and families could be assigned to equally attractive alternative treatments, so that the research question changes from one of single treatment effectiveness to treatment comparison. However, this approach would require the availability of two different and equally strong programs,
usually within the same geographic area, and the willingness of the programs and parents to participate. This situation does not often occur.
Another issue related to random assignment is the heterogeneity of the population of children with autistic spectrum disorders. Most treatment studies, because of the prevalence of autistic spectrum disorders and the expense and labor intensity of treatment, will have small sample sizes. Random assignment within a relatively small, heterogeneous sample does not ensure equivalent groups, so a researcher may match children on relevant characteristics (e.g., IQ score, age) and then select from the matched sets to randomly assign children to control and treatment groups. As noted above, such stratification of the sample of participants requires a thorough description of the participants as well as confidence that the variable(s) on which children are matched are of greatest significance.
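The matching-then-randomizing procedure described above can be sketched in a few lines. This is a hedged illustration, not a prescription: the choice of IQ and age as matching variables, the function names, and the data layout are all assumptions for the example.

```python
import random

def matched_assignment(children, rng):
    """Pair children on matching variables, then randomize within pairs.

    children: list of (child_id, iq, age_months) tuples.
    Returns (treatment_group, control_group).
    """
    # Order on the matching variables so adjacent children are similar.
    ordered = sorted(children, key=lambda c: (c[1], c[2]))
    treatment, control = [], []
    for i in range(0, len(ordered) - 1, 2):  # form adjacent pairs; an odd
        pair = list(ordered[i:i + 2])        # child out is left unassigned
        rng.shuffle(pair)                    # coin flip within each pair
        treatment.append(pair[0])
        control.append(pair[1])
    return treatment, control

# Twenty hypothetical children with invented IQ scores and ages.
rng = random.Random(0)
sample = [(i, rng.randint(50, 110), rng.randint(24, 60)) for i in range(20)]
tx, ctrl = matched_assignment(sample, rng)
print(len(tx), len(ctrl))
```

The design choice here is the one described in the text: stratifying before randomizing keeps the groups balanced on the matched variables even in a small sample, at the cost of requiring confidence that those variables are the ones of greatest significance.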
An issue related to the size and heterogeneity of groups in the randomized clinical trial approach is statistical power (Cohen, 1988). Groups have to be large enough to detect a significant difference in treatment outcomes when it occurs. The smaller the size of the group, the larger the difference in treatment outcomes has to be in order to show a statistically significant effect. Also, variability on pretest measures, as may occur with heterogeneous samples, sometimes obscures treatment differences if the sample size is not sufficiently large. Because the number of children with autistic spectrum disorders enrolled in particular treatment programs often is not large, sample size and within-group variability are challenges to the use of randomized clinical trial methodology for determining the effectiveness of educational interventions for those children.
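The relationship between effect size and required sample size can be sketched with the standard normal approximation for a two-group comparison of means (two-sided alpha of .05, power of .80). The results run slightly below Cohen's (1988) tabled values because this approximation omits the t-distribution correction, but they convey the scale of the recruitment problem.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n to detect standardized effect size d
    (Cohen's d) in a two-group comparison of means, using the normal
    approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_b = NormalDist().inv_cdf(power)          # about 0.84
    return ceil(2 * ((z_a + z_b) / d) ** 2)

for d in (0.8, 0.5, 0.2):  # large, medium, small effects (Cohen, 1988)
    print(f"d = {d}: about {n_per_group(d)} children per group")
```

Detecting even a large effect (d = 0.8) requires roughly 25 children per group under this approximation; a medium effect (d = 0.5) requires about 63 per group, which already exceeds the enrollment of many treatment programs.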
Single-Subject Designs

In contrast to group experimental designs, single-subject design methodology uses a smaller number of subjects and establishes the causal relationship between treatment and outcomes by a series of intrasubject or intersubject replications of treatment effects (Kazdin, 1982). The two most frequently used methods are the withdrawal-of-treatment design and the multiple baseline design.
In the withdrawal of treatment design, a baseline level of performance (e.g., frequency of stereotypic behavior or social interactions) is established over a series of sessions, and a treatment is applied in a second phase of the study. When reliable changes in the outcome variable occur, the treatment is withdrawn in the third phase of the study, and concomitant changes in the outcome variable are examined. Often, the treatment is reinstated in a fourth phase of the study, with changes in the outcome variable expected. Changes in the outcome variable (e.g., increases in desired behavior or decreases in undesirable behavior) that reliably occur when the treatment is implemented and withdrawn indicate a functional (i.e., causal) relationship between the treatment and outcome variables (Barlow and Hersen, 1984). This design is usually replicated with at least two or three participants.
In a multiple baseline design, three (or more) participants may be involved. Data are collected for all participants in an initial baseline phase, and then the treatment is begun with one participant while the others remain in the baseline phase of the study. When changes occur for the first participant, the treatment is introduced for the second participant, and when changes occur for the second participant, the treatment is introduced for the third participant. Variations on this design include multiple baselines across behaviors of single individuals and multiple baselines across settings. Again, the researcher infers a functional relationship when changes reliably occur only after the treatment is implemented across (usually three) participants, settings, or behaviors.
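The staggered logic of a multiple baseline design across participants can be shown schematically. The session numbers and outcome levels below are idealized, invented values; the design's inference rests on the change in the outcome appearing only after each child's treatment begins, at a different point for each child.

```python
# Schematic of a multiple baseline design across three hypothetical
# participants. "." marks a baseline-level session, "#" a treatment-level
# session; in an idealized outcome, the level shifts only at onset.

TREATMENT_START = {"child_1": 5, "child_2": 10, "child_3": 15}
SESSIONS = 20

def expected_pattern(start, baseline_level=2, treatment_level=8):
    """Idealized outcome series (e.g., social initiations per session)."""
    return [baseline_level if s < start else treatment_level
            for s in range(SESSIONS)]

for child, start in TREATMENT_START.items():
    data = expected_pattern(start)
    bar = "".join("." if v == 2 else "#" for v in data)
    print(f"{child}: {bar}   (treatment begins session {start})")
```

Real data are far noisier than this idealized pattern, which is why visual analysis of trends, and the stability requirements discussed below, matter so much in practice.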
Single-subject designs differ from group designs in three ways. First, changes in the outcome variables are measured frequently (e.g., daily, weekly) rather than at the beginning and end of the treatment. Second, visual analysis of differences in trends in the data (e.g., increases in social interaction or decreases in stereotypic behavior), rather than statistical analysis between groups, is usually used to determine the effectiveness of treatment. Third, unlike group designs, in which the treatments often represent a range of theoretical perspectives, treatments evaluated through single-subject designs tend to follow an applied behavior analysis theoretical orientation (Kazdin, 1982).
There are methodological problems and limitations when single-subject designs are applied to studying children with autistic spectrum disorders. The most obvious is that only a small number of children are involved in any single study, so the applicability of findings of a single study to other children is limited. Single-subject designs build their external validity on systematic replications across studies (Tawney and Gast, 1983). One set of current standards (Lonigan et al., 1998) stipulates that nine replications of studies with good experimental designs and treatment comparisons are required for the effectiveness of an intervention to be “well-established,” while three replications of studies with acceptable methodological characteristics are necessary for an intervention to be identified as “probably efficacious.” These are arbitrary, though useful, designations.
The issue of inter- and intrasubject variability also exists for this methodology. Single-subject designs require that some level of stability in the participants’ performance be reached before another phase is implemented, and variability in participants’ behavior, as occurs for children with autistic spectrum disorders, may obscure comparisons across phases.
As noted above, the characteristics of the participants must be described explicitly in single-subject methodology, and variability in the characteristics of children with autistic spectrum disorders could result in children with very different characteristics participating in the same study. Such variability could contribute to the limitations of the external validity of a study.
Two key issues in single-subject methodology relate to generalization and maintenance of treatment effects. In this context, generalization refers to the occurrence of desired treatment outcomes outside of the treatment settings and with individuals who were not involved in the treatment. Maintenance refers to the continued performance of the behaviors or skills acquired in treatment after the treatment has ended. Reviews of the literature suggest that evidence for generalization and maintenance is weak for some single-subject treatments, or that these outcomes have not routinely been assessed (Horner et al., 2000; McConnell, 1999). It should be emphasized that the issues of maintenance and generalization are not unique to single-subject research. Group design studies of comprehensive intervention programs have not often used measures of generalization and maintenance; the notable exceptions are the studies that have examined long-term follow-up of participants in comprehensive treatment programs (e.g., Harris and Handleman, 2000; McEachin et al., 1993; Strain and Hoyson, 2000). As shown in Figure 1–3 (in Chapter 1), generalization to natural settings was studied in about 30 percent of reported research concerning social and communication interventions, and not at all in the research reviewed in other areas. Some measurement of generalization and/or maintenance was addressed in an additional 10 to 40 percent of studies, with the greatest frequency in positive behavioral and communication interventions, but there is still much room for improvement. For research on early interventions for young children with autistic spectrum disorders, assessment of generalization and maintenance should be a standard feature of single-subject and group design studies. Particularly in autism, generalization to new contexts cannot be assumed, though it is the goal of most interventions.
Developmental and Nonspecific Effects
Two other related methodological issues affect both single-subject and pre-post group designs: the effects of development and maturation, and the nonspecific, positive effects of participating in an intervention (even if no specific treatment is offered, as in placebo effects). Nonspecific treatment effects may also occur in single-subject designs. Both of these issues are relevant, to different degrees, to many studies in autistic spectrum disorders conducted from a range of theoretical perspectives. For many behaviors, most children with autistic spectrum disorders show
gradual improvement, whether or not they receive intervention. For example, some children with autism learn to talk without direct language intervention; many learn to sit, dress themselves, and sort and match items without highly specific interventions. In addition, there are carryover effects of one intervention to another (e.g., teaching appropriate play often decreases repetitious behavior and may increase eye contact). This carryover is a positive factor that is extremely important for children. However, it limits interpretation of designs, such as multiple baselines, that assume that behaviors are independent, and designs such as pre-post testing, which assume that all improvements are due to the treatment specified (and not to carryover from other phenomena, such as a change in parents’ behavior).
For children and their families, there are also strong effects of being in a program and feeling that they are receiving treatment, even when there is no “active ingredient” of the intervention. These effects have been repeatedly documented in education, medicine, and psychology in comparisons of open trials with randomized clinical trials; they are also relevant to single-subject designs in which the intervenor is also the principal data collector. “Blindness” to which children and families receive which treatments, and to the characteristics of participants, in at least some of the assessments—even in single-subject designs—would considerably improve the interpretability of results.
On the whole, developmental and nonspecific or placebo effects are positive factors for children and families. They attest to the positive trajectory of many behaviors and the power of hope and perceived purpose. However, recognizing the potential contributions of these factors is crucial in interpreting the results of specific interventions. There are methodological features of research designs that can be applied to control for maturation and nonspecific effects. For example, a randomized group design that uses a contrast intervention as a control for the treatment of interest, or a single-subject design in which some form of treatment is provided during the baseline phase, can enhance the interpretation of such effects.
Replications and Measures of Treatment Effects
For single-subject and group experimental designs, the issues of replication of studies and measurement of treatment outcomes are important. Research on comprehensive intervention programs and individual intervention approaches tends to be conducted and replicated by individuals who developed the approaches. Evidence for the effectiveness of these approaches is strengthened when researchers who are independent of the developers replicate findings of effectiveness. This form of replication has generally not occurred in the research on comprehensive treat-
ment programs. For individual intervention techniques, interventions addressing language and communication skills (see Goldstein, 1999) and problem behaviors (see Horner et al., 2000) are the most often replicated by different investigators.
Independent measurement or verification of treatment outcome is another important issue. The potential effect of experimenter bias exists when outcome assessments are conducted by individuals who know about the nature of the research study, the treatment groups to which children are assigned, and the phases of studies in which children are participating. For most group and single-subject design research, outcome data are collected by project staff; this may introduce a potential confounding effect. This confounding effect may be countered by having blind or naive assessors collect pre- and post-outcome data for group designs and daily performance data for single-subject designs. Also, for single-subject designs, the assessment of socially important outcomes of interventions by individuals outside of the project, called “social validity” (Schwartz and Baer, 1991; Wolf, 1978), provides some control of potential bias by observers, raters, and testers.
Interaction Between Treatment and Child or Family Characteristics
In experimental group designs, the average or mean performances of children on outcome measures and standard deviations are generally reported for each group. The standard deviation describes the variation of outcome scores around the mean. In group-design studies, children make different amounts of progress, with some possibly scoring much higher and some scoring much lower than the mean. Analyses of group means do not provide information about which children benefited the most or least from treatment.
To obtain more specific knowledge about the characteristics of children that are associated with performance, researchers analyze aptitude-by-treatment interactions, or ATIs. For example, an examination of different language training curricula for preschool children with disabilities (not specifically autism) did not find a main effect for treatment (i.e., both treatments appeared to be equally effective) (Cole et al., 1991). However, when the investigators analyzed the treatment-by-aptitude interaction, they found that children who were higher performers on pretest measures benefited more from a didactic language training approach, and children who were lower performers at pretest benefited more from a responsive curriculum approach to language training.
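The logic of an aptitude-by-treatment-interaction analysis can be sketched as a regression that adds a treatment-by-pretest interaction term to the main-effects model. The following Python sketch uses simulated data; all numbers, variable names, and the crossover pattern are invented for illustration and are not taken from Cole et al. (1991):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 120
pretest = rng.normal(50, 10, n)        # pretest aptitude score (hypothetical)
treatment = rng.integers(0, 2, n)      # 0 = responsive, 1 = didactic (hypothetical)
# Simulated crossover: the "didactic" condition helps higher pretest scorers,
# the "responsive" condition helps lower scorers, so the average effect is ~0.
outcome = (40 + 0.5 * pretest
           + treatment * 0.8 * (pretest - 50)
           + rng.normal(0, 5, n))
df = pd.DataFrame({"pretest": pretest, "treatment": treatment, "outcome": outcome})

# Main-effects model: treatment looks ineffective on average
main = smf.ols("outcome ~ treatment + pretest", data=df).fit()

# Adding the interaction term reveals the aptitude-by-treatment effect
ati = smf.ols("outcome ~ treatment * pretest", data=df).fit()
print(ati.params["treatment:pretest"], ati.pvalues["treatment:pretest"])
```

A significant interaction coefficient, rather than a significant treatment main effect, is what signals that the treatments work differently for children at different pretest levels.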
This type of aptitude-by-treatment-interaction analysis has the potential for providing valuable information about the characteristics of children with autistic spectrum disorders that are associated with outcomes
for comprehensive treatment programs, but these analyses have rarely been conducted. Studying interactions between child or family features and treatment requires a sample size large enough to generate sufficient power to detect a difference. For example, in one study, children diagnosed as having autism or pervasive developmental disorder were randomly assigned to an intensive intervention program based upon the UCLA Young Autism Project model or a parent training model. Although it appeared that children with pervasive developmental disorder scored consistently higher than children with autism on some measures, there were no significant differences between groups (Smith et al., 2000). The authors attributed the failure to find significant differences to the small sample size (6–7 in each subgroup in each experimental condition). In another example, Harris and Handleman (2000) examined class placements of children with autism 4–6 years after they had left a comprehensive early intervention program. In an aptitude-by-treatment-interaction type analysis, they found that children who entered their program at an earlier age (mean=46 months) and had relatively higher IQ scores at intake (mean=78) were significantly more likely to be in regular class placements, and children with relatively lower IQ scores at intake who entered the program later (mean=54 months) were more likely to be placed in special education classes. Even with a relatively small number of participants (28), the robustness of this finding provided information about characteristics of the children who were likely to benefit most from the program.
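The sample-size problem described above can be made concrete with a standard power calculation. This sketch (illustrative only; it does not reproduce the analyses in Smith et al., 2000) shows how little power subgroups of 7 children provide even for a large between-group effect, and how many children per group would be needed for conventional 80 percent power:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power to detect a large effect (Cohen's d = 0.8) with only 7 children
# per subgroup, two-sided alpha = .05 -- roughly a one-in-four chance
small_n_power = analysis.power(effect_size=0.8, nobs1=7, alpha=0.05)

# Sample size per group needed to reach 80% power for the same effect
needed_n = analysis.solve_power(effect_size=0.8, power=0.80, alpha=0.05)

print(round(small_n_power, 2), round(needed_n))
```

Interaction tests are even more demanding, since they compare differences of differences; the per-cell sample sizes required typically exceed what any single early intervention program can enroll.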
Fidelity of Treatment
In addition to assessing outcome measures, it is important for researchers examining the effects of educational interventions to verify that the treatment was delivered. Measurement of the delivery of an individual intervention practice or comprehensive intervention program has been called fidelity of treatment, treatment implementation, and procedural reliability (Billingsley et al., 1980; Hall and Loucks, 1977). Here we use the term treatment fidelity.
Treatment fidelity requires that researchers operationally define their intervention or the components of their comprehensive program well enough so that they or others can assess the degree to which procedures have been carried out. Such assessment takes different forms (e.g., direct observations with discrete behavioral categories, checklists, etc.). For example, staff of the LEAP preschool program (see Chapter 12) have developed a set of fidelity-of-treatment protocols that assess whether eight components of the program are being implemented: positive behavioral guidance, interactions with families, teaching strategies, interactions with children, classroom organization and planning, teaching communication
skills, IEPs and measuring progress, and promoting social interaction (LEAP Preschool and Outreach Project, 1999). These protocols could be used in a research capacity to document the level of implementation of the comprehensive program. Also, as Strain (2000) indicated, they were used in the LEAP program to provide feedback to staff on their level of implementation in order to maintain treatment fidelity. Some researchers use hours of service provided as a measure of the intensiveness of intervention (Smith et al., 2000). Although it provides important information, hours of service is not an adequate measure of treatment fidelity, because it does not describe the procedures used during the service hours. Assessment of treatment fidelity has a long history in general education (see Leinhardt, 1980) and has been proposed as a standard for high quality intervention research in early intervention for children with disabilities (LeLaurin and Wolery, 1992). However, one review of early intervention programs for children with autism (Wolery and Garfinkle, 2000) found that only 4 out of 15 programs provided any evidence of implementation of program components. In future research on educational intervention for young children with autistic spectrum disorders and their families, measurement of the fidelity of treatment should be a standard feature of the program of research and publication of findings.
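A fidelity-of-treatment protocol of the kind described above reduces, at its simplest, to a checklist of observed indicators per program component, summarized as percent implemented. The sketch below is hypothetical: the component names echo the LEAP list but the indicator structure and scoring are invented for illustration, not taken from the LEAP protocols:

```python
# Hypothetical fidelity checklist: each program component is scored on
# several observed indicators (True = implemented during the observation)
checklist = {
    "positive behavioral guidance": [True, True, False, True],
    "teaching strategies": [True, True, True, True],
    "promoting social interaction": [True, False, False, True],
}

def fidelity_scores(checklist):
    """Percent of indicators implemented, per component and overall."""
    per_component = {name: 100 * sum(items) / len(items)
                     for name, items in checklist.items()}
    all_items = [item for items in checklist.values() for item in items]
    overall = 100 * sum(all_items) / len(all_items)
    return per_component, overall

per_component, overall = fidelity_scores(checklist)
print(per_component, round(overall, 1))
```

Component-level scores support the feedback-to-staff use Strain (2000) describes, while the overall score gives researchers a quantitative implementation variable to relate to outcomes.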
Modeling Growth and Intervention Effects
In most experimental group studies, as noted above, the developmental growth of children with autistic spectrum disorders is measured through the collection of pretest and posttest outcome measures, followed by analyses of differences between groups. More sophisticated procedures for examining the growth and development of children are available (Dunst and Trivette, 1994), but they have not been used in analyses of intervention outcomes for young children with autistic spectrum disorders. Growth curve analysis (Burchinal and Appelbaum, 1991) and the related techniques of hierarchical linear regression modeling (Bryk and Raudenbush, 1987) and structural equation modeling (Willet and Sayer, 1994) have been used to model the growth of groups of children for whom longitudinal data are available. These techniques may also be used to examine patterns of growth for children with different types of characteristics or children involved in different types of treatment conditions or programs (e.g., Burchinal, 1999; Burchinal, Bailey and Snyder, 1994; Hatton et al., 1997). Natural history studies of development in children with autistic spectrum disorders that use these methods are critical for providing both theoretically based insight and empirical “baselines.”
The advantage of growth curve analysis and related regression models is that they allow researchers to account for nested variables (e.g., children participating in the same intervention but in different classrooms), nonrandom missing data (i.e., an assessment that occurred at the wrong time or that is missing), and extreme scores of students (Burchinal et al., 1994). Also, hierarchical linear regression modeling and structural equation modeling allow researchers to determine how variables other than assignment to intervention or contrast conditions (e.g., family characteristics, degree of implementation of the program) are associated with the development of children.
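As a minimal sketch of the growth-modeling idea, the following fits a random-intercept, random-slope mixed model to simulated longitudinal data, with a time-by-group interaction testing whether growth rates differ between an intervention and a contrast condition. All data, sample sizes, and effect sizes are invented for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_children, n_waves = 40, 4
rows = []
for child in range(n_children):
    group = child % 2                          # 0 = contrast, 1 = intervention
    intercept = rng.normal(50, 8)              # child-specific starting level
    slope = rng.normal(2 + 1.5 * group, 0.5)   # intervention steepens growth
    for wave in range(n_waves):
        rows.append({"child": child, "group": group, "wave": wave,
                     "score": intercept + slope * wave + rng.normal(0, 2)})
df = pd.DataFrame(rows)

# Random-intercept, random-slope growth model; the wave:group coefficient
# estimates how much faster the intervention group's scores grow per wave
model = smf.mixedlm("score ~ wave * group", df, groups="child",
                    re_formula="~wave").fit()
print(model.params["wave:group"])
```

Because each child contributes several waves, a child with one missing assessment still contributes to the estimated trajectory, which is one of the advantages over pre-post designs noted above.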
One difficulty in using these techniques in studies of children with autistic spectrum disorders is that many of them require large sample sizes, but most studies of young children with autistic spectrum disorders have small samples. Nevertheless, to the extent possible, researchers of educational intervention programs for young children with autistic spectrum disorders should consider adopting these or similar models for analyzing variables affecting children’s development and learning. This may require that program developers accumulate sufficient sample sizes across several years of program operation; multiple data points per participant are also required.
Group Size and Experimental Group Design
A clear problem mentioned at several points in the preceding discussion is that the methodological tools available to researchers, such as analyses of individual differences in response to treatment and sophisticated regression-based techniques like hierarchical linear regression modeling, are limited by the number of children with autistic spectrum disorders in intervention programs and the number of data points collected. Implementing an early intervention program for children and families is a labor-intensive and expensive endeavor. Because of the expense, length of treatment, and heterogeneous nature of autistic spectrum disorders, the number of young children in an individual treatment program is usually small. As noted, one solution for program developers is to collect data for multiple cohorts, building their numbers across years. However, this approach requires multiple years of funding and long-term commitments from investigators.
One solution to the sample size problem is the development of a multi-site study of treatment effectiveness. Such a study could be based on a treatment comparison model and could perhaps (because of its potential magnitude) be funded by multiple coordinating agencies (e.g., National Institute of Child Health and Human Development, Office of Special Education Programs, National Institute of Mental Health, Centers for Disease Control and Prevention, National Institute on Deafness and Other Communication Disorders, National Institute of Neurological Disorders and Stroke). There is a precedent for federal funding for large initiatives such as this in other areas (e.g., Fast Track project for aggressive children, Infant Health
and Development Project, National Institute of Child Health and Human Development Child Care Study). The current coordination of the biomedical grants in autism funded by the National Institute of Child Health and Human Development and the National Institute on Deafness and Other Communication Disorders in the Collaborative Program for Excellence in Autism (CPEA), and efforts to coordinate genetics studies funded by many different agencies, may represent models for such a project.
We have not reviewed qualitative or ethnographic research studies. Although such studies may add to the knowledge about program features and outcomes for young children with autistic spectrum disorders (Schwartz et al., 1998), the research literature is quite small and does not contain systematic examinations of programwide effects for young children and families. Qualitative and ethnographic research does hold promise for uncovering important features in educational intervention programs that affect the development of young children with autistic spectrum disorders and their families.
FROM RESEARCH TO PRACTICE
There is an active research literature on the developmental characteristics, diagnostic criteria, comprehensive treatment programs, and individual intervention strategies for young children with autistic spectrum disorders. The literature provides a tentative but important basis on which to design intervention strategies and decisions about treatment options for individual children. However, there are concerns about methodological issues. Given these concerns, funding agencies and professional journals should require minimum standards for the design and description of intervention research studies. These studies should include the following information: participants’ chronological age, developmental assessment data (including verbal and nonverbal levels of performance), standardized diagnoses, gender, race, family characteristics, socioeconomic status, and relevant health or other biological impairments.
In addition, fidelity of treatment documentation must operationally define the intervention in sufficient detail so that an external group could replicate it as well as assess the degree of implementation. Independent, objective assessment of expected outcomes should be conducted at regular intervals, and immediate and long-term assessment of effects on children and families should include measures of generalization and maintenance.
Future research on intervention programs for young children with autistic spectrum disorders should address the following methodological
issues: application of standardized procedures for describing participants in intervention studies, including children’s diagnoses, chronological age, developmental and behavioral information, family information, gender, socioeconomic status, race, and pertinent health or biological information; the association between fidelity of treatment information and treatment outcomes; the association between participants’ characteristics and treatment outcomes (e.g., aptitude-by-treatment interactions); the development of early identification procedures and their relationship to early access to services; and identification of program features (i.e., “active ingredients” of intervention programs) that relate most directly to child and family outcomes. The impact of intervention on growth for young children with autistic spectrum disorders may be measured by techniques such as growth curve analysis, hierarchical linear modeling, and/or structural equation modeling to model longitudinal growth and treatment effects.
Addressing these methodological issues will require larger sample sizes, longitudinal follow-ups of participants, and interdisciplinary collaboration. To enable such needed research, initiatives should be funded jointly by federal agencies responsible for research, development, and services for young children with autistic spectrum disorders (including the Office of Special Education Programs, the Office of Educational Research and Improvement, the National Institute of Child Health and Human Development, the National Institute of Mental Health, the National Institute of Neurological Disorders and Stroke, and the National Institute on Deafness and Other Communication Disorders). These initiatives should include a task force that meets regularly to design and provide a synthesis of the diagnostic, developmental, behavioral, and treatment research that would inform the design and implementation of early educational treatment for young children with autistic spectrum disorders; consideration of the feasibility of a national, cross-site, longitudinal investigation of early intervention treatments for young children with autistic spectrum disorders and their families; and development of specific measurement tools for early diagnosis of children with autistic spectrum disorders and treatment outcomes (e.g., social functioning, spontaneous communication and language, peer relationships, and competence in natural settings). Agencies funding competitive research initiatives should include personnel with sufficient research and experiential background to judge the scientific and practical merits of proposals.