Assessing Evaluation Studies: The Case of Bilingual Education Strategies

6 Conclusions and Recommendations

In this chapter we first present our summaries of the reports of the Immersion and Longitudinal Studies and our overall conclusions about the studies. In the second part of this chapter we present specific conclusions and recommendations about the Longitudinal and Immersion Studies and our general conclusions and recommendations regarding future studies on bilingual education.

SUMMARIES

The Longitudinal Study

The Longitudinal Study consisted of two phases. The first phase described the range of services provided to language-minority limited-English-proficient (LM-LEP) students in the United States; it was used to estimate the number of children in grades K-6 receiving special language-related services. The second phase was a 3-year longitudinal component to evaluate the effectiveness of different types of educational services provided to LM-LEP students. The longitudinal component itself consisted of two parts, a baseline survey and a series of follow-up studies in the subsequent years.

The initial sample design and survey methodology of the descriptive component are appropriate for making inferences that describe the U.S. population. For the longitudinal component, the method of subsampling makes it highly questionable whether such inferences are possible. Since the data were used primarily for developing models, this drawback does not of itself invalidate the results of the analyses.
The large extent of missing data and measurement error in the Longitudinal Study severely limits the usefulness of the study data. This, and the very large number of variables measured, made the original plans for analysis unworkable. A distinct and rather ad hoc procedure was subsequently used for analysis, and it produced inconclusive results.

The use of variables to characterize treatment characteristics, rather than classifying programs into different explicit types (as originally envisaged in the design), means that it is difficult or impossible to separate genuine treatment effects from the effects of other factors. This in turn means that the findings from the study are very difficult to generalize to a wider population.

The data collected, despite the limitations noted, are likely to be useful in addressing some research questions, although these questions were not central to the objectives of the study. The analyses will likely take the form of describing, but not evaluating, bilingual education programs: that is, what programs existed in the United States at the time of the Longitudinal Study. In particular, data from the descriptive component constitute a sound database for such descriptive, nonevaluative research. The panel does not believe that further analyses of data from the study can provide the kinds of evaluation of the effectiveness of bilingual education programs that were originally envisaged.
The Immersion Study

The Immersion Study was designed to compare the relative effectiveness of three strategies for educating children at the elementary school level: structured English Immersion Program (all instruction in English, lasting 2–3 years); Early-exit Program of transitional bilingual education (both languages used in instruction, but subject-matter content taught in English; instruction lasted 2–3 years); and Late-exit Program of transitional bilingual education (both languages used in instruction, including content areas; instruction lasted 5–6 years).

Although the study's final report claims that the three programs represent three distinct instructional models, the findings indicate that the programs were not that distinct. They were essentially different versions of the same treatment; Immersion and Early-exit Programs were in some instances indistinguishable from one another. In addition, all three program settings were characterized by rote-like instruction directed by teachers, in which student language use was minimal. Given the unexceptional nature of the instruction, the question arises as to whether there were any reasonable grounds for anticipating a substantial educational effect, regardless of language of instruction.

Several features of the study design and analyses make the Immersion Study difficult to interpret. Outcomes and covariates are not always distinguished from one another. Thus, controls for intermediate outcomes are not routinely present and the measurement of treatment (program) effects may be distorted. Even if outcomes are not used as covariates, the need for controls remains. For cohorts of students first observed after entry into the school system, there were few covariates, and key baseline measurements were often absent. It is not clear whether differences in many measured variables are indicators of pre-existing student characteristics or whether they reflect prior exposure to one program or another. There are strong suggestions of such pre-existing differences among the students exposed to each type of program, which parallel differences in students across schools and school districts and the nonrandom allocation of programs to districts and schools. For example, students in Late-exit Programs were from far more economically disadvantaged backgrounds than those in either Immersion or Early-exit Programs. On the other hand, the parents of the students in Late-exit Programs were as likely as parents of students in other programs to receive English-language newspapers and twice as likely to receive Spanish-language newspapers, perhaps an indication of Spanish literacy.

Other characteristics of the sites of Late-exit Programs make them potentially important sites for future studies. For example, the family income of students in Late-exit Programs was by far the lowest in the study, but these families monitored completion of homework considerably more than the families at the other sites. Furthermore, children at the sites of Late-exit Programs scored at or above the norm in standardized tests, suggesting a possible relationship between the use of Spanish for instruction, Spanish literacy in the home, parental involvement in homework, and student achievement.
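The concern above about treating intermediate outcomes as covariates can be made concrete with a small simulation. The sketch below is purely illustrative and uses no data from the Immersion Study: the variable names (`program`, `interim`, `final`), the effect sizes, and the sample size are all assumptions chosen for the example. It assumes a program that raises an interim test score, which in turn raises the final score, and compares the program estimate with and without the interim score as a covariate.

```python
import random
from statistics import mean


def slope(x, y):
    """Least-squares slope of y on x (intercept included implicitly)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n=20000, seed=1):
    """Return (unadjusted, adjusted) program estimates for hypothetical data."""
    rng = random.Random(seed)
    # Hypothetical randomized assignment: 0 = comparison, 1 = program.
    program = [rng.choice((0.0, 1.0)) for _ in range(n)]
    # Assumed model: the program raises an interim score by 1.0 ...
    interim = [1.0 * p + rng.gauss(0, 1) for p in program]
    # ... and the final score depends on the program (0.5) and the interim score (1.0),
    # so the total program effect in this sketch is 0.5 + 1.0 * 1.0 = 1.5.
    final = [0.5 * p + 1.0 * m + rng.gauss(0, 1) for p, m in zip(program, interim)]

    unadjusted = slope(program, final)
    # Coefficient on `program` when `interim` is also in the regression,
    # obtained by partialling `interim` out of both variables first.
    adjusted = slope(residuals(interim, program), residuals(interim, final))
    return unadjusted, adjusted


unadjusted, adjusted = simulate()
print(f"unadjusted program estimate: {unadjusted:.2f}")
print(f"estimate after adjusting for the interim outcome: {adjusted:.2f}")
```

Under these assumptions the unadjusted estimate is close to the total effect of 1.5, while the adjusted estimate is close to 0.5, because the covariate absorbs the part of the effect that operates through the interim score. Real data would also involve the baseline differences discussed above, which this sketch deliberately leaves out.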
Notwithstanding these general problems in the Immersion Study, there is one conclusion for which the panel finds reasonably compelling and credible analyses: the difference between students in Immersion and in Early-exit Programs at kindergarten and grade 1. Early-exit Programs appear to be more successful in reading and, because of the early age of these children, concerns about pre-observation treatment effects are not severe. By grades 1–3, however, differences in student achievement by program are not easily distinguished from possible differences in where the students started. Thus, conclusions from analyses at these grades are tenuous.

The final report of the Immersion Study notes that school effects are not directly separable from program effects; the panel concurs with this conclusion. In addition, Late-exit Programs were rare and were found in districts without any other program type. Given the presence of substantial differences in school districts, it is virtually impossible to compare Late-exit with either Early-exit or Immersion Programs. We do not know how effective Late-exit Programs were.

THE PANEL'S CONCLUSIONS AND RECOMMENDATIONS

Conclusions Related to the Longitudinal and Immersion Studies

The formal designs of the Longitudinal and Immersion Studies were ill-suited to answer the important policy questions that appear to have motivated them.
The studies lacked a firm conceptual grounding and did not articulate specific objectives. The objectives that were stated were often conflicting or unrealizable. There was a misfit between the program objectives and the program implementation.

Execution and interpretation of these studies, especially the Longitudinal Study, were hampered by a lack of documentation regarding (a) objectives, (b) operationalization of conceptual details, (c) actual procedures followed, and (d) changes in all of the above. For example, the objectives of the Longitudinal Study changed substantially between the year 1 report on data collection and the subsequent request for proposal (RFP) for data analysis. In the Immersion Study, the RFP specified an evaluation of only two interventions (Immersion and Early-exit Programs); the contract was later amended to involve a third treatment (Late-exit Program). Absence of adequate documentation is in part a function of the complexity of the studies and their duration, but it appears to have been exacerbated by shifts in the directives from the contract administrators as well as the mid-project changes of contractors.

Because of the poor articulation of study goals and the lack of fit between the discernible goals and the research design, it is unlikely that additional statistical analyses of these data will yield results central to the policy question to which these studies were originally addressed. This conclusion does not mean, however, that the data collected are uniformly of no value. The data may prove valuable to those requiring background information for the planning of new programs and studies.

Both the Longitudinal and Immersion Studies suffered from excessive attention to the use of elaborate statistical methods intended to overcome the shortcomings in the research designs.
Methods of statistical analysis cannot repair failures in the conceptualization, design, and implementation of studies. Techniques such as path analysis (as planned for and abortively implemented in the Longitudinal Study) and Trajectory Analysis of Matched Percentiles (TAMP) (as implemented in the Immersion Study) provide limited insight. The assumptions inherent in them at best finesse and at worst obfuscate issues that needed to have been dealt with explicitly in the study design and attendant analysis.

The absence of clear findings in the Longitudinal and Immersion Studies that distinguish among the effects of treatments and programs relating to bilingual education does not warrant conclusions regarding differences in program effects, in any direction. The studies do not license the conclusion that any one type of program is superior to any other, nor that the programs are equally effective. Even if one of the programs were definitively superior, the studies as planned and executed could well have failed to find the effect.
Taking fully into account the limitations of the two studies, the panel still sees the elements of positive relationships that are consistent with empirical results from other studies and that support the theory underlying native-language instruction in bilingual education. Most noteworthy is the convergence of the studies in suggesting, under certain conditions, the importance of primary-language instruction in second-language achievement in language arts and mathematics.

Specific Recommendations for the Longitudinal and Immersion Studies

In the light of the foregoing conclusions, the panel makes the following specific recommendations for further analysis and examination of the data from the Longitudinal and Immersion Studies.

The panel recommends that the U.S. Department of Education not seek to fund any specific additional analyses of the data from the Longitudinal or Immersion Studies. The panel was asked by the Department of Education to recommend possible additional analyses that would be of value in addressing the objectives of the studies. In Chapters 3 and 4 of this report the panel points to some specific analyses that could have been performed. It is the panel's judgment, however, that additional analyses are unlikely to change its assessment of the conclusions that can be drawn from the studies.

The panel recommends that the data and associated documentation from both the Longitudinal and Immersion Studies be archived and made publicly available. The data from both studies were made available to the panel, but attempts by the panel to use the data were unsuccessful.

Given the diversity among study sites that made adequate comparisons impossible, the panel recommends more focused and theoretically driven studies to analyze the interaction of different instructional approaches in bilingual education contexts of specific community characteristics.
Some Desiderata Regarding Evaluation Studies in Bilingual Education

Throughout this report the panel addresses issues of research design in various ways and discusses the role of discovery and confirmatory studies. We summarize here some broad themes about the design of such studies that are worthy of special attention:

A study without clear, focused goals will almost certainly fail.

Determining effective programs requires at least three tasks. First, an attempt must be made to identify features of programs that may be important. This first task is usually best achieved in exploratory or qualitative studies by comparing existing programs. The second task is to develop competing theories leading to sharply distinct proposals for programs. The third task is to create these new programs to specifications and assess their effectiveness in several tightly controlled and conclusive comparative studies. An attempt to combine all three tasks in a single comprehensive study is likely to fail.

Program effects will often be small in comparison to differences among communities and among demographic and socioeconomic groups. Therefore, comparative studies must compare programs in the same communities and demographic and socioeconomic groups.

Program effects may vary from one community to another. Therefore, several comparative studies in different communities are needed.

In comparative studies, comparability of students in different programs is more important than having students who are representative of the nation as a whole.

Elaborate analytic methods will not salvage poor design or implementation of a study. Elaborate statistical methods are for the analysis of data collected in well-implemented, well-designed studies. Care in design and implementation will be rewarded with useful and clear study conclusions.

Large quantities of missing data may render a study valueless. Active steps must be taken to limit missing data and to evaluate its impact.

A publicly funded research study requires clear documentation of decisions about study design and analysis plans, including changes that evolve as the study progresses. A well-documented, publicly accessible archived database should be made available from any publicly funded study.

The size and structure of both discovery and confirmatory studies need to be linked to objectives that can be realized. Overly ambitious large-scale studies implemented in broad national populations, such as the Longitudinal and Immersion Studies, inevitably are difficult to control, even if the interventions and their implementation are well understood.
Discovery studies of bilingual education interventions must begin on a small scale. For confirmatory studies, small-scale experiments or quasi-experiments are also more timely and informative than large-scale studies, especially if their design controls for potentially confounding factors.
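The desideratum that programs be compared within the same communities can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not an analysis of either study: the function name `simulate`, the number of communities, the score scale, and the assumed program effect of 2 points are all hypothetical. It contrasts a design in which each community runs only one program with a design in which both programs operate in every community.

```python
import random
import statistics


def simulate(seed=0, n_communities=8, n_students=200,
             true_effect=2.0, community_sd=15.0, student_sd=10.0):
    """Return (between_estimate, within_estimate) of a hypothetical program effect."""
    rng = random.Random(seed)
    # Assumed community baselines vary far more than the program effect.
    baselines = [rng.gauss(50.0, community_sd) for _ in range(n_communities)]

    # Design 1: each community runs a single program (A or B),
    # so program and community are confounded.
    a_scores, b_scores = [], []
    for i, base in enumerate(baselines):
        runs_b = i % 2 == 1
        for _ in range(n_students):
            score = base + (true_effect if runs_b else 0.0) + rng.gauss(0, student_sd)
            (b_scores if runs_b else a_scores).append(score)
    between = statistics.mean(b_scores) - statistics.mean(a_scores)

    # Design 2: both programs run in every community;
    # the effect is estimated inside each community, then averaged.
    diffs = []
    for base in baselines:
        half = n_students // 2
        a = [base + rng.gauss(0, student_sd) for _ in range(half)]
        b = [base + true_effect + rng.gauss(0, student_sd) for _ in range(half)]
        diffs.append(statistics.mean(b) - statistics.mean(a))
    within = statistics.mean(diffs)
    return between, within


# Repeat the hypothetical study several times to see how much each estimate varies.
runs = [simulate(seed=s) for s in range(30)]
print("spread of between-community estimates:",
      round(statistics.stdev(r[0] for r in runs), 2))
print("spread of within-community estimates: ",
      round(statistics.stdev(r[1] for r in runs), 2))
```

Under these assumptions the between-community estimate swings widely from one replication to the next, because it mostly reflects which communities happened to run which program, while the within-community estimate stays close to the assumed effect. The sketch ignores missing data and effects that vary by community, both of which the desiderata above treat as separate concerns.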