Assessing Evaluation Studies: The Case of Bilingual Education Strategies

3
The Longitudinal Study

OVERVIEW

In 1983 the Department of Education began a major multiyear study that came to be known as the “National Longitudinal Study of the Evaluation of the Effectiveness of Services for Language-Minority Limited-English-Proficient Students” (hereafter referred to as the Longitudinal Study). The study was commissioned in response to concerns about the lack of a solid research base on the effectiveness of different approaches to the education of language-minority (LM) students. The study was a direct response to a call from Congress, in the 1978 Amendments to the Elementary and Secondary Education Act, for a longitudinal study to measure the effectiveness of different approaches to educating students from minority language backgrounds.

Although the ultimate goal of the Longitudinal Study was to provide evaluations to inform policy choices, the Department of Education determined that an evaluation study required a firmer information base about the range of existing services. The study therefore consisted of two phases. The first phase described the range of services provided to language-minority limited-English-proficient (LM-LEP) students in the United States and was used to estimate the number of children in kindergarten through sixth grade (grades K-6) receiving special language-related services. The second phase was a 3-year longitudinal study to evaluate the effectiveness of different types of educational services provided to LM-LEP students. The longitudinal phase itself consisted of two components: a baseline survey and a series of follow-up studies in the subsequent 2 years.

The study began late in 1982, and data collection for the descriptive phase occurred during the fall of 1983. The prime contractor was Development Associates, Inc.; a subcontractor, Research Triangle Institute (RTI), assisted with survey design and sampling. The research design for the longitudinal phase was developed during the spring of 1983. Baseline data collection for the longitudinal phase occurred in the fall of 1984, and additional data were collected in the springs of 1985, 1986, and 1987. The original contract did not cover data analysis; a separate contract for that work was issued in 1988 to RTI, the subcontractor on the original contract.

As a basis for comparing and contrasting the panel's analysis of the Longitudinal Study, we first present the summary prepared by the U.S. Department of Education (1991):

The National Longitudinal Evaluation of the Effectiveness of Services for Language Minority, Limited English-Proficient (LEP) Students

A joint initiative by OBEMLA and the Office of Planning, Budget and Evaluation from 1982 to December 1989, this study examined the effectiveness of instructional services in relation to particular individual, home, and school/district characteristics. The Department is planning to contract with the National Academy of Sciences to undertake a review of the quality and appropriateness of the methodologies employed both for data collection and analysis of the very rich database.

Findings from the Descriptive Phase (1984–1987) include:

The need for LEP services is not evenly distributed geographically across states and districts. Almost 70 percent of all LEP students resided in California, 20 percent in Texas, and 11 percent in New York.

LEP students were found to be more disadvantaged economically than other students. Ninety-one percent of LEP students were eligible for free or reduced-price lunches, compared with 47 percent of all students in the same schools.
LEP students were found to be at risk academically, performing below grade level in native-language skills as well as in English and other subjects, as early as first grade. However, mathematics skills were reported to be generally superior to language skills in either language.

Most instruction of LEP students is provided in English, or in a combination of English and the native language.

There were significant problems with district and school procedures for entry and exit: Almost half of the schools countered [sic] district policy and reported using one criterion for program entry. The entry criteria that were used were of the less rigorous variety, such as staff judgment or oral language tests, versus the required use of English reading/writing tests.

Schools with relatively small enrollments of LEP students (under 50) mainstreamed an average of 61 percent of LEP students, compared with 14 to 20 percent of LEP students mainstreamed in schools with relatively large LEP enrollments.

Eighty-two percent of districts placed no time limit on continued participation in the bilingual program.

Instructional staff persons who speak and understand languages other than Spanish are rare. While 78 percent of LEP students were Spanish-speaking, 64 percent of schools with LEP students had more than one foreign language represented; the mean was 3.5 languages per school.

The results from the Longitudinal Study were disappointing to those interested in policy, and their interpretation remained controversial. First, the data did suggest correlations between policy-relevant variables and educational outcomes of policy interest; however, attribution of causation from the reported analyses is extremely problematic. Because the study was based on a sample survey, it does not provide a basis for inferring that differences in outcomes are due to differences in services provided, nor does it provide a warrant for inferences about the impact of proposed changes in policy. Second, despite the effort expended to develop a longitudinal database, only single-year analyses were performed.

The failure of a study of such magnitude to produce results even approximating those anticipated is understandably a cause for concern. The need remains for an information base to guide policy on bilingual education. This chapter addresses four issues that arise from the study and its disappointing outcomes:

What information was obtained as a result of the descriptive and longitudinal phases of the Longitudinal Study?

What were the reasons for the failure of the study to achieve its major objectives, and to what extent could the problems have been prevented?

Might useful information be obtained from further analyses of existing data?

How should the outcome of this study affect the design and implementation of future studies of this nature?

Following the time line presented below, the remainder of this chapter is divided into four sections. The first two cover the descriptive and longitudinal phases, respectively.
An overview of the study design, analysis methods, and results is provided for each phase, followed by the panel's critique. The third section discusses the prospects for further analyses of study data, and the fourth discusses the implications of the Longitudinal Study for the conduct of future observational studies by the Department of Education.

1968: Bilingual Education Act passed.
1974: ESEA Title VII expanded; Lau v. Nichols decision: school districts must give special services to LM-LEP students.
September 1982: RFP issued for the Longitudinal Study.
Late 1982: Longitudinal Study begins.
Spring 1983: Pilot testing of forms for descriptive phase.
Fall 1983: Data collection for descriptive phase.
Spring 1984: Longitudinal Study Phase of the National Evaluation of Services for Language-Minority Limited-English-Proficient Students: Overview of Research Design Plans, report by Development Associates, Inc. Describes plans for analyzing descriptive phase data.
Fall 1984: Initial data collection for longitudinal phase.
December 1984: The Descriptive Phase Report of the National Longitudinal Evaluation of the Effectiveness of Services for Language-Minority Limited-English-Proficient Students, report by Development Associates, Inc., and Research Triangle Institute. Reports final results of descriptive phase.
Spring 1985: Second data collection in year one.
Spring 1986: Year two data collection.
June 1986: Year 1 Report of the Longitudinal Phase, report by Development Associates, Inc.
Spring 1987: Year three data collection.
May 1988: Request for proposals issued for data analysis for the longitudinal phase.
February 1989: Descriptive Report: Analysis and Reporting of Data from the National Longitudinal Evaluation of the Effectiveness of Services for Language-Minority Limited-English-Proficient Students, report by Research Triangle Institute. Considers which original study objectives could be addressed by study data.
April 1989: Analysis Plan: Analysis and Reporting of Data from the National Longitudinal Evaluation of the Effectiveness of Services for Language-Minority Limited-English-Proficient Students, report by Research Triangle Institute. Describes plans for analyzing longitudinal phase data.
1991: Effectiveness of Services for Language-Minority Limited-English-Proficient Students, report by Research Triangle Institute. Reports final results of longitudinal phase.

THE DESCRIPTIVE PHASE

Objectives

The descriptive phase of the Longitudinal Study had nine objectives:

To identify and describe services provided to LM-LEP students in grades K-6;
To determine the sources of funding for the services provided;
To estimate the number of LM-LEP students provided special language-related services in grades K-6;
To describe the characteristics of students provided instructional services for LM-LEPs;

To identify and describe home and community characteristics associated with each major language group;
To determine the entry/exit criteria used by schools and school districts serving LM-LEP students;
To determine the relationship between services offered to LM-LEP students and services offered to students in adjoining mainstream classrooms;
To identify clusters of instructional services provided to LM-LEP students in grades K-6; and
To obtain information useful in designing a longitudinal evaluation of the differential effectiveness of the identified clusters of services provided to LM-LEP students.

The first eight objectives are concerned with characterizing the population of LM-LEP students in elementary grades in U.S. public schools and describing the range and nature of the special services provided to them. The ninth objective was to provide information to inform the design of the subsequent longitudinal phase of the study.

Study Design and Data Collection

The descriptive study was designed as a four-stage stratified probability sample. First-stage units were states; second-stage units were school districts, counties, or clusters of neighboring districts or counties; third-stage units were schools; and fourth-stage units were teachers and students. The target population of students consisted of elementary-age LM-LEP students receiving special language-related services from any source of funding.

The study used local definitions of the term "language-minority limited-English-proficient" whenever available. Thus, the criteria for classifying students as LM-LEP varied from site to site, and the term "LM-LEP student" used in reporting study results refers to a student classified locally as LM-LEP, not to any defined level of English proficiency. This variation in classification criteria affects the interpretation of results. Appendix A includes information on the identification of LEP students, by state.

Ten states (those with at least 2 percent of the national estimated LM-LEP population) were included in the sample with certainty, and an additional 10 states were selected as a stratified random sample of the remaining states, with selection probability proportional to the estimated size of the elementary-grade LM-LEP population in the state. The state of Pennsylvania was subsequently dropped because of the refusal of the Philadelphia school district to participate. School districts were stratified according to the estimated LM-LEP enrollment in their respective states, then sampled within strata with probability proportional to estimated LM-LEP enrollment. Schools were likewise selected with probability proportional to estimated LM-LEP enrollment.

Teachers and students were sampled only from schools with at least 12 LM-LEP students enrolled in grade 1 or grade 3. All academic content teachers who taught LM-LEP students in selected schools in grades 1 through 5 were selected for inclusion in the sample. A stratified random subsample of schools was selected for the student sample. Up to five first graders and five third graders were randomly selected from each school. Of the five students in each grade, two were from the predominant language-minority group at the school and three were from other language-minority groups if such students existed; otherwise, students from the predominant language-minority group were substituted.

Site visits were made to districts with large LM-LEP enrollments, and mail or telephone interviews were used in the remaining districts. In visited districts, site visits were made to schools with moderate to large LM-LEP enrollments, and mail or telephone interviews were administered in the remaining schools. Teachers completed a self-administered questionnaire. Student-level data consisted of a questionnaire completed by teachers who taught the student and a questionnaire filled out by field staff from student records. A planning questionnaire provided data from school personnel for planning the longitudinal phase.

Analysis Methods

The Descriptive Phase Report (Development Associates, 1984a) did not document the nature and full extent of missing data. The overall response rate on each of the major study instruments was at least 81 percent. For LM-LEP students, the combined school and student response rate within schools was 87.2 percent. The student sample was drawn from a sample of 187 of the 335 schools from which teacher-level data were obtained. (These teacher-level data were obtained from 98 percent of the schools selected.) Of the 187 schools, 176 permitted a sample of students to be drawn (94.1 percent). Within these 176 schools, a student background questionnaire was completed by 1,665 LM-LEP students of the 1,779 students selected (92.6 percent).
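The probability-proportional-to-size selection used at the district and school stages can be sketched in a few lines. The frame below is invented for illustration, and draws are made with replacement for simplicity, whereas the study used stratified, without-replacement procedures.

```python
import random

random.seed(7)

# Hypothetical sampling frame: (district, estimated LM-LEP enrollment).
# District names and counts are invented, not study data.
frame = [("District A", 4000), ("District B", 1500), ("District C", 800),
         ("District D", 300), ("District E", 150)]

total = sum(size for _, size in frame)  # 6,750 in this invented frame

def pps_sample(frame, n):
    """Draw n units with probability proportional to estimated size
    (with replacement, for simplicity)."""
    names = [name for name, _ in frame]
    sizes = [size for _, size in frame]
    return random.choices(names, weights=sizes, k=n)

# Single-draw selection probability for each district is size / total.
probs = {name: size / total for name, size in frame}

sample = pps_sample(frame, 2)
```

Large districts dominate such a sample by design, which is precisely what makes the inverse-probability weighting discussed below necessary when generalizing to the full population.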
Teacher data were obtained for 95.8 percent of those 1,665 students. However, no information is given on the extent of item nonresponse, that is, missing information for individual questions. Missing data were handled by excluding cases of item nonresponse from tabulations of single items. The report notes that this approach assumes that respondents and nonrespondents do not differ in ways affecting the outcome of interest; it also reduces the amount of data available for analysis.

The results are presented primarily in the form of tabulations and descriptive statistics (means, percentages, and distributions). The analysis methods used were standard and appropriate. Most analyses used sampling weights: in computing average values, the observations were weighted by the inverse of their probability of selection into the sample. When observations are sampled at unequal rates from subpopulations with different characteristics, the use of sampling weights allows the sample results to be generalized to the target population. For the Longitudinal Study, the target population for some analyses consists of all LM-LEP students in grades 1-5 in the United States, excluding Pennsylvania. Other analyses are restricted to grade 1 and grade 3 students attending schools with at least 12 LM-LEP students in either grade 1 or grade 3.

Chapter 9 of the Descriptive Phase Report (Development Associates, 1984a) categorizes service patterns into service clusters, or "sets of instructional services provided to one or more LM-LEP students at a particular school or schools, … based on their most salient features," thereby defining clusters of similar types of service. There is no description of the methodology used for the categorization. It appears that the researchers tried a number of different categorizations and finally settled on a typology that "provided the most workable array." The report does not explain why the typology used was more "workable" than others that were considered.

In general, the statistical methods used to analyze the survey results were straightforward. The results were presented in tabular form. Some form of graphical presentation would have been quite helpful as an aid to understanding. Of particular statistical interest would have been graphical displays of the multidimensional space of observations on the variables used to define the service clusters; see Chambers, Cleveland, Kleiner, and Tukey (1983) for an introductory description of multidimensional graphical methods.

Summary of Results

The Descriptive Phase Report tabulates a great many results relevant to the study objectives listed above. This section provides a brief summary of them.

There is a great deal of variation in the operational definition of a LM-LEP student from district to district, and from school to school in some districts. Of the school districts, 61 percent had an official definition of a LM-LEP student, and 75 percent reported setting official entry criteria for eligibility for special LM-LEP services. Some districts defined subcategories of LM-LEP students.
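The inverse-probability weighting described under Analysis Methods can be illustrated with a small, invented two-stratum example: an unweighted mean over-represents the heavily sampled stratum, while the weighted mean recovers the population value.

```python
def weighted_mean(values, weights):
    """Sampling-weighted mean, with each observation weighted by the
    inverse of its probability of selection into the sample."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)

# Invented population: stratum A has 800 students who all score 60;
# stratum B has 200 students who all score 90. Population mean = 66.
# Suppose A is sampled at rate 1/100 (8 students, weight 100) and
# B at rate 1/10 (20 students, weight 10).
values  = [60] * 8 + [90] * 20
weights = [100] * 8 + [10] * 20

unweighted = sum(values) / len(values)       # about 81.4: over-represents B
weighted   = weighted_mean(values, weights)  # 66.0: matches population mean
```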
Three main criteria for entry into LM-LEP services were used: (1) tested oral English proficiency; (2) judgment of student need by school or district personnel; and (3) tested proficiency in English reading or writing. There was also variation in the instruments and procedures used to measure entry criteria within these broad categories. Because of the variation in the definition of limited-English proficiency, estimates of the number of LM-LEP students based on the Longitudinal Study are not directly comparable with estimates based on any study that uses a standard definition. Moreover, the definition of a LM-LEP student can vary from year to year within a single district as a result of administrative policy, legal requirements, or economic pressures. It is possible that in some districts the requirement to serve all students in need of special services led to a definition of LM-LEP students as those students for whom services were provided. These factors argue for extreme caution in extrapolating estimates of the numbers of LM-LEP students to years much later than 1983, because changes in how LM-LEP students were defined would invalidate the results.

Based on the data from this study, there were estimated to be 882,000 students locally defined as LM-LEP in grades K-6 of public schools in the United States in the 1983–1984 school year. Spanish is by far the most prominent native language among LM-LEP students, accounting for an estimated 76 percent of LM-LEP students in all schools and 78 percent in schools with enrollments of more than 12 LM-LEP students. No other language accounted for more than 3 percent of the students in schools with enrollments of more than 12 LM-LEP students. Southeast Asian languages were predominant in 14 percent of schools; 36 percent of schools had students from only one language group other than English; 3 percent of schools had 12 or more language groups. The average across all schools was 3.5 languages.

Third-grade LM-LEP students were a few months older than the national norms for their grade level; first graders were near the national norms. Both first-grade and third-grade students were rated by teachers as being below grade-level proficiency in mathematics, English language arts, and native language arts, but third-grade students were rated as being closer to grade-level proficiency. More third graders than first graders were rated equal or higher on English-language skills than on native-language skills. Of grade K-6 LM-LEP students, 91 percent received free or reduced-price lunches (a measure of socioeconomic status), in comparison with 47 percent of all students in the same schools.

Most student characteristics in the report appear as aggregates across all language groups. The Spanish-speaking group is a large subpopulation, and these students tended to receive different services. It would therefore be of interest to see tabulations of student characteristic variables classified by native language.
One reported result is that 64 percent of Spanish-speaking students were born in the United States, in comparison with no more than 28 percent from any other language group. It would be interesting to determine whether observed differences exist in other measured variables, such as free-lunch participation and subject-area proficiency.

The district survey results indicated that an estimated 97 percent of districts with LM-LEP students in grades K-6 offered special services to these students, although 12 percent of teachers judged that students needing services were not receiving them. The nine states with the highest LM-LEP populations provided services to a higher percentage of their LM-LEP students than states with lower LM-LEP populations. In all districts, a goal of services was to bring LM-LEP students to the level of English proficiency needed to function in an all-English classroom. Most districts also stated the goal of providing other skills necessary to function in a public school classroom. Very few districts (15 percent) stated the goal of maintaining or improving native-language proficiency.

Services were generally provided in regular elementary schools, either in mainstream classrooms or in specially designated classrooms. Students were often in classrooms containing both LM-LEP and English-language-background students. Instruction for LM-LEP students was usually slightly below grade level. Most Spanish-speaking students received instruction in English delivered in the native language, native language as an academic subject, and ethnic heritage instruction; most other LM-LEP students did not.

The contractors defined five types of service cluster in terms of the following variables:

use of the native language;
special instruction in English;
rate of transition;
native language arts instruction; and
a narrative program description by school personnel.

The five types of clusters were called:

(A) native language primacy;
(B) continued instruction in native language and English;
(C) change in language of instruction, subdivided into (C1) slow transition and (C2) fast transition;
(D) all English with special instruction in English, subdivided into (D1) with native-language-proficient personnel and (D2) without native-language-proficient personnel;
(E) all English without special instruction in English, subdivided into (E1) with native-language-proficient personnel and (E2) without native-language-proficient personnel.

Table 3-1 shows the estimated percentages of schools offering, and first-grade LM-LEP students receiving, each type. Clusters emphasizing use of the native language appeared predominantly at schools with Spanish-speaking LM-LEP students; schools with no Spanish-speaking LM-LEP students were very likely to offer cluster D.

There was great variation in the characteristics of teachers providing services to LM-LEP students. Sixty-three percent of districts required special certification for teachers of LM-LEP students; fewer than 35 percent of teachers had such certification. In a number of districts requiring certification, teachers were teaching with provisional certification or waivers. Approximately 60 percent of teachers had received some special training in teaching limited-English-proficient students. About half of the teachers could speak a language other than English; this other language was overwhelmingly Spanish.
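Since the report does not describe its classification methodology, the following is only a hypothetical, deliberately simplified rule set in the spirit of the cluster definitions above; the yes/no flags, and the collapsing of clusters A and B into a single branch, are assumptions made for illustration.

```python
def classify_cluster(uses_native_language: bool,
                     transitions_to_english: bool,
                     fast_transition: bool,
                     special_english_instruction: bool,
                     native_proficient_staff: bool) -> str:
    """Assign a service-cluster label (A/B, C1, C2, D1, D2, E1, E2)
    from simplified yes/no program characteristics."""
    if uses_native_language and not transitions_to_english:
        # Distinguishing A (native-language primacy) from B (continued
        # dual instruction) would need a dominance measure; collapsed here.
        return "A/B"
    if uses_native_language and transitions_to_english:
        return "C2" if fast_transition else "C1"
    if special_english_instruction:
        return "D1" if native_proficient_staff else "D2"
    return "E1" if native_proficient_staff else "E2"

# A slow-transition program that uses the native language:
classify_cluster(True, True, False, True, True)   # "C1"
```

Even this toy version shows why the panel wanted the actual methodology documented: every boundary in such a rule set is a judgment call that shapes the resulting typology.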
Overall, field researchers found a positive attitude in most schools toward serving the needs of LM-LEP students.

Panel Critique of the Descriptive Phase Study

The descriptive phase study was based on a national probability sample of students and teachers. The sample was obtained through a four-stage sampling process of selecting states, school districts within states, schools, and students (with all eligible teachers included from each selected school).

Table 3-1  Schools Offering, and First-Grade LM-LEP Students Receiving, Each Type of Service Cluster

Type of Service Cluster    % Schools    % Students
A                           3            7
B                          11           26
C                          26           40
  C1                       20
  C2                        6
D                          51           25
  D1                       13
  D2                       38
E                           6            1
  E1                        2
  E2                        4

The procedures used to draw the sample were generally standard and appropriate given the stated objectives. The sampling of states was a nonstandard feature of the design, with the 10 states with the largest proportion of the national total of elementary LM-LEP students included with certainty (these states contain 83.5 percent of the elementary-school LM-LEP population, 92 percent of the Spanish LM-LEP population, and 64 percent of the non-Spanish LM-LEP population). Of these 10 states, Pennsylvania, with 1.9 percent of the LM-LEP population, effectively did not participate. A stratified sample of 10 states was drawn from the remaining 41 states. In aggregate, the sample accounts for 91 percent of the LM-LEP target population.

This method of selecting a first-stage sample of 20 states is unusual, and the objectives of the study might have been better served by one of two alternative strategies. If it was important to limit the number of states to 20, then restricting the study to the 20 states containing the largest proportion of LM-LEP students might well have been more desirable. Such a sample would have contained 92.7 percent of the LM-LEP population and would have constituted a worthwhile study population in its own right; moreover, making statistical inferences from a sample of 10 states to a population of 41 is hazardous, given the widely differing characteristics of states and their school districts. Alternatively, a different sampling design could have treated school districts as the first-stage sampling unit, with the nation as the target population. This design would have led to less extreme variation in sampling weights. Most of the prominent national surveys of schools and students use districts or schools as the first stage of selection (see, e.g., the discussion of the National Education Longitudinal Study in Spencer and Foran, 1991).
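The concern about extreme variation in sampling weights can be made concrete with Kish's approximate design effect, 1 + cv², where cv is the coefficient of variation of the weights; the weight vectors below are invented for illustration.

```python
def kish_design_effect(weights):
    """Kish's approximation of the design effect due to unequal
    weighting: deff = 1 + cv^2, so effective n is about n / deff."""
    n = len(weights)
    mean = sum(weights) / n
    variance = sum((w - mean) ** 2 for w in weights) / n
    return 1 + variance / mean ** 2

# Nearly uniform weights cost almost nothing in precision...
nearly_uniform = [10, 11, 10, 9, 10]
# ...while highly variable weights sharply reduce the effective sample size.
highly_variable = [1, 2, 50, 3, 200]

deff_low  = kish_design_effect(nearly_uniform)   # about 1.004
deff_high = kish_design_effect(highly_variable)  # about 3.24
```

A district-first design that equalizes selection probabilities keeps the weights, and hence the design effect, closer to 1.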

The selection of districts within the selected states gave a total of 222 districts. A number of districts refused to participate, including two large ones: Philadelphia, Pa., and Buffalo, N.Y. Because Philadelphia was one of only two selections from the state of Pennsylvania, the whole state (a state selected with certainty) was dropped from the sample. This decision illustrates one of the disadvantages of using states as the first stage of selection. In some other cases, districts that refused to participate were replaced in the sample by others. Of the 23 districts that refused, 19 were replaced, giving a final sample of 218 districts. The planned number of districts and the level of district participation appear quite adequate for a study of this nature.

Within districts, the school selection procedure gave an initial selection of 536 schools with LM-LEP students. Fourteen of these schools refused to participate, and two were replaced in the sample; this is a high level of participation at the school level. Given that the sampling to the school level was carried out by sampling professionals using a variety of standard and recommended sampling procedures, with relatively few district and school refusals, the study sample gave a sound basis for characterizing U.S. schools with respect to their LM-LEP populations and services.

The school sample was reduced prior to the selection of teachers and students. Initially, 342 schools were identified as having sufficient LM-LEP students. All eligible teachers from these schools were selected, a total of 5,213, of whom 4,995 responded. A subsample of 202 of the 342 schools was used for selecting students. Of these, 187 schools actually provided student data, with five LM-LEP students selected from each of grades 1 and 3. Field data collectors were responsible for drawing these samples, but they were not uniform in their application of the sampling rules. As a result, the expected student sample yield of 1,980 was not achieved: a total of 1,909 students were actually sampled, and it is impossible to determine from the documentation provided which of these were original selections and which were replacements resulting from parental refusals to participate.

The sample sizes, sampling procedures, and response rates for teachers and schools were at least adequate to provide a sound basis for inference from the data collected, although it would have been desirable to have response rates calculated both with and without replacements for refusals. Two points are worthy of special note. First, estimates based on data from teachers and students do not relate to the whole LM-LEP population, but only to that part of the population from schools that would have been deemed "viable" for the conduct of the Longitudinal Study; such schools had 12 or more LM-LEP students in grades 1 or 3 and contained 82 percent of the LM-LEP population. Second, the procedure for sampling students in the field was not well controlled. The reported problems appear to have been minor; thus, it was still possible to weight the data appropriately. However, in the longitudinal phase, the problems were so severe that the data could not be weighted.

The procedures for weighting the data were appropriate and are well documented in Appendix E of Development Associates (1984a). With the caveat
It should also be noted that any reported significance levels need to be viewed with suspicion when the modeling proceeded by trying a number of models and selecting the best; see, for example, Miller (1981). The report also notes the high degree of multicollinearity among the predictor variables. This multicollinearity further obscures the conclusions that can be drawn, because the predictive power of one variable may be completely masked by other variables. Thus, a nonsignificant coefficient may imply either that the corresponding variable has limited impact on predictions or that other correlated variables have masked its effect.

The data analysis reported in the study must be viewed as an exploratory analysis rather than as a theoretically driven quasi-experiment (see Chapter 2). Although the initial intent was a quasi-experiment, the study was not planned or executed in a way that enabled more than exploratory analyses. Exploratory analyses can serve the important purpose of suggesting hypotheses for later verification; firmly grounded causal inferences, however, require carefully thought-out, planned, and executed experiments or quasi-experiments.

FURTHER ANALYSES

Despite the above criticisms, there may still be value to be gleaned from the data, albeit not to answer the intended questions. The panel found that the data provide a picture of the state of bilingual education in the early to mid-1980s and might be useful in planning further studies.

Descriptive Phase Data

The descriptive phase of the Longitudinal Study presents a large set of data that can be reliably projected to the national population of (locally defined) LM-LEP students, or a large majority of it, without substantial problems of bias or imprecision.
As such, it is a useful resource for researchers attempting to understand the range of services being provided to LM-LEP students and a valuable knowledge base for use in planning studies of the effectiveness of programs.

One of the inherent difficulties with the Longitudinal Study (and the Immersion Study) is that of correctly classifying LM-LEP service programs, and indeed of even finding good examples of any given type. One possibly useful application of the descriptive phase data would be a more detailed characterization of the types of services offered and the characteristics of the students to whom they were offered. This work might help concentrate future evaluation efforts on those types of programs currently widely offered and ensure that the full range of such programs is covered. Information might also be extracted about the extent to which specific types of services are targeted to student needs and capabilities, rather than being a function of the resources available to the school. Such analyses could view service delivery as a multidimensional phenomenon, considering the interrelationships exhibited among the five variables used to define service clusters, plus possibly others. Such analyses have the potential to shed substantial light on what is actually “out there” and, just as importantly, why. This could well serve future debates and investigations about the efficacies of different programs.

Yet one caveat must be noted. The data from the descriptive study were collected in 1983. No doubt the population of LM-LEP students has changed substantially since then. The nature and range of services offered has probably also changed. Hence, results from any further analyses of these data would necessarily be dated.

Longitudinal Phase Data

Descriptive Analyses of Year 1 Data

The baseline data for the longitudinal analyses were collected during 1984–1985. Data were collected about districts, schools, teachers, classrooms, and students from a subset of those schools and districts surveyed during the descriptive phase that had sufficient LM-LEP students to make the Longitudinal Study data collection effort cost effective. Some schools declined to participate, and for some of these substitutions were made, but no weighting adjustments were made to reflect this nonresponse. All LEP students and English-proficient students receiving LEP services in grades 1 and 3 were included, plus a sample of students who had never been LEP or received services (comparison students). Survey weights were not developed for the data. The sampling procedures for selecting the comparison students were so poorly controlled that weighting of these students within schools would not have been possible even if it had been deemed desirable.

The Descriptive Report of the Longitudinal Study (Development Associates, 1984a) provides details of the variables collected in the longitudinal phase. The process of arriving at derived variables is described, as are the limitations of the data, particularly the extent of the missing data problems.
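The practical consequence of the missing weights can be seen in a small synthetic example: when one stratum is sampled at a much higher rate than another, the unweighted sample mean no longer estimates the population mean. All numbers below are invented for illustration:

```python
# Illustration (with made-up numbers) of why unweighted estimates can
# mislead when sampling rates differ across groups: here, schools with
# many LEP students are heavily oversampled relative to the population.

# (population size, sampled size, mean outcome in sample) per stratum
strata = {
    "high_LEP": {"pop": 1000, "n": 200, "mean": 40.0},
    "low_LEP":  {"pop": 9000, "n": 100, "mean": 60.0},
}

# Unweighted: every sampled unit counts equally.
total_n = sum(s["n"] for s in strata.values())
unweighted = sum(s["n"] * s["mean"] for s in strata.values()) / total_n

# Weighted: each unit carries weight pop/n for its stratum, so each
# stratum contributes in proportion to its population size.
total_pop = sum(s["pop"] for s in strata.values())
weighted = sum(s["pop"] * s["mean"] for s in strata.values()) / total_pop

print(unweighted)  # pulled toward the oversampled high-LEP stratum
print(weighted)    # recovers the population mean
```

In this contrived case the unweighted mean is about 46.7 while the population mean is 58.0, purely because of the differential sampling rates.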
Given the large volume of baseline data collected, the question arises as to whether there are useful descriptive analyses of these data that have not been undertaken thus far, but that might be done and might shed light on the delivery of services to LM-LEP students. Some issues related to this question are addressed in this section.

In general, there were considerable problems with missing and unreliable data. Many of these are discussed in some detail in the descriptive report. These problems appear to have been so severe that, if any finding derived from the data contradicted the “conventional wisdom” about bilingual education, the finding could easily be dismissed as an artifact of the poor data. There may, however, be components of the data that are worthy of additional analyses.

Any generalization of descriptive analyses from sample to target population would be on much firmer ground if the analyses incorporated survey weights. The panel concurs with the investigator in concluding that it is unlikely that useful survey weights could be developed for the longitudinal sample.

In the descriptive phase, 342 schools were identified as having sufficient LM-LEP students for the study, yet only 86 schools participated in the longitudinal phase. This implies that a more restrictive population was used than is made clear in the description on page 2 (Development Associates, 1984a). The report does not make clear how the reduction to 86 schools was accomplished. Furthermore, the participation of only 86 schools from a smaller number of districts means that the level of sampling error is likely to be moderately high, despite the inclusion of several thousand students in the sample.

The variables that appear to be most complete, and also most numerous, are the school-level variables derived from teacher data and the teacher-level variables. The three derived classroom-level variables (classroom size, percentage of students who are LEP, and percentage of students who speak only English) also appear to be relatively free from missing data and measurement error problems. One important use for these data might be to cross-classify these school-, teacher-, and classroom-level variables, perhaps weighted by the numbers of LEP students in the classroom, in order to characterize the types of learning environments that LM-LEP students are exposed to (or were in 1984–1985). For example, a simple cross-classification of “Teacher support for using child's native language in teaching him/her” (the variable coded as SCLM1Y1 in the report) with “Principal support of school's LM-LEP program” (coded as SCLM4Y1) might provide information about the distribution of teacher support for native language usage and its association with a principal's support for LM-LEP programs. This in turn might shed some light on the relationship of school policies to teacher practices.
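A cross-classification of this kind is computationally simple. The sketch below borrows the variable names from the report (SCLM1Y1, SCLM4Y1) but uses entirely synthetic records, and weights each cell by the number of LEP students, as suggested above:

```python
# Sketch of the cross-classification suggested above, on synthetic
# records. SCLM1Y1 (teacher support for native-language use) and
# SCLM4Y1 (principal support for the LM-LEP program) are variable
# names taken from the report; the values here are invented.
from collections import Counter

# (teacher_support, principal_support, number of LEP students)
records = [
    ("high", "high", 12),
    ("high", "low",   3),
    ("low",  "high",  5),
    ("low",  "low",  10),
    ("high", "high",  8),
]

# Cross-classify, weighting each cell by the number of LEP students so
# the table reflects the environments students actually experience.
table = Counter()
for teacher, principal, n_lep in records:
    table[(teacher, principal)] += n_lep

for cell, count in sorted(table.items()):
    print(cell, count)
```

With real data one would add margins and percentages, but even raw weighted counts of this form would show whether principal and teacher support tend to co-occur.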
Cross-classification of some of the teacher characteristic variables with the percentage of students in a class who are LEP (coded as CLSPLEP1) would provide information not only about the distributions of different teaching practices for LM-LEP students and the variation in the percentage of students who are LM-LEP (in this “high LEP” population), but also about whether certain types of teacher services and practices are predominantly available in classrooms with mostly LEP students or in those with few LEP students.

In summary, it is unclear from the Descriptive Report how restricted the population for inference is from the longitudinal phase. This must be addressed before a conclusion can be reached as to whether further descriptive analyses are warranted. There do appear to be a substantial number of reliable variables that could be analyzed, and in particular cross-classified, with a view to better characterizing the range of learning environments to which LM-LEP students are exposed and their relative prevalence in practice. Such analyses might guide future efforts at evaluating programs, at least by identifying what “programs” actually exist from the viewpoint of a LM-LEP elementary school student.

Other Analyses

It is possible that some defensible causal inferences could be drawn from the Longitudinal Study; however, it would take considerable effort just to determine the feasibility of such analyses. The value of purely exploratory analyses might be enhanced after additional exploration of the robustness of the findings and the degree to which alternate models fit the data. Variables not appearing in a model but highly correlated with model variables might be substituted to evaluate whether an alternate model based on these variables would fit as well. For example, school-level, classroom-level, and teacher-level variables might be substituted for the school and district indicator variables. Cross-validation might improve confidence in the robustness of the results. The models would be extremely suspect if coefficients changed drastically when the model was fit on a subset of the data, especially if the resulting models were poor predictors of the hold-out samples. No such analyses were performed as part of the original study, but they might be desirable given the degree of multicollinearity and other problems.

Finally, no attempt was made to analyze data from students who exited LEP programs, English-proficient students, or students whose native language was not Spanish. It is likely that the sample sizes are too small for the latter two categories, but exploratory analyses for students who exited, of the type already performed for LEP students, may be of interest.

PROSPECTS FOR THE FUTURE

General Remarks

Contrary to the hopes of those who commissioned the Longitudinal Study, further research is needed to address the question of which interventions are most effective in improving educational outcomes for LM-LEP children. Despite all its problems, the study does provide a valuable information base for designing future studies to address these objectives. The descriptive phase provides information concerning the natural range of variation in services in the population and about correlations between service types and various background characteristics.
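The split-sample robustness check suggested earlier, refitting the same model on each half of the data and scoring it on the held-out half, can be sketched as follows. The simple one-predictor model and the data are synthetic stand-ins for the study's regressions:

```python
# A minimal sketch of a split-sample robustness check: fit the same
# simple model on two halves of the data and compare the coefficients
# and the hold-out prediction error. Data here are synthetic; a real
# check would refit the study's own models.
import random

random.seed(0)
data = [(x, 2.0 * x + 1.0 + random.gauss(0, 0.5)) for x in range(100)]
random.shuffle(data)
half_a, half_b = data[:50], data[50:]

def ols_slope_intercept(points):
    """Ordinary least squares for a single predictor."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, my - slope * mx

def mse(model, points):
    """Mean squared prediction error of a fitted model on new points."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)

model_a = ols_slope_intercept(half_a)
model_b = ols_slope_intercept(half_b)

# Coefficients that diverge across halves, or hold-out error far above
# the in-sample error, would make the fitted model extremely suspect.
print(model_a, mse(model_a, half_b))
print(model_b, mse(model_b, half_a))
```

Here the two fits agree closely because the synthetic data follow one stable relationship; with multicollinear predictors, coefficient instability across halves is exactly the warning sign to look for.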
The longitudinal phase adds to this information base: variables were measured in the longitudinal phase that were not in the descriptive phase, measurements were made over time, and information was collected about the factors related to exit from LM-LEP services. Although they must be regarded as exploratory in nature, the longitudinal phase analyses revealed associations between variables. Hypotheses about the causal mechanisms underlying these associations could be confirmed by well-designed and carefully implemented quasi-experimental studies. Just as important, the study provides important information about the difficulties awaiting future researchers. The measurement difficulties, missing data problems, and attrition problems encountered by the Longitudinal Study will have to be faced by future researchers as well. Awareness of the magnitude of the problems permits planning to mitigate their impacts.

Planning for Observational Studies Directed to Causal Inference

At least seven factors must be attended to in planning an observational study if it is to be used to address policy-relevant questions of causation.
First, a successful observational study must have clearly defined study objectives, and these objectives must be prioritized. Resource restrictions always require tradeoffs between objectives. Dropping infeasible or low-priority objectives early in a study enables resources to be concentrated on ensuring that high-priority objectives will be met. Too many objectives without clear priorities often lead to failure to achieve any objective.

Second, a clear, operational definition of treatments is essential; this means a full characterization of the treatments as implemented.

Third, after the treatments have been designed, a sampling plan must be developed that ensures that comparable subjects are assigned to each of the treatments. Collecting data on covariates with the hope of adjusting or matching after the fact invites problems unless there is some assurance that the treatment groups will be reasonably comparable with respect to the distribution of covariates. A safer approach is to select observational units that are explicitly matched on key covariates. This approach requires data on which to base the selection of matched units. The knowledge base gained from the Longitudinal Study, if used properly, can provide an extremely valuable resource for sample selection in future studies.

Fourth, there must be control of missing data. Missing data can undermine attempts to derive causal inferences from statistical data. It is essential to plan a strategy for minimizing and controlling for the effects of missing data. This issue is discussed in detail in the next section. Here, we note the critical importance of identifying key data items for which it is essential to have complete data, monitoring the completeness of these items as the survey progresses, and devoting resources to follow-up on these items if the extent of missing data becomes too high.
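The matched-selection approach described under the third factor can be sketched with a greedy nearest-neighbor match on a single covariate. The schools and covariate values below are hypothetical:

```python
# Sketch of matched selection: for each unit receiving one treatment,
# pick the most similar available unit (here, nearest percent-LEP)
# from the other treatment group before data collection begins.
# All school labels and numbers are hypothetical.

treated = {"A": 62, "B": 35, "C": 80}            # school -> % LEP
controls = {"P": 33, "Q": 58, "R": 77, "S": 10}  # candidate matches

pairs = {}
available = dict(controls)
for school, pct in sorted(treated.items()):
    # Greedy nearest-neighbor match on the covariate; each control
    # can be used at most once.
    match = min(available, key=lambda c: abs(available[c] - pct))
    pairs[school] = match
    del available[match]

print(pairs)  # -> {'A': 'Q', 'B': 'P', 'C': 'R'}
```

A real design would match on several covariates at once and use an optimal rather than greedy assignment, but the principle of fixing comparability at the selection stage, instead of adjusting after the fact, is the same.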
Fifth, there must be control of measurement error. If a model is defined at the outset with clearly specified hypotheses, it is possible to identify a priori the variables for which measurement error could seriously affect results. Special attempts might be made to measure these variables accurately or to estimate a statistical model for the measurement error distribution. Information about key variables can be obtained from multiple sources (for example, school records and interviews). Multiple questions can be asked about the same variable and the responses checked for consistency. A small subsample might be selected for intensive efforts to determine these variables accurately; a measurement error model might then be estimated by comparing the accurately assessed values for these cases with their initial error-prone responses.

Sixth, planning must include determination of the sample size required for reliable inferences. Having a large sample is not sufficient when the goal of a study is to infer relationships among variables. Power analyses need to be performed for specific analyses of major importance, and these power analyses must take into account not just the raw numbers of observations, but the numbers of observations expected to exhibit specific patterns of covariates. If, for example, service clusters are highly correlated with variables thought to influence outcomes of interest (for example, a particular service cluster occurs only in districts dominated by students of low socioeconomic status), then estimating effects due to treatment alone will be impossible. In experiments, treatments are generally randomized within cells defined by covariates in order to assure sufficient independent variation of treatments and covariates. This is not possible in an observational study; however, especially when initial data such as those from the Longitudinal Study are available, it may be possible to determine ahead of time whether sufficient variation exists within the population to make the desired inferences. If not, resources can be concentrated on collecting the data required for analyses that are more likely to be fruitful. In determining a sample size it is again crucial to consider the issue of the appropriate unit of inference. Sample size calculations must also address missing data and, in a longitudinal study, the expected attrition of sample units. Results of such analyses can be used to guide resource tradeoffs.

Finally, there must be careful monitoring of survey execution. As data are collected and coded, counts should be maintained of missing data, and preliminary correlational analyses can be run to determine whether the sample is showing sufficient variation among key variables. Adjustments can be made to correct problems as they are identified. In some cases, it may turn out that some objectives cannot be met; resources can then be redirected to ensure the satisfaction of other important objectives.

Strategies for Control of Missing Data

Missing data problems can be greatly reduced (although seldom eliminated entirely) by proper planning and execution of a data collection effort.
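A back-of-the-envelope version of the sample-size planning described above combines a power calculation with inflation for expected attrition and missing data. The effect size and loss rates below are hypothetical planning assumptions, not figures from the study:

```python
# Sketch of sample-size planning: a normal-approximation power
# calculation for a two-sample comparison of means, then inflation
# so that the completed sample still meets the target after losses.
# Effect size and loss rates are hypothetical planning assumptions.
import math

def n_per_group(effect_size, alpha_z=1.96, power_z=0.84):
    """Approximate n per group for a two-sample comparison of means,
    with effect_size = (difference in means) / standard deviation,
    two-sided alpha = 0.05 and power = 0.80 by default."""
    return math.ceil(2 * (alpha_z + power_z) ** 2 / effect_size ** 2)

def inflate(n, attrition_rate, missing_rate):
    """Enlarge the initial sample so that, after attrition and item
    nonresponse, the analyzable sample still reaches the target n."""
    return math.ceil(n / ((1 - attrition_rate) * (1 - missing_rate)))

target = n_per_group(0.3)              # a small-to-medium effect size
recruit = inflate(target, 0.20, 0.10)  # 20% attrition, 10% item loss
print(target, recruit)
```

A full plan would go further, checking, as the text notes, that enough observations fall into each pattern of covariates and using the appropriate unit of inference (for example, classrooms or schools rather than students).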
A National Research Council report (Madow, Nisselson, and Olkin, 1983) states: “Almost every survey should be planned assuming nonresponse will occur, and at least informed guesses about nonresponse rates and biases based on previous experience and speculation should be made.” Among the steps that can be taken to reduce nonresponse or its impact on results are the following (see Madow, Nisselson, and Olkin (1983) for a more complete set of recommendations):

- Based on study goals, identify a set of key variables for which it is essential to have relatively complete data.
- Collect data as completely and accurately as possible, using follow-ups and callbacks as necessary. A final follow-up using supervisors and superior interviewers may be necessary. Consider using a shorter and/or simpler questionnaire for the final follow-up. Pay special attention to key data items.
- Monitor data completeness as the study progresses; make adjustments in the data collection strategy as necessary to respond to the actual pattern of missing data.
- Plan strategies for collecting data items anticipated to be especially difficult to collect. Adjust strategies as data collection progresses.
- Monitor completion rates by important classifications of covariates. (This recommendation is crucial if relational analysis is the goal.)
- Avoid unnecessarily complex and time-consuming questionnaires. Design questions to be nonthreatening and easy to understand.
- Consider including questions that are useful for modeling and adjusting for nonresponse.
- Consider including simpler or less threatening versions of questions for which high nonresponse rates are anticipated. (Notice that this recommendation conflicts with the prior one; obviously a tradeoff is required.)
- Make sure that the sample size is sufficiently large to compensate for missing data.
- To the extent possible, describe and/or model the missing data process and the degree to which respondents differ from nonrespondents.
- Attempt to select cells for imputation so that respondents and nonrespondents are as similar as possible within cells.
- If possible, use multiple imputation to improve estimates and estimates of variances.
- Explore the effects of alternate assumptions regarding the response process.
- Discuss and/or model biases that are the result of missing data.

LONGITUDINAL STUDY RESEARCH QUESTIONS

Reprinted here is the revised set of research objectives for the Longitudinal Study. This material is quoted verbatim from Exhibit 2 of the Longitudinal Report (Development Associates, 1984b):

What are the effects of the special services provided for LM/LEP students in grades 1-6?

- Effects of specific student characteristics
  - Comprehension of native language (oral & written)
    - What is the relationship between LM/LEP student's oral proficiency in the native language and the learning of English?
    - How is a LM/LEP student's oral proficiency in the native language related to the learning of English when the student's native language is:
      - A language linguistically related to English?
      - A language linguistically distant from English?
    - What is the relationship between a LM/LEP student's literacy in the native language and the acquisition of literacy in English when the native language is a language linguistically related to English (e.g., Spanish, Portuguese)? When it is not related?
    - What is the relationship between a student's general ability and native language proficiency?
  - Classroom behavior
    - What is the relationship between LM/LEP students' classroom behaviors and success in school?
  - Parents' interest and involvement
    - What is the relationship between LM/LEP parents' interest and involvement in their child's education and the child's success in school?
- Effect of extent or duration of services
  - What are the effects of length or duration of receipt of special services on the subsequent achievement of LM/LEP students?
  - How do LM/LEP students who are receiving or have received special services for LM/LEP students compare with LM/LEP students who have never received such services?
- Effects of site (classroom, school, or district) and staff characteristics
  - What is the effect of linguistic isolation, e.g., being in a school where few or no students speak the same language, on the time required to learn English?
  - To what extent does the degree of teacher and staff familiarity with the native culture of the LM/LEP students affect student achievement?
  - To what extent does the educational background of teacher and staff affect the achievement of LM/LEP students?
- Effects of conditions for exiting
  - To what extent is oral proficiency in English correlated with proficiency in handling the written language (e.g., reading comprehension) and with academic achievement?
  - When students are exited from special services after a fixed time, without regard to level of performance on some criterion variable, what is the effect on the student's subsequent achievement?

How do the various combinations of special services, or "service clusters," provided for LM/LEP students in grades 1-6 compare in terms of the effectiveness with which recipients subsequently can function in all-English medium classrooms? (A service cluster is a set of instructional services provided to a particular student at a particular point in time.)

- Effects by specific student characteristics
  - Socioeconomic status
    - Which clusters work best with LM/LEP students whose socioeconomic status is low?
    - Which clusters work best with LM/LEP students whose socioeconomic status is middle or high?
  - General academic ability
    - Which clusters work best for LM/LEP children whose ability level is high?
    - Which clusters work best for LM/LEP children whose ability level is low?
  - English proficiency
    - Which clusters are most effective for children who speak little or no English?
    - Which clusters are most effective for children who speak some English, but nonetheless cannot benefit from a mainstream classroom?
  - Grade level
    - Which clusters are most effective for LM/LEPs by grade?
  - Category of native language
    - Which clusters are most effective when the LM/LEP students' native language is historically related to English? When it is not?
- Effects of specific service characteristics
  - Using students' native language in teaching academic subjects
    - What are the effects of using the student's native language in teaching academic subjects?
    - How are the effects of using the student's native language in teaching academic subjects related to the student's level of oral proficiency in that language?
  - Teaching reading of native language before teaching reading of English
    - At the end of grade 6, how will students for whom the introduction of English reading was delayed while they were taught to read in their native language compare with students for whom reading in English was not postponed?
    - To what extent does the effect of postponing instruction in reading English while the child learns to read in his native language first depend on:
      - The degree of lexical and orthographic similarity between the native language and English?
      - Initial oral proficiency in the native language?
      - General academic aptitude?
  - Using “controlled” English vocabulary in instruction
    - To what extent does the use of “special English” for instruction affect LM/LEP student achievement in:
      - The content area?
      - English?
  - Styles of using two languages in instruction
    - In the transition from the use of the native language in instruction to 100% English, which is more effective, a slow shift or a rapid shift?
  - Subjects taught
    - What are the effects of teaching the child's native language as a subject of instruction rather than merely using it as the medium of instruction?
- Effect of school, classroom, or teacher characteristics
  - Linguistic composition of the student population
    - Which clusters are the most effective when the child is in a school where there are few or no other children in his/her age group who speak the same language?
    - Which clusters are most effective when the child is in a classroom where there are many other children in his/her age group who speak the same language?
  - Untrained teachers
    - What are the effects of untrained teachers, by service cluster?
  - What are the effects of having English-proficient students in LM/LEP classrooms on the achievement and English language acquisition of LM/LEP students?
    - What are the effects when the English-proficient students come from a language minority background?
    - What are the effects when the English-proficient students come from a native English-speaking background?

REFERENCES

There are several good introductory books on sample surveys. Cochran (1977), Yates (1981), and Kish (1965) all provide thorough introductions to the entire field of the design and analysis of sample surveys. The book by Kasprzyk et al. (1987) discusses the issues of longitudinal surveys with particular attention to nonresponse adjustments and modeling considerations. Skinner, Holt, and Smith (1989) discuss methods for analyzing complex surveys and pay special attention to issues of bias (in the statistical sense) and modeling structured populations. Groves (1989) provides a detailed discussion of issues related to survey errors and survey costs and includes an extensive discussion of problems with coverage and coverage error of samples, together with concerns related to nonresponse. Little and Rubin (1987) is a basic reference for statistical analyses with missing data, both in experiments and in sample surveys. Finally, Duncan (1975) is an early and invaluable reference for the latent and structural equation models used in path analysis.

Burkheimer, Jr., G.J., Conger, A.J., Dunteman, G.H., Elliott, B.G., and Mowbray, K.A. (1989) Effectiveness of services for language-minority limited-English-proficient students (2 vols.). Technical report, Research Triangle Institute, Research Triangle Park, N.C.

Chambers, J.M., Cleveland, W.S., Kleiner, B., and Tukey, P.A. (1983) Graphical Methods for Data Analysis. Belmont, Calif.: Wadsworth International Group.

Cochran, W.G. (1977) Sampling Techniques (third ed.). New York: John Wiley.

Development Associates (1984a) The descriptive phase report of the national longitudinal study of the effectiveness of services for LMLEP students.
Technical report, Development Associates Inc., Arlington, Va.

Development Associates (1984b) Overview of the research design plans for the national longitudinal study of the effectiveness of services for LMLEP students, with appendices. Technical report, Development Associates Inc., Arlington, Va.

Development Associates (1986) Year 1 report of the longitudinal phase. Technical report, Development Associates Inc., Arlington, Va.

Duncan, O.D. (1975) Introduction to Structural Equation Models. New York: Academic Press.

Groves, R.M. (1989) Survey Errors and Survey Costs. New York: John Wiley.

Kasprzyk, D., Duncan, G., Kalton, G., and Singh, M.P. (1987) Panel Surveys. New York: John Wiley.

Kish, L. (1965) Survey Sampling. New York: John Wiley.

Little, R.J.A., and Rubin, D.B. (1987) Statistical Analysis with Missing Data. New York: John Wiley.
Madow, W.G., Nisselson, J., and Olkin, I., eds. (1983) Incomplete Data in Sample Surveys, Volume 1: Report and Case Studies. Panel on Incomplete Data, Committee on National Statistics, Commission on Behavioral and Social Sciences and Education, National Research Council. New York: Academic Press.

Miller, R.G. (1981) Simultaneous Statistical Inference (second ed.). New York: Springer-Verlag.

Skinner, C.J., Holt, D., and Smith, T.M.F. (1989) Analysis of Complex Surveys. New York: John Wiley.

Spencer, B.D., and Foran, W. (1991) Sampling probabilities for aggregations, with applications to NELS:88 and other educational longitudinal surveys. Journal of Educational Statistics, 16(1), 21–34.

U.S. Department of Education (1991) The Condition of Bilingual Education in the Nation: A Report to the Congress and the President. Office of the Secretary. Washington, D.C.: U.S. Department of Education.

Yates, F. (1981) Sampling Methods for Censuses and Surveys (fourth ed.). New York: Macmillan.