bias in much education research is arguably an important reason for inconsistent findings across different studies.

The difficulty of sorting out causality in quasi-experimental studies helps explain why truly experimental studies, though few in education, are often quite influential. The most significant example is a major class size experiment launched in 1985 in Tennessee.

Project STAR (summarized in Finn, 1998) ran from 1985 to 1989 in 79 elementary schools in Tennessee. Entering kindergarten students were randomly assigned to one of three class types: small (enrollment of 13–17), regular (enrollment of 22–26), or regular with a full-time teaching aide in addition to the regular teacher. Classes remained the same type for 4 years, through 3rd grade, while a new teacher was assigned at random to each class each year. About 7,500 pupils in more than 300 classrooms participated. After the original STAR project ended, Tennessee authorized a follow-up study (the Lasting Benefits Study) to see how long the original benefits of small classes would persist.

Differences among the three class types were highly statistically significant, reflecting achievement gains in the small classes but not in the regular classes with aides. The benefits of small classes were found to be greater for minority students (most of whom were black) and for students attending inner-city schools. After kindergarten, the effects on reading and mathematics achievement were typically twice as large for black students as for white students (Nye et al., 1993) and even larger for black students in inner cities (Krueger, 1997). The effects remained robust when sensitivity analyses examined several limitations in the study design and implementation (Krueger, 1997), although researchers have questioned the extent to which meaningful gains occurred after the first year of enrollment in a small class (Hanushek, 1998).

Because of the experimental nature of the class-size study (and, no doubt, because its results correspond to the belief of many parents and teachers that smaller classes are better than larger ones), the Tennessee results have spurred efforts around the country to reduce class sizes, especially in the early grades. Without necessarily disputing the Tennessee findings, however, scholars have questioned whether reducing class sizes is the most effective use of resources.

Hanushek et al. (1998), for example, have drawn on the Texas database mentioned earlier, now augmented by longitudinal data on academic test scores for several cohorts of students at different grade levels, to examine three questions: whether schools differ significantly in their ability to raise academic achievement, what characteristics of schools seem to account for any differences in impact, and whether any such differences are systematically related to school resources or to measurable aspects of schools and teachers. They found that schools vary greatly in their impact on student achievement and that the differences centered on the differential impact of teachers, but also that differences among teachers are not readily measured by simple characteristics of the teachers and classrooms. In other words, the study provides strong

