In a typical American elementary or secondary school, the curriculum serves two purposes that often exist in tension with each other. One is to have all students master a common core of knowledge, an objective reflected in the current emphasis on "high standards for all." The other is to provide curricular differentiation—differentiated instruction suited to students' varied needs, interests, and achievement levels (Gamoran and Weinstein, 1998). This second purpose is pursued in many schools through practices variously known as "tracking," "ability grouping," and "homogeneous grouping." Put differently, educators "organize school systems so that students who appear to vary in their educational needs and abilities can be taught separately, either in specialized schools or in the same school in distinct programs, classes, or instructional groups within classrooms" (Oakes et al., 1992:570).
The literature on tracking is voluminous, and the effects of tracking have often been debated in recent years.1 Tracking policies and practices vary from state to state, district to district, and school to school. A comprehensive survey of these practices and their effects on students would have been beyond the committee's resources. We have therefore tried to focus our work on matters directly within our charge.
Limitations of Terminology
Although many terms are used to describe practices of curricular differentiation, each has its limitations.
Tracking, the term used by the Congress in defining the committee's mandate, suggests the classic, rigid form of curricular differentiation in which a student's program or "track"—academic, general, or vocational—determines virtually every course that the student will take and at what level of difficulty. In recent decades, formal grouping systems this rigid have become less common in schools (Lucas, in press).
Ability grouping, a term used widely by scholars and practitioners, implies—incorrectly, in our view—that students are being grouped on the basis of "ability," a quality that some view as innate and immutable. As we will see, schools that group students usually do so on the basis of classroom performance and other measures of achievement that reflect acquired knowledge—something that can and does change over time—rather than ability. It is therefore misleading to use the term "ability grouping." Moreover, given the degree of racial and socioeconomic stratification that is often associated with grouping, it may reinforce false stereotypes to imply incorrectly that students in different groups are distinguished by ability. We find it more accurate to say that schools that group students typically try to do so by "skill level" or "achievement level" (Mosteller et al., 1996).
Homogeneous grouping is also a misnomer, based on studies of actual practice. The term "homogeneous" suggests that all the students in a given group are alike, or at least similar, in their achievement levels. Empirical studies cast doubt on this assumption, however. "Grouping's effect on reducing even cognitive diversity may be very small," report Oakes et al. in their comprehensive survey (1992:594). "Other studies document considerable overlap of students' skills and abilities among groups …. Thus the degree to which tracking reduces heterogeneity may be far less than we typically assume." For reasons discussed below, it appears that factors other than student achievement—scheduling constraints, parental interventions, and student choice, in particular—often help to determine who takes which classes. Although these other factors may be entirely legitimate, they often produce groupings that are not very homogeneous. In some circumstances, "it is unclear whether it is possible to organize classes that contain a narrow range of student ability" (Gamoran and Weinstein, 1998:387). At the same time, there is evidence of considerable homogeneity in secondary mathematics classes (Linn, 1998a).
The committee has decided to use in this report the term that the Congress chose—tracking—while recognizing that neither it nor any of the common alternatives is entirely satisfactory as a description of actual practice in most schools. The committee defines tracking as forms of placement whereby individual students are assigned, usually on the basis of perceived achievement or skill level, to separate schools or programs, classes within grade levels, groups within classes (at the elementary level), and courses within subject areas (at the secondary level).
Nature and Extent of Tracking
Tracking takes many forms in American schools. Among them are "exam" schools and "gifted and talented" programs or classes, to which only certain students are admitted usually on the basis of their perceived achievement levels or talents.2 Some scholars and practitioners also see programs for students with mild mental disabilities (mild mental retardation, learning disabilities, and emotional problems) as a form of tracking (Lipsky and Gartner, 1989) because students are often referred for possible placement in such programs on the basis of their perceived abilities or achievement levels. When this is the case, the committee considers such referrals a potential form of tracking, even though actual placement depends on individualized assessments conducted with parental consent.
Although almost all elementary schoolchildren study the same core subjects, "in the United States, differentiation begins early, with most elementary schools employing between-class … grouping for the entire day, between-class grouping for specific subjects, and/or within-class grouping for specific subjects" (Oakes et al., 1992:571). In the last decade, however, there has been an increase in heterogeneous grouping within elementary schools, and new techniques, such as cooperative learning, offer promising ways of grouping children heterogeneously within classrooms (Slavin et al., 1989, 1996).
Tracking also remains typical in American secondary schools (Oakes et al., 1992:571), despite opposition from many middle school educators (Lynn and Wheelock, 1997) and despite the demise of formal tracking, under which a student's program of study (college preparatory, general, or vocational) largely determined the courses he or she would take (Lucas, in press). As "formal tracks were abolished … the reality of tracking has been preserved in many schools through a variety of new mechanisms" (Moore and Davenport, 1988:11–12). Within-school grouping continues, although less rigidly than in the past. For example, although many schools retain the familiar three-tiered system, some assign most students to the middle group, with relatively few being placed in higher- or lower-level classes (Gamoran, 1989).
The secondary school schedule also tends to promote tracking. "Because students assigned to a high-level class for one subject tended to be assigned to a similar level in other subjects, the end result was a set of curricular tracks as distinct as in the past. Sometimes students were actually assigned to sets of classes at the same ability level all at once" (Oakes et al., 1992:575).3
Parental intervention also operates to preserve curricular differentiation in public secondary schools. "Middle-class parents intervene to obtain advantageous positions for their children even over and against school personnel…. Middle-class parents are the protectors of the existing in-school stratification system" (Lucas, in press:206). Especially in schools with racially and socioeconomically diverse student populations, these parental influences serve to replace formal tracking with "a more hidden in-school stratification system" (Lucas, in press:205; Meier et al., 1989).
The secondary curriculum is differentiated by subject—students typically have more electives than in elementary school—as well as by track. The degree of differentiation in secondary mathematics, for example, is considerable. It is common to find within a single high school courses ranging from remedial and "business" math to calculus and statistics, arrayed in as many as four distinct tracks (Linn, 1998a:3, citing McKnight et al., 1987). We note with interest that results from the Second International Mathematics Study show that the variation in student math performance associated with tracking is far greater in the United States than in most other countries; that is, the difference in average achievement of students in different classes in the same school is far greater in the United States than in most other countries (Linn, 1998a).4 Even in schools that have tried to reduce or eliminate tracking, however, the practice remains nearly universal in the teaching of mathematics, in part because math teachers and parents believe strongly in its effectiveness.
In sum, tracking in various forms has been and remains an important feature of public elementary and secondary education in the United States.
Role of Tests in Tracking Decisions
Tests play a complex role in tracking decisions. On one hand, there is evidence that most within-grade and within-class tracking decisions are not based solely on test scores (Delany, 1991; Selvin et al., 1990; White et al., 1996). Although practice varies considerably, even from school to school, educators consistently report that such decisions are based on multiple sources of information: test scores, teacher and counselor recommendations, grades, and (at secondary levels) student choice (Oakes et al., 1992). Also, as previously noted, parents often play a powerful role.
On the other hand, standardized tests are routinely used in making tracking decisions (Glaser and Silver, 1994; Meisels, 1989). Moreover, they may play an important, even dominant, role in selecting children for exam schools and gifted and talented programs.5 IQ tests play an important part in the special education evaluation process, and their use contributes to the disproportionate placement of minority students into classes for students with mild mental retardation (National Research Council, 1982; Haney, 1993).6 Even when test scores are just one factor among several that influence tracking decisions, they may carry undue weight by appearing to provide a scientific justification and legitimacy for tracking decisions that such decisions would not otherwise have.7
Some standardized test scores can be used appropriately in making tracking decisions, and the following sections of this chapter describe criteria that are relevant in determining whether a particular test use is appropriate. At the same time, research suggests that some other standardized tests commonly employed for tracking are not valid for this purpose. For example, Darling-Hammond (1991) asserts that schools improperly use norm-referenced multiple-choice tests for tracking purposes; she argues that such tests are designed to rank students and not to support instruction, and that linking such test scores to student tracking can seriously limit students' learning.8 Tests that yield criterion-referenced interpretations may be preferable. Similarly, Glaser and Silver (1994) find evidence of negative consequences from the use of selection tests for placement in tracks.9 Meisels (1989) also contends that some standardized tests are used inappropriately for tracking purposes and recommends that other, more appropriate standardized tests be used in making tracking decisions.10 Finally, a recent report prepared for the National Education Goals Panel calls attention to a troubling use of tests to track young children (Shepard et al., 1998:4):
Recently … there has been an increase in formal assessments and testing [of children up through age 8], the results of which are used to make "high-stakes" decisions such as tracking youngsters into high- and low-ability groups …. In many cases, the instruments developed for one purpose or even one age group of children have been misapplied to other groups. As a result, schools have often identified as "not yet ready" for kindergarten, or "too immature" for group settings, large proportions of youngsters (often boys and non-English speakers) who would benefit enormously from the learning opportunities provided in these settings. In particular, because the alternative treatment is often inadequate, screening out has fostered inequities.
There is some evidence that students' race or socioeconomic status (SES) may influence the weight that educators accord to their test scores, leading to differential treatment in the tracking process. For example, one case study found "that school counselors and teachers respond to comparable achievement scores of Asian and Hispanic students quite differently, with Asians far more likely to be placed in advanced classes than Hispanics with similar scores" (Oakes et al., 1992:577). Similarly, more than one court decision has established that some school officials assign low-scoring white students to high tracks and high-scoring minority students to low tracks (e.g., People Who Care v. Rockford Board of Education, 1997; Oakes, 1995). Previously noted research by Lucas (in press) provides powerful evidence that middle- and higher-income parents intervene in tracking decisions, effectively overriding test scores (and other factors that schools may use in tracking decisions) to produce tracks that are highly stratified by SES and race. The importance of social class in tracking decisions is suggested by a study that controlled for prior achievement, social class, and school, using data from the High School and Beyond survey; Gamoran and Mare (1989) concluded that black students were 10 percent more likely than comparable white students to be placed in high-track classes.
The educational consequences of these practices and trends are considered below. It is clear, however, that the role of tests in tracking decisions justifies consideration of their appropriate use.
Psychometrics of Placement
Tracking decisions are basically placement decisions, and tests used for this purpose should meet professional test standards regarding placement (American Educational Research Association et al., 1985, 1998; Joint Committee on Testing Practices, 1988).
The main assumption underlying tracking decisions is that particular students will benefit more from certain experiences, resources, or environments than they would from others, and that this benefit is optimized when they are taught with other students like themselves in achievement level. Because of this assumption, valid placement requires evidence that students are likely to be better off in the setting in which they are placed than they would be in a different available setting. Such evidence, in psychometric terms, shows an aptitude-treatment interaction in terms of outcome measures of learning and well-being. For example, students who get high scores on a placement test of spatial ability should in fact be found to learn more in a physics course in which the problems are expressed in pictures than they would in a physics course in which similar problems were expressed in numbers.
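One way to write this requirement down is sketched below. The notation is an assumption introduced here for illustration and is not taken from the cited standards: S is a student's score on the placement test, c is the cutscore, A and B are the two available settings (students scoring at or above c go to A, the rest to B), and Y_A and Y_B are the outcomes (learning and well-being) the same student would attain in each setting.

```latex
\[
E\,[\,Y_A - Y_B \mid S \ge c\,] > 0
\qquad\text{and}\qquad
E\,[\,Y_B - Y_A \mid S < c\,] > 0 .
\]
```

Both inequalities have to hold for the placement to be supported in this sense. If one setting yields better outcomes for high and low scorers alike, there is no interaction between score and setting, and the test score gives no basis for placing some students in the other setting.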
Other assumptions underlying test use for tracking decisions include: that the test taps the knowledge, skills, or other attributes it is interpreted to measure; that the cutscore chosen is an accurate discriminator of the attribute measured in relation to the associated levels of benefit; and that the test scores have comparable meanings and properties for all students. Depending on the context involved, however, it may not be necessary to gather supporting evidence or documentation for all of these assumptions. For example, some of them may be argued to be plausible on their face or already supported by evidence provided by the test developer or in the testing literature. What will always be required, however, is that the sum of the evidence gathered as part of the test validation process is sufficient to make a credible case that the use of the test for placement is appropriate—that is, both valid and fair.11
Validation of Test Use
As previously noted, there is evidence that test scores are routinely used, although rarely as the sole criterion, in making tracking decisions. To the extent that they are used, however, they should be validated by the kinds of information described below (American Educational Research Association et al., 1985; 1998).
The types of evidence required to establish validity are elaborated in Chapter 4.
Decisions about a student's placement should be based on predictions about which available setting will produce the most beneficial expected educational outcome (National Research Council, 1982). The standard for using a test in this way should be its accuracy in predicting the likely educational effects of each of several alternative future placements. For example, if a student performs in a particular way on a math test, that performance should help predict whether the student will be better served by being placed in one type or level of math course rather than another (American Educational Research Association et al., 1985: Standards 1.20 to 1.23, 8.10, and 8.11; 1998). This is true not only when the possible placements include alternative math courses, but also when the choice is between placement in a gifted and talented class or a more traditional class, or when the choice is between special education and general education.
For example, as an earlier National Research Council report (1982) notes, one of the main validity claims for the use of IQ tests to place students in classes for the educable mentally retarded (EMR) was the test's predictive power. That committee concluded, however, that this prediction alone was insufficient evidence of the test's educational utility. Additional evidence was required that children with scores in the EMR range would actually learn more effectively in a special education program than in other available placements. Research on tests used for placement in early childhood has come to the same conclusion about the type of evidence required for validation (Shepard et al., 1998).
Similar standards are relevant to tests used for course placement decisions in high school. Kane concluded that, to establish the validity of an algebra test used as a prerequisite for calculus, one had to demonstrate that students with low scores "do substantially better in the calculus course if they take the remedial course before taking the calculus course" (1992:531). This evidence would be required in addition to the usual conceptual and empirical verification that the test, when used for differential placement, is in fact a valid measure of algebra skills. In this instance, the hypothesized consequences could be checked by means of a randomized experiment, comparing the calculus performance of low scorers with and without remediation.
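The comparison Kane describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure taken from Kane (1992): the function name `remediation_gain` and the grades are hypothetical, and the code only computes a difference in group means for low scorers randomly assigned to the two conditions.

```python
from statistics import mean


def remediation_gain(with_remedial, without_remedial):
    """Difference in mean calculus performance between low scorers who took
    the remedial course first and low scorers who did not."""
    if not with_remedial or not without_remedial:
        raise ValueError("both groups need at least one student")
    return mean(with_remedial) - mean(without_remedial)


# Made-up calculus grades (0-100) for students who scored low on the algebra
# test and were randomly assigned to one of the two conditions.
took_remedial_first = [72, 68, 75, 81]
went_straight_to_calculus = [61, 58, 70, 63]

gain = remediation_gain(took_remedial_first, went_straight_to_calculus)
print(f"Mean gain from remediation: {gain:.1f} points")
```

A positive gain is only a starting point. Whether low scorers do "substantially better" with remediation would also depend on the size of the groups, the variability of the grades, and a judgment about how large a difference matters educationally.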
As previously stated, however, a test score is seldom used as the sole criterion for making a tracking or placement decision. Rather, it is more likely to be used in combination with other sources of information about the student. Therefore, the strength of the interaction between test scores and placement outcomes should be considered in the context of the availability of other relevant information and its relative weight.
In general, a test used to make a placement decision is not being used to certify mastery but rather to predict a student's response to alternative future educational settings. Therefore it is not essential to show that the students have already been taught the skills tested. To the extent possible, however, the content of such tests should be relevant to the experiences to which the student will be exposed (American Educational Research Association et al., 1985: Standards 6.1 and 6.4; 1998).
For example, in the case of a math test used to aid in placing a student in a beginning or advanced algebra course, the validity of score interpretation may be enhanced by ensuring that the test adequately covers the relevant content and thought processes in the knowledge domain it is interpreted to measure (that domain could be algebra but might also be general mathematics). As noted earlier, a number of researchers claim that some kinds of tests commonly used in making tracking decisions do not, in fact, provide information on the extent to which individual students are prepared for the content to which they are likely to be exposed in future placements, and they recommend that the use of such tests for tracking purposes be discontinued (Darling-Hammond, 1991; Glaser and Silver, 1994; Meisels, 1989; Shepard, 1991).
In addition to evidence of adequate content coverage, the test should be examined to ensure that it does not contain irrelevant material that could confound or obscure the construct to be measured. For example, a math test should not require an unnecessarily high level of reading proficiency, as this may prevent poor readers from demonstrating their readiness to learn math.
Finally, a low score on the test should not be taken as a lack of readiness with respect to the skills being tested without consideration of alternate explanations for the test taker's performance. Variables such as clinically relevant history, school record, and examiner or test taker differences should be considered in interpreting test scores. Influences associated with socioeconomic status, ethnicity, language, age, gender, or specific disabilities may also be relevant (American Educational Research Association et al., 1985: Standard 6.11; 1998).
Accuracy of Cutscores
Tracking decisions, like those for promotion and graduation, depend to some degree on the setting of cutscores. Cutscores are performance standards dividing acceptable levels of readiness from unacceptable levels. Because setting them is inherently judgmental, their validity depends on the reasonableness of the standard-setting process and of its consequences—not the least of which are passing rates and classification errors, especially if they vary by gender, racial, or language minority group.
For example, consider the reasonableness of the widely used Angoff (1971) method of standard setting. In this procedure, expert judges are asked to estimate the probability that a minimally competent respondent will answer each item correctly. The average estimate for each item provides a kind of minimum passing level for the item. These estimates are summed to determine a passing or cutscore for the test. Modified versions of the Angoff method are typically used to set nonminimum standards, such as the basic, proficient, and advanced levels of the National Assessment of Educational Progress (NAEP). The reasonableness of the procedures depends on many factors, including the expertise of the judges. The judges should be knowledgeable not only about the subject tested but also about the expected performance on each item of persons exhibiting various levels of proficiency in the field.
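The arithmetic of the basic procedure described above can be sketched as follows. This is a minimal illustration, assuming a simple table of ratings with one row per judge and one column per item; the function name `angoff_cutscore` and the ratings are hypothetical, and the sketch does not cover the modified versions used for NAEP.

```python
from statistics import mean


def angoff_cutscore(ratings):
    """Return (per-item minimum passing levels, cutscore).

    ratings[j][i] is judge j's estimate of the probability that a minimally
    competent test taker answers item i correctly.
    """
    if not ratings or not ratings[0]:
        raise ValueError("need at least one judge and one item")
    n_items = len(ratings[0])
    for judge in ratings:
        if len(judge) != n_items:
            raise ValueError("every judge must rate every item")
        if any(not 0.0 <= p <= 1.0 for p in judge):
            raise ValueError("estimates must be probabilities in [0, 1]")
    # Average the judges' estimates item by item.
    item_levels = [mean(judge[i] for judge in ratings) for i in range(n_items)]
    # The cutscore is the sum of the item-level averages.
    return item_levels, sum(item_levels)


# Made-up ratings from two judges on a four-item test.
hypothetical_ratings = [
    [0.50, 0.75, 1.00, 0.25],  # judge 1
    [0.50, 0.25, 1.00, 0.75],  # judge 2
]
item_levels, cutscore = angoff_cutscore(hypothetical_ratings)
print(item_levels, cutscore)
```

The sketch makes the dependence on the judges visible: the cutscore is nothing more than an aggregate of their item-by-item estimates, so its reasonableness rests on how well they can anticipate the performance of a minimally competent test taker.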
Other procedures have been developed to improve the reasonableness of the standard-setting process (e.g., Jaeger et al., 1996) and to offset some of the vulnerabilities of the Angoff method (Messick, 1995).12 Several new approaches are being examined to make cutscore judgments by various stakeholders both more reasonable and more defensible.13
The importance of the cutscore may be lessened by the extent to which other information is used in making placement decisions. Whenever cutscores are used, the quality of the standard-setting process should be documented and evaluated—including the qualification of the judges, the method or methods employed, and the degree of consensus reached (American Educational Research Association et al., 1985: Standard 6.9; 1998).
Chapter 4 discussed the issue of fairness in terms of comparable validity across individuals, groups, and contexts. Test scores should have comparable meanings and properties for all groups of students. Accordingly, in assessing the fairness of test use in tracking, it is important to determine the extent to which the test is measuring the same construct—and hence has similar meaning—for different populations.
The racial and socioeconomic stratification that often accompanies tracking is discussed below. For the present purpose, the important question is whether the use of tests in tracking contributes to negative outcomes for particular groups of students. For example, in the case of a math test used to assign students to a beginning or advanced algebra class, it may be found that the test consistently assigns higher numbers of males than females or whites than blacks to the advanced class—more so than assignments based on other factors, such as grades or recommendations. This disproportion may be due to bias in certain test items that make them easier for males or white students.14 Alternatively, the reason may lie in inequities in the testing process itself, such as differential access to test preparation materials and different physical conditions on the day of testing. Even if the disproportionate outcome is an accurate representation of the degree to which different groups of students have mastered the skills measured by the test, the use of the test for tracking purposes would be improper if students were subsequently exposed to instruction that differed substantially in quality—resulting in higher proportions of females or minority students failing an end-of-course algebra test that is a prerequisite for high school graduation.15
the scale is well structured (such as one based on item-response theory) and if it is well described in terms of the cognitive processes required for item performance at different scale levels, then cutscores can be set directly on the scale rather than indirectly by cumulating item judgments. More work is required up front by the test developer in constructing the scale and in developing benchmarks and process descriptions for scale levels, but then the subsequent cutscore judgments by test users become both more informed and more straightforward.
Although this type of adverse impact is not automatic evidence of test invalidity, such questions should be part of the validity investigation (Messick, 1989). According to Messick, if adverse impact is traceable to construct over- or underrepresentation, it signals a validity problem. If it is not so traceable, it signals a policy problem. For example, if a test designed to assess algebra skills places a heavy emphasis on complicated word problems, English-language learners will be at a disadvantage in demonstrating their knowledge of algebra. If the resulting scores are weighted heavily in placement, some English-language learners are likely to be placed inappropriately in lower-level classes. Although studies of these types of side effects may not often be part of initial test development, the test user should include a well-designed evaluation component to monitor the intended and unintended consequences of tracking on all students and on significant subgroups of students, including minorities, English-language learners, and students with disabilities.
Effects of Low-Track Placement
"Decisions about a student's track placement," a previous National Research Council report concluded, "should be based on predictions about what track will produce the most beneficial expected educational outcome for the student" (National Research Council, 1982). It is beyond the committee's mandate to speculate on what track placements are educationally optimal, as a general matter or for particular students.
procedures are also problematic with performance-type assessments due to the small number of items involved, which makes it difficult to match students. There is a recognized need for the development of more sophisticated techniques for the detection of DIF and/or bias in performance-type items, since these are not immune from fairness concerns (Linn et al., 1991a). Absent such techniques, greater reliance must be placed on judgmental review of items or tasks.
Under the committee's definition of appropriate test use (National Research Council, 1982), however, it is inappropriate to use tests to place students in settings that are demonstrably ineffective educationally. As tracking is currently practiced, students assigned to typical low-track classes are worse off than they would be in other placements. The most common reasons for this disadvantage are the failure to provide students in low-track classes with high-quality curriculum and instruction and the failure to convey high expectations for such students' academic performance. Unless these conditions are changed, and there is evidence that students will benefit more from such placements than from others, we recommend that low-track placements be eliminated, whether based on test scores or other information.
This is not to say that grouping students by achievement or skill level is in general a bad practice. Some forms of tracking, such as proficiency-based placement in foreign language classes or other classes for which there is a demonstrated need for prerequisites, may be beneficial. We know, moreover, that researchers have found some schools and programs in which students in low-track classes received beneficial, high-quality instruction. These, however, involved not typical public schools but Catholic schools (Lee, 1985; Valli, 1986; Page and Valli, 1990), alternative schools, dropout programs (Wehlage, 1982), magnet programs (Mitchell and Benson, 1989), and a school that had recently undergone a thorough restructuring of staff and curriculum (Gamoran and Weinstein, 1998). And what made some of these low-track classes educationally beneficial appears to have been such factors as high teacher expectations, small class size, extra resources that permitted individualized instruction, strong intellectual leadership, a rigorous academic curriculum, extra efforts by teachers to promote extensive class discussion, the capacity to choose students and teachers, and "no system of assigning inexperienced or weak teachers to the low-track classes" (Gamoran, 1993:1; Gamoran and Weinstein, 1998).
Unfortunately, however, empirical research demonstrates that there is a very different reality in typical low-track classes. Moreover, there are serious structural and attitudinal barriers to change: "[Trying] to improve the quality of instruction in low tracks … fails to address the problem that tracking and ability grouping constitute not merely differentiation but stratification—that is, an unequal distribution of status—which typically leads to an unequal allocation of resources such as curricular materials [and] teaching competencies" (Gamoran and Weinstein, 1998:387). That minority students and low-SES students are disproportionately assigned to low-track classes is further cause for concern. The following sections describe more fully the research on typical low-track classes.
Numerous studies show that students in most low-track classes have less access to well-qualified, highly motivated teachers than do their peers in other tracks. "[T]eachers often prefer instructing high-ability classes" and principals commonly "use class assignments as a reward for teachers judged more powerful or successful and as a sanction against those deemed weaker or undeserving" (Oakes et al., 1992:583, citing Becker, 1953; Hargreaves, 1967; and McPartland and Crain, 1987). "This process may result in a vicious circle for low tracks: Repeated assignment to the bottom of the school's status hierarchy may demoralize teachers, hindering their improvement and perhaps even reducing their competency over time" (Oakes et al., 1992:583, citing Finley, 1984; Gamoran and Berends, 1987; and Hargreaves, 1967). Although the academic backgrounds of elementary school teachers do not appear to differ much by track taught, there are "significant discrepancies among teachers assigned to various classes in secondary schools" (Oakes, 1990). For example, "[t]eachers of low-ability secondary science and mathematics classes are consistently less experienced, less likely to be certified in math or science, hold fewer degrees in those subjects, have less training in the use of computers, and less often report themselves to be 'master teachers'" (Oakes et al., 1992:583).16
Access to Knowledge
In elementary school, students in low tracks proceed by design at a slower pace than do students in higher tracks. Consequently, students who have been in high-track classes "are likely to have covered considerably more material by the end of elementary school" (Oakes et al., 1992:583). The type of material they have covered is also different; "low reading groups spend relatively more time on decoding activities, whereas more emphasis is placed on the meanings of stories in high groups" (Oakes et al., 1992:583, citing Alpert, 1974; Hiebert, 1983; McDermott, 1987; and Wilcox, 1982).
"In secondary schools, low-track classes consistently offer greater exposure to less demanding topics and skills, whereas high-track classes typically include more complex material and more difficult thinking and problem-solving tasks" (Oakes et al., 1992:583, citing Burgess, 1983, 1984; Hargreaves, 1967; Keddie, 1971; Metz, 1978; Oakes, 1985; Page, 1989; Powell et al., 1985; Sanders et al., 1987; Squires, 1966; and Trimble and Sinclair, 1986). "At both the elementary and secondary levels, teachers of low-ability classes reported giving less emphasis than teachers of other classes to such matters as students' interest in math and science … inquiry skills and problem solving … and to preparing students for further study in those subjects" (Oakes et al., 1992:584). "[H]igh-level classes were more often characterized by authentic assignments, student control over work, and high-order cognitive tasks" (Oakes et al., 1992:584, citing Nystrand and Gamoran, 1988). According to Oakes (1985), low-track classes are characterized by "a dull, isolating curriculum of passive drill and practice with trivial bits of information, whereas the upper-track curriculum encompass[es] imaginative, engaging assignments with 'high-status knowledge' such as Shakespeare or calculus" (Oakes et al., 1992:585, citing Oakes, 1985).
In sum, the research suggests that instruction in low-track classes is far less demanding than in high-track classes (Welner and Oakes, 1996; McKnight et al., 1987) and far less oriented to the higher-order knowledge and thinking skills that are strongly associated with future success (Linn, 1998a).
Equally important, low-track placements do not serve a remedial function, in that they do not help low-track students catch up with students in other tracks. Instead, "numerous studies provide evidence of the increasing disparity between high- and low-track students over time" (Oakes et al., 1992:591, citing Gamoran and Berends, 1987; Murphy and Hallinger, 1989; Gamoran, 1987; Gamoran and Mare, 1989; Hotchkiss and Dorsten, 1987; Lee and Bryk, 1988; and Vanfossen et al., 1987). Track effects are large, moreover; Gamoran (1987) has estimated that "the academic track advantage was larger than the gap between students in school and dropouts" (Oakes et al., 1992:591). Not surprisingly, therefore, mobility between low tracks and higher tracks is limited: "Children in the lowest groups are rarely moved to the highest groups; the disparity … grows greater over time …. [E]ach subsequent assessment of ability is, in part, a product of the assessments that preceded it" (Oakes et al., 1992:596, citing Goodlad and Oakes, 1988).
Finally, students in low-track classes would learn more if they received high-quality teaching and a demanding curriculum, as research demonstrates (Slavin et al., 1996; Levin, 1988; Oakes et al., 1992). The weight of the evidence has been recognized by the Congress. In its 1994 amendments to Title I, the Congress expressly found that: "[a]ll children can master challenging content and complex problem-solving skills. Research clearly shows that children, including low-achieving children, can succeed when expectations are high and all children are given the opportunity to learn challenging material" (Title I, Elementary and Secondary Education Act, 20 U.S.C. section 6301(c)(1)). Based on this conclusion, other provisions of Title I require that eligible students receive "accelerated," "enriched," and "high-quality" curricula, "effective instructional strategies," "highly qualified instructional staff," and "high-quality" staff development (20 U.S.C. sections 6314(b)(1), 6315(c)(1), and 6320(a)(1)).
As tracking is currently practiced in the United States, students will need to be educated in settings other than typical low-track classes if they are to receive the high-quality curriculum and instruction they need to "master challenging content and complex problem-solving skills."
Disproportions Based on Race, National Origin, Language, and SES
Research on patterns of student stratification has found disproportionate percentages of low-SES students and minority students in curricula designed for low-ability and noncollege-bound students (Gamoran and Mare, 1989; Moore and Davenport, 1988; National Center for Education Statistics, 1985; Oakes, 1990; Braddock, 1990). High School and Beyond survey data from 1982 provide an illustration. "African American students participated in the vocational education track at a rate 34 percent higher than … the rate for white students …. The participation rate in academic programs among African American students was 88 percent of the rate for whites, and, in the general track, the African American student participation rate was 84 percent of the rate for whites" (Braddock, 1990:2). Similar statistics were found for Hispanic students (Braddock, 1990).
Minority students in racially mixed schools are disproportionately placed in low-track classes (Oakes et al., 1992) and consistently underrepresented in programs for the gifted and talented (Darling-Hammond, 1985). The same holds true for advanced placement classes; in Milwaukee, for example, whites make up 24 percent of the total student population but 54 percent of those enrolled in advanced placement courses, whereas black students constitute 61 percent of the student population but only 17 percent of those in advanced placement courses (interview with Lynn Krebs, guidance director, Milwaukee School District).
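One way to read percentages like the Milwaukee figures is as a representation index: a group's share of program enrollment divided by its share of the total student population. The short sketch below is illustrative only and is not a measure used by the committee; the function name `representation_index` and the dictionary layout are assumptions introduced here, and the only data are the four percentages quoted above.

```python
def representation_index(share_of_program: float, share_of_population: float) -> float:
    """Ratio of a group's share of a program to its share of all students.

    A value of 1.0 means proportional representation; values above 1.0
    indicate overrepresentation and values below 1.0 underrepresentation.
    """
    if share_of_population <= 0:
        raise ValueError("share_of_population must be positive")
    return share_of_program / share_of_population


# Percentages quoted in the text for Milwaukee advanced placement courses.
milwaukee = {
    "white": {"population": 24, "advanced_placement": 54},
    "black": {"population": 61, "advanced_placement": 17},
}

# Hypothetical summary structure: one index per group.
indices = {
    group: representation_index(shares["advanced_placement"], shares["population"])
    for group, shares in milwaukee.items()
}

for group, value in indices.items():
    print(f"{group}: {value:.2f}")
# white: 2.25
# black: 0.28
```

On these figures, white students appear in advanced placement courses at a little more than twice their share of enrollment, and black students at a little more than a quarter of theirs.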
There is evidence that tests used for tracking contribute to these disproportions: lower test scores by minority students and low-SES students undergird these patterns (Oakes et al., 1992). Tests used for tracking are not unique in this respect: "Gaps between average scores of minority and nonminority individuals show up not just on so-called intelligence or ability tests and general achievement tests. They also show up on competency tests used for grade promotion and high-school graduation [and tests used for other purposes]" (Haney, 1993:50, citing National Commission on Testing and Public Policy, 1990). At the same time, disproportionate placement rates are also due to factors other than test use; placement differences by race and social class seem to occur whether test scores, counselor and teacher recommendations, or student and parent choices are the basis for placement (Oakes et al., 1992).
Whether it is due to test scores or other information, the committee sees cause for concern in the fact that minority students and low-SES students are overrepresented in classes typically characterized by an exclusive focus on basic skills, low expectations, and less-qualified teachers.
The committee's findings and recommendations about tracking are reported in Chapter 12.
Alpert, J.L. 1974 Teacher behavior across ability groups: A consideration of the mediation of Pygmalion effects. Journal of Educational Psychology 66(3):348–353.
American Educational Research Association, American Psychological Association, and National Council on Measurement in Education 1985 Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.
1998 Draft Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.
Angoff, W.H. 1971 Scales, norms, and equivalent scores. Pp. 508–600 in Educational Measurement, 2nd Edition, R.L. Thorndike, ed. Washington, DC: American Council on Education.
Becker, Henry S. 1953 The teacher in the authority system of the school. Journal of Educational Sociology 27(3):128–141.
Braddock, J.H. II 1990 Tracking: Implications for Student Race-Ethnic Subgroups. Baltimore, MD: Center for Research on Effective Schooling for Disadvantaged Students.
Burgess, Robert G. 1983 Experiencing Comprehensive Education: A Study of Bishop McGregor School. London: Methuen.
1984 It's not a proper subject: It's just Newsom. Pp. 181–200 in Defining the Curriculum, J. Goodson and S. Ball, eds. London: Falmer.
Cole, N., and P.A. Moss 1989 Bias in test use. Educational Measurement, 3rd Edition, R. Linn, ed. New York: American Council on Education.
Darling-Hammond, L. 1985 Equality and Excellence: The Educational Status of Black Americans. New York: College Entrance Examination Board.
1991 The implications of testing policy for quality and equality. Phi Delta Kappan 73(3):220–225.
Delany, B. 1991 Allocation, choice, and stratification within high schools: How the sorting machine copes. American Journal of Education 99(2):181–207.
Feldt, L.S., and R.L. Brennan 1989 Reliability. Pp. 105–146 in Educational Measurement, 3rd Edition, R.L. Linn, ed. New York: MacMillan.
Finley, Marilee K. 1984 Teachers and tracking in a comprehensive high school. Sociology of Education 57:233–243.
Gamoran, A. 1987 Organization, instruction, and the effects of ability grouping: Comment on Slavin's best-evidence synthesis. Review of Educational Research 57(3):341–345.
1988 A Multi-level Analysis of the Effects of Tracking. Paper presented at the annual meeting, American Sociological Association, Atlanta, GA.
1989 Tracking and the Distribution of Status in Secondary Schools. Paper presented at the annual meeting, American Sociological Association, San Francisco, CA.
1993 Alternative uses of ability grouping in secondary schools: Can we bring high-quality instruction to low-ability classrooms? American Journal of Education 102(1):1–22.
Gamoran, A., and M. Berends 1987 The effects of stratification in secondary schools: Synthesis of survey and ethnographic research. Review of Educational Research 57(4):415–435.
Gamoran, A., and R.D. Mare 1989 Secondary school tracking and educational inequality: Compensation, reinforcement or neutrality? American Journal of Sociology 94(5):1146–1183.
Gamoran, A., and M. Weinstein 1998 Differentiation and opportunity in restructured schools. American Journal of Education 106:385–415.
Gardner, H. 1993 Frames of Mind: The Theory of Multiple Intelligences. New York: Basic Books.
Glaser, R. 1963 Instructional technology and the measurement of learning outcomes: Some questions. American Psychologist 18:519–521.
Glaser, R., and E. Silver 1994 Assessment, Testing, and Instruction: Retrospect and Prospect. Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing.
Goodlad, J.I., and J. Oakes 1988 We must offer equal access to knowledge. Educational Leadership 45:16–22.
Haney, W. 1993 Testing and minorities. In Beyond Silenced Voices: Class, Race, and Gender in United States Schools, L. Weis and M. Fine, eds. Albany: State University of New York Press.
Hargreaves, D.H. 1967 Social Relations in a Secondary School. London: C. Tinling.
Hiebert, E. 1983 An examination of ability grouping for reading instruction. Reading Research Quarterly 18(2):231–255.
Holland, P.W., and H. Wainer 1993 Differential Item Functioning. Hillsdale, NJ: Erlbaum.
Hotchkiss, L., and L. Dorsten 1987 Curriculum effects on early post high school outcomes. Pp. 191–219 in Sociology of Education and Socialization, R.G. Corwin, ed. Greenwich, CT: JAI Press.
Jaeger, R.M., I.V.S. Mullis, M.L. Bourque, and S. Shakrani 1996 Setting performance standards for performance assessments: Some fundamental issues, current practice, and technical dilemmas. Pp. 79–115 in Technical Issues in Large-scale Performance Assessment, G.W. Phillips, ed. Washington, DC: U.S. Government Printing Office.
Joint Committee on Testing Practices 1988 Code of Fair Testing Practices in Education. Washington, DC: National Council on Measurement in Education.
Kane, M.T. 1992 An argument-based approach to validity. Psychological Bulletin 112:527–535.
Keddie, N. 1971 Classroom knowledge. Pp. 133–150 in Knowledge and Control, M.F.D. Young, ed. London: Collier-Macmillan.
Kornhaber, M. 1997 Seeking Strengths: Equitable Identification for Gifted Education and the Theory of Multiple Intelligences. Doctoral dissertation, Harvard Graduate School of Education.
Lee, V.E. 1985 Investigating the Relationship Between Social Class and Academic Achievement in Public and Catholic Schools: The Role of the Academic Organization of the School. Doctoral dissertation, Harvard Graduate School of Education.
Lee, V.E., and A.S. Bryk 1988 Curriculum tracking as mediating the social distribution of high school achievement. Sociology of Education 62:78–94.
Levin, H. 1988 Accelerated Schools for At-Risk Students. New Brunswick, NJ: Center for Policy Research in Education.
Linn, R. 1998a Assessments and Accountability. Paper presented at the annual meeting, American Educational Research Association, San Diego, CA.
1998b Validating inferences from National Assessment of Educational Progress Achievement-Level Reporting. Applied Measurement in Education 11(1):23–47.
Linn, R.L., E.L. Baker, and S.B. Dunbar 1991a Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher 20(8):15–21.
Linn, R.L., D. Koretz, E.L. Baker, and L. Burstein 1991b The Validity and Credibility of the Achievement Levels for the 1990 National Assessment of Educational Progress in Mathematics. Tech. Rep. No. XX. Los Angeles: University of California, Center for the Study of Education.
Lipsky, D., and A. Gartner 1989 Beyond Separate Education: Quality Education for All. Baltimore, MD: Brookes.
Lucas, S. in press Tracking Inequality: Stratification and Mobility in American Schools. New York: Teachers College Press.
Lynn, L., and A. Wheelock 1997 Making detracking work. The Harvard Education Letter 13(1):1–4.
McDermott, R.P. 1987 The explanation of minority school failure, again. Anthropology and Education Quarterly 18(4):361–364.
McKnight, C.C., F.J. Crosswhite, J.A. Dossey, E. Kifer, S.O. Swafford, K. Travers, and T.J. Cooney 1987 The Underachieving Curriculum: Assessing U.S. School Mathematics from an International Perspective. Champaign, IL: Stipes Publishing.
McPartland, J.M., and R.L. Crain 1987 Evaluating the trade-offs in student outcomes from alternative school organization policies. Pp. 131–156 in The Social Organization of Schools: New Conceptualizations of the Learning Process, Maureen T. Hallinan, ed. New York: Plenum.
Meier, R., J. Stewart, and R. England 1989 Race, Class, and Education: The Politics of Second-Generation Discrimination. Madison: University of Wisconsin Press.
Meisels, S.J. 1989 Testing, Tracking, and Retaining Young Children: An Analysis of Research and Social Policy. Commissioned paper for the National Center for Education Statistics.
Messick, S. 1989 Validity. Pp. 13–103 in Educational Measurement, 3rd Edition, R. Linn, ed. New York: American Council on Education.
1995 Standards-based score interpretation: Establishing valid grounds for valid inferences. Joint Conference on Standard Setting for Large-Scale Assessments: Proceedings (Vol. 2: 291–305). Washington, DC: US Government Printing Office.
Metz, M.H. 1978 Classrooms and Corridors: The Crisis of Authority in Desegregated Secondary Schools. Berkeley: University of California Press.
Mitchell, V., and C. Benson 1989 Exemplary Urban Career-oriented High Schools. Berkeley, CA: National Center for Research in Vocational Education.
Moore, D., and S. Davenport 1988 The New Improved Sorting Machine. Madison, WI: National Center on Effective Secondary Schools.
Mosteller, F., R. Light, and J. Sachs 1996 Sustained inquiry in education: Lessons from skill grouping and class size. Harvard Educational Review 66(4):797–843.
Murphy, J., and P. Hallinger 1989 Equity as access to learning: Curricular and instructional treatment differences. Journal of Curriculum Studies 21(2):129–149.
National Center for Education Statistics 1985 High School and Beyond: An Analysis of Course-taking Patterns in Secondary Schools as Related to Student Characteristics. Washington, DC: US Department of Education.
National Commission on Testing and Public Policy 1990 From Gatekeeper to Gateway: Transforming Testing in America. Chestnut Hill, MA: National Commission on Testing and Public Policy.
National Research Council 1982 Placing Children in Special Education: A Strategy for Equity, K.A. Heller, W.H. Holtzman, and S. Messick, eds. Committee on Child Development Research and Public Policy. Washington, DC: National Academy Press.
1999 Uncommon Measures: Equivalence and Linkage Among Educational Tests, M.J. Feuer, P.W. Holland, B.F. Green, M.W. Bertenthal, and F.C. Hemphill, eds. Committee on Equivalency and Linkage of Educational Tests, Board on Testing and Assessment. Washington, DC: National Academy Press.
Nystrand, M., and A. Gamoran 1988 A Study of Instruction as Discourse. Madison: Wisconsin Center for Education Research.
Oakes, J. 1985 Keeping Track: How Schools Structure Inequality. New Haven, CT: Yale University Press.
1986 Keeping track, part I: The policy and practice of curriculum inequality. Phi Delta Kappan 68(1):12–17.
1990 Multiplying Inequalities: The Effects of Race, Social Class, and Tracking on Opportunities to Learn Math and Science. Santa Monica, CA: Rand.
1995 Two cities' tracking and within-school segregation. Teachers College Record 96(4):681–693.
Oakes, J., A. Gamoran, and R. Page 1992 Curriculum differentiation: Opportunities, outcomes, and meanings. In P. Jackson, ed., Handbook of Research on Curriculum. New York: MacMillan Publishing Company.
Page, R. 1989 The lower-track curriculum at a "heavenly" high school: "Cycles of prejudice." Journal of Curriculum Studies 21(3):197–221.
Page, R., and L. Valli, eds. 1990 Curriculum Differentiation: Interpretive Studies in U.S. Secondary Schools. New York: State University of New York Press.
Powell, A., E. Farrar, and D.K. Cohen 1985 The Shopping Mall High School. Boston: Houghton Mifflin.
Sanders, N., N. Stone, and J. LaFollette 1987 The California Curriculum Differentiation: Paths Through High School. Sacramento: California State Department of Education.
Selvin, M.J., J. Oakes, S. Hare, K. Ramsey, and D. Schoeff 1990 Who Gets What and Why: Curriculum Decisionmaking at 3 Comprehensive High Schools. Santa Monica, CA: Rand.
Shepard, L. 1991 Negative policies for dealing with diversity: When does assessment and diagnosis turn into sorting and segregation? In Literacy for a Diverse Society: Perspectives, Practices and Policies, E. Hiebert, ed. New York: Teachers College Press.
Shepard, L., et al. 1993 Evaluating test validity. Review of Research in Education 19:405–450.
Shepard, L., S. Kagan, and E. Wurtz, eds. 1998 Principles and Recommendations for Early Childhood Assessments. Washington DC: National Education Goals Panel.
Slavin, R.E., J. Braddock, C. Hall, and R. Petza 1989 Alternatives to Ability Grouping. Baltimore, MD: Center for Research on Effective Schooling for Disadvantaged Students.
Slavin, R.E., et al. 1996 Every Child, Every School: Success for All. Thousand Oaks, CA: Corwin Press.
Squires, J.R. 1966 National study of high school English programs: A school for all seasons. English Journal 55(2):282–290.
Sternberg, R. 1990 Metaphors of Mind: Conceptions of the Nature of Intelligence. New York: Cambridge University Press.
Trimble, K., and R.L. Sinclair 1986 Ability Grouping and Differing Conditions for Learning: An Analysis of Content and Instruction in Ability-grouped Classes. Amherst: University of Massachusetts Center for Curriculum Studies.
Valli, L. 1986 Tracking: Can It Benefit Low-achieving Students? Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.
Vanfossen, B.E., J.D. Jones, and J.Z. Spade 1987 Curriculum tracking and status maintenance. Sociology of Education 60(2):104–122.
Wehlage, G. 1982 The purpose of generalization in field study research. Pp. 211–226 in The Myth of Educational Reform: A Study of School Response to a Program of Change, Thomas Popkewitz, Robert Tabachnick, and Gary Wehlage, eds. Madison: University of Wisconsin Press.
Welner, K.G., and J. Oakes 1996 (Li)Ability grouping: The new susceptibility of school tracking systems to legal challenges. Harvard Educational Review 66(3):451–470.
White, P., A. Gamoran, J. Smithson, and A. Porter 1996 Upgrading the high school math curriculum: Math course-taking patterns in seven high schools in California and New York. Educational Evaluation and Policy Analysis 18(4):285–307.
Wilcox, K. 1982 Differential socialization in the classroom: Implications for equal opportunity. Pp. 268–309 in Doing the Ethnography of Schooling, George Spindler, ed. New York: Holt, Rinehart, and Winston.
Hobson v. Hansen, 269 F. Supp. 401 (D.D.C. 1967), aff'd sub nom. Smuck v. Hansen, 408 F.2d 175 (D.C. Cir. 1969) (en banc).
The Improving America's Schools Act of 1994, P.L. 103–382 (1994).
Larry P. v. Riles, 793 F.2d 969 (9th Cir. 1984).
PASE v. Hannon, 506 F. Supp. 831 (N.D. Ill. 1980).
People Who Care v. Rockford Board of Education, 111 F.3d 528 (7th Cir. 1997).
Title I, Elementary and Secondary Education Act, 20 U.S.C. section 6301(c)(1).