3
Legal Frameworks

Law plays a dual role as far as educational tests are concerned. First, law is typically the means by which policymakers define test policy. In Chapter 2 we discussed the principal objectives of federal, state, and local test policy, mentioning some of the statutes that aim to advance these objectives. This chapter looks at the second role that law can play with regard to testing: as a source of rules that define the circumstances in which test use may be discriminatory or otherwise inappropriate. Many of these rules are rooted in the U.S. Constitution, federal civil rights statutes, and judicial decisions. And although a comprehensive treatment of state law is beyond the scope of this report, many of the issues discussed are affected in significant ways by state law.

In terms of the committee's congressional mandate, the law constitutes one set of norms relevant to whether existing or new tests are used in a discriminatory manner or inappropriately for student promotion, tracking, or graduation. Legal considerations also play a part in discussions of how best to measure the reading and mathematics achievement of English-language learners and students with disabilities and whether to include them in large-scale assessments.

This chapter describes the legal frameworks that apply generally when tests have high-stakes consequences for students and considers how courts have applied these principles to situations involving student tracking,



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



promotion, and graduation.[1] The first section considers issues of discrimination on the basis of race, national origin, or sex; it includes discussion of English-language learners. The second section explores other circumstances in which courts have invalidated tests having high stakes for students, either because students have received insufficient notice of test requirements or because the test measures knowledge and skills that students have not been taught. The third section describes testing requirements under the Improving America's Schools Act of 1994, which amends Title I. Chapter 8 considers the legal rights of students with disabilities under federal law.

[1] Such high-stakes tests are likelier than low-stakes tests to raise legal concerns, if only because, by definition, these high-stakes tests can lead to adverse consequences for individuals. Thus, to the extent that the objectives of a testing program can be achieved through low-stakes test uses, legal problems become less likely.

Several general points are worth noting from the outset. First, the standards of testing professionals, who practice the science of psychometrics (see Chapter 4), are often invoked in legal challenges to high-stakes testing, and testing programs are more likely to withstand legal challenges if professional standards have been met. Indeed, legal standards and psychometric standards reflect many common concerns, including those of appropriate measurement, proper attribution of cause, and, in some contexts, the educational consequences of test use (National Research Council, 1982).

Second, just as different test uses may raise particular legal concerns, so may the use of different kinds of assessments. Performance assessments, for example, may raise certain legal questions that traditional multiple-choice instruments do not (Phillips, 1996b).

Finally, there is often no single legal view on what constitutes nondiscriminatory or appropriate test use. The U.S. Supreme Court has settled certain questions, but legal rules, like psychometric norms and notions of sound educational policy and practice, are constantly evolving. If the Supreme Court has not resolved an issue, then courts in different jurisdictions may face similar disputes but reach different conclusions, or they may reach similar conclusions but on different grounds. The decision of a lower court is binding only in that court's jurisdiction, although it may influence judges, policymakers, and practitioners elsewhere.

Defining Discrimination in the Context of High-Stakes Testing

The legal literature reveals several distinct arguments that courts have considered in determining whether the use of a test to make high-stakes decisions about individual students is illegally discriminatory. The outcomes of some cases depend on whether the decision to administer a high-stakes test is based on a present intent to discriminate. Other cases depend on whether a test carries forward or preserves the effects of prior illegal discrimination. A third claim, grounded in federal civil rights statutes and accompanying regulations, employs an "effects test" that considers whether a high-stakes test has a disproportionate, adverse impact; whether the use of a test having such an impact can be adequately justified on educational grounds; and whether there are equally feasible alternative tests that have less disproportionate impact. These legal claims bear directly on whether tests are used in a discriminatory manner for tracking, promotion, or graduation. Each type of claim is therefore considered separately below.

Claims of Intentional Discrimination

The equal protection clause of the Fourteenth Amendment forbids public employees and entities, including state and local school officials, from engaging in acts of intentional discrimination on the basis of race, color, national origin, or sex (United States v. Fordice, 1992; Personnel Administrator v. Feeney, 1979). Findings of current intentional discrimination have been rare, especially in recent decades; the applicable legal standard is a stringent one, and few courts have been prepared to find that educators are acting out of invidious motives. The plaintiffs' burden cannot be met merely by showing that a policy or practice has a disproportionate, adverse impact on some group, or even by demonstrating that the disproportionate impact was foreseeable or actually foreseen (Washington v. Davis, 1976).

Thus, for example, lower courts have refused to find intentional discrimination solely on the basis of evidence showing that high-stakes graduation tests had a disproportionate, adverse impact by race or national origin (Debra P. v. Turlington, 1979; Anderson v. Banks, 1981). According to the Supreme Court, those who allege intentional discrimination must show not only foreseeable, disproportionate adverse impact but also that "the decisionmaker selected or reaffirmed a particular course of action at least in part 'because of,' not merely 'in spite of,' its adverse effects" on the group disproportionately affected (Personnel Administrator v. Feeney, 1979:279) (emphasis added).[2]

[2] Recognizing that each situation "demands a sensitive inquiry into such circumstantial and direct evidence of intent as may be available," the Supreme Court has identified criteria to aid courts in determining whether a decision maker has acted "because of" the disproportionate adverse effects its policy or practice will have (Village of Arlington Heights v. Metropolitan Housing Development Corp., 1977:266–270). As applied in the testing context, these criteria include (1) whether the test produces a disproportionate, adverse impact on the group that alleges discrimination; (2) whether the test's disproportionate impact was reasonably foreseeable or actually foreseen; (3) whether adoption or administration of the test can be explained on grounds other than an intent to discriminate; (4) whether the historical background of the decision supports a claim of intentional discrimination; (5) whether adoption or use of the test represents a departure from the decision maker's normal policies or procedures; and (6) whether there is direct evidence of intent to discriminate, such as statements evincing discriminatory intent.

Where high-stakes testing programs are concerned, courts have almost uniformly dismissed claims of intentional discrimination. Most often courts have found that there are legitimate, nondiscriminatory educational reasons for adopting such programs. In sustaining high-stakes graduation tests, for example, lower courts have found "no present intent to discriminate" (Phillips, 1991:178), accepting the defendants' view that such tests can help to improve students' educational performance, to identify students who need remedial assistance, and to evaluate the attainment of state educational objectives (Debra P. v. Turlington, 1979; Anderson v. Banks, 1981). This is true even when decisions to deny high school diplomas have been made automatically on the basis of one or more test scores.

Similarly, despite legal challenges to tracking, whether based on tests or on other information, there are only a few reported decisions in which courts found that tracking, or student classification more generally, constituted intentional racial segregation (Welner and Oakes, 1996:455). One of these is an older case (Hobson v. Hansen, 1967),[3] and one is a decision that an appellate court later reversed in pertinent part (People Who Care v. Rockford Board of Education, 1997). A third such case is Larry P. v. Riles (1984). It involved a challenge to the use of an IQ test as a basis for assigning California public school students to classes for the educable mentally retarded (EMR). Use of the test resulted in the disproportionately high assignment of black students to such classes.

[3] Current standards for proving intentional discrimination evolved nearly a decade after Hobson v. Hansen, beginning with Washington v. Davis (1976) and other Supreme Court decisions noted above.

A federal district court, affirmed by the U.S. court of appeals, found California's use of the IQ test for EMR placements to be intentionally discriminatory, based on a number of factors. First, state department of education officials had foreseen that the test would have a significant disproportionate impact by race. Second, they had failed "to ascertain or attempt to ascertain the validity of the tests for minority children." Third, "the adoption of [a] mandatory IQ testing requirement was riddled with procedural and substantive irregularities, in which no outside sources were consulted … and … the person who oversaw [test] selection was not an expert in IQ testing." Fourth, the state had failed to use alternative tests that were "less discriminatory than the IQ-centered standard." Fifth, "the [state department of education's] actions revealed a complacent acceptance of [racial differences in intelligence] that was built on easy assumptions about the incidence of retardation or at least low intelligence among black children" (Larry P. v. Riles, 1984:974–976). Sixth, the court regarded EMR classes "as 'dead-end' classes. . . . [A] misplacement in E.M.R. causes a stigma and irreparable injury to the student" (Larry P. v. Riles, 1984:973). A similar case, brought in California on behalf of Hispanic children and involving allegations of linguistic discrimination, was resolved through a settlement entered by a federal district court on June 18, 1973 (Diana v. Board of Education, 1973).[4]

[4] In Larry P. itself, an injunction issued in 1986 prevented California from using IQ tests as any part of the special education process. This injunction was later vacated, leaving in place the prohibition against using IQ tests as a basis for placing black children in EMR classes (Crawford v. Honig, 1994).

An Illinois district court, however, reached opposite conclusions when faced with facts similar in many respects to those in Larry P. That court accepted the defendants' contentions (1) that the tests typically used to measure IQ were not racially biased,[5] (2) that IQ test scores were only one of several factors used to determine placements, (3) that erroneous placements of black children in EMR classes occurred infrequently and for reasons other than intentional discrimination, (4) that the referral and placement process in Chicago was not carried out hastily, and (5) that EMR classes, rather than being dead ends, were beneficial placements to which students had a federal and state legal entitlement (Parents in Action on Special Education (PASE) v. Hannon, 1980:150–164).

[5] The court, which examined the IQ tests for item bias and found very little, faulted the Larry P. court for not having examined more thoroughly whether the tests were, in fact, biased.

Although they reach different conclusions, these decisions are consistent in several important respects. First, even under a stringent intent standard, liability findings may turn in part on the extent to which courts believe that educators have complied with generally accepted standards and procedures governing proper test use. In Larry P. and PASE, for example, the outcomes depended in part on such measurement issues as test validity, item bias, and whether educators were relying on single test scores in making student placement decisions. Second, the outcome in Larry P. also turned on an issue of proper attribution of cause, with the court questioning the defendants' claim that black students' IQ scores were an accurate reflection of mental retardation among blacks. Third, the decisions in Larry P. and PASE both rest partly on the courts' views of whether the resulting placements were beneficial or dead ends; both courts were interested in the educational consequences of test use for students (see National Research Council, 1982). More generally, the courts' concern with tracking, remediation, and special education is plainly focused on whether or not students will receive enhanced and effective educational opportunities as a result of the educational intervention. Furthermore, complying with relevant professional testing standards reduces the risk of legal liability for high-stakes assessments.[6]

[6] Based on an analysis of relevant law, Phillips advises state and local education agencies involved in high-stakes testing to "[f]ollow professional standards in all technical matters, including, but not limited to, item development, item selection, validity, reliability, item bias review, equating, scaling, setting passing standards, test security, accommodations, test administration, scoring, and score reporting" (Phillips, 1993a:xxi).

Claims that Tests Preserve the Effects of Prior Discrimination

The Supreme Court has long held that the Constitution forbids practices that, although seemingly neutral, serve to preserve, or carry forward, the effects of prior illegal school segregation. This suggests that it would be unlawful for school officials to use tests to track minority students, deny them high school diplomas, or retain them in grade if those students' low test scores are traceable to their having attended illegally segregated schools.

Such claims were more common in the 1970s and 1980s than they are today. Since there are relatively few school districts that still enroll students who attended illegally segregated schools, this legal claim will be available in relatively few situations.[7]

[7] It is, however, one legal ground on which Mexican-American students in Texas are currently challenging the use of a state test as the basis for granting or withholding high school diplomas (G.I. Forum v. Texas Education Agency, 1997).

The leading court decision on competency testing illustrates a "preserve the effects" approach. In the mid-1970s, Florida had adopted a minimum competency test that students needed to pass in order to receive high school diplomas. The failure rate among black students, 20 percent, was 10 times that for white students. Black high school juniors who had attended illegally segregated schools for the first five grades argued that the test results reflected the discrimination they had suffered and claimed that the diploma sanction served to preserve the effects of the prior illegal segregation. The appeals court agreed, ruling that Florida could begin to withhold diplomas from black students only four years later, when the students taking the test would not have attended illegally segregated schools (Debra P. v. Turlington, 1981; see also Anderson v. Banks, 1981).

Courts in several judicial circuits have applied the same principle to many cases involving tracking,[8] particularly in the years after initial school desegregation. If a state or school district has had a recent history of segregation or intentional discrimination, judges will scrutinize more closely test-use policies that produce disproportionate adverse impact.

[8] In one leading case, an appeals court reviewed the use by a recently desegregated school district of an arrangement under which students were assigned to classes within schools on the basis of teacher evaluations. This produced racially identifiable classrooms at every grade level in virtually every school. The court held that even a neutral student classification system could not be used if "children who have been the victims of educational discrimination in the dual systems of the past … find themselves resegregated in any school … solely because they still wear a badge of their old deprivation—under-achievement" (McNeal v. Tate County School District, 1975). Courts have reached similar results with other student assignment procedures that preserve the effects of past discrimination. Examples include assignment to schools within a previously segregated system on the basis of standardized tests (Singleton v. Jackson Municipal Separate School District, 1970 (en banc), rev'd per curiam on other grounds sub nom. Carter v. West Feliciana Parish School Board, 1970); assignments to classes based on test scores and teacher recommendations (United States v. Gadsden County School District, 1978); and assignments to classes for students with mental retardation on the basis of IQ tests (Hobson v. Hansen, 1967; aff'd sub nom. Smuck v. Hansen, 1969) (en banc).

Even in formerly segregated school districts, however, there are several arguments that educators may invoke to defend the use of high-stakes tests that have racially disproportionate impact (McNeal v. Tate County School District, 1975). First, it is permissible to use such a test if the state or school district can demonstrate that enough time has passed that the racially disproportionate impact no longer results from prior illegal segregation. Such an argument succeeded in Georgia State Conference of Branches of NAACP v. Georgia (1985), in which a circuit court allowed racially identifiable within-school student grouping because the black children in low-track classes had begun attending school only after the start of court-ordered desegregation.

Second, lower courts have ruled that it is permissible to use a classification mechanism that has disproportionate impact if the classes that are disproportionately minority provide bona fide remedial instruction, that is, if the consequences of tracking decisions are beneficial rather than adverse. Thus the Debra P. court approved remedial education programs for students who had failed Florida's competency test, even though most of the students needing remedial help were black, because it believed that the programs would help remedy the effects of prior illegal segregation. Although the court did not ask whether remedial classes constituted the most effective available placements, it mattered to the Debra P. court whether tracking decisions produced beneficial educational consequences for the students placed.

As noted at the outset, claims of this nature are increasingly rare, if only because there are fewer children each year who can show that they themselves attended illegally segregated schools. Nonetheless, recent court decisions, such as Simmons on Behalf of Simmons v. Hooks (1994),[9] and the fact that school desegregation cases remain active in many other jurisdictions, suggest that such claims remain viable in some communities.

[9] In Simmons, the district court rejected the arguments of school officials who claimed that low-track placements were educationally beneficial for black children. The court also found no educational justification for grouping classes of children for all subjects (Simmons on Behalf of Simmons v. Hooks, 1994).

Claims of Disparate Impact

Several federal civil rights statutes prohibit recipients of federal funds, including state education agencies and public school districts, from discriminating against students. Title VI of the Civil Rights Act of 1964 prohibits discrimination on the basis of race, color, or national origin, including limited English proficiency (Lau v. Nichols, 1974). Title IX of the Education Amendments of 1972 forbids sex discrimination, and two federal civil rights statutes[10] (discussed in Chapter 8) prohibit discrimination against students with disabilities.

[10] Section 504 of the Rehabilitation Act of 1973, and Title II of the Americans with Disabilities Act of 1990.

These statutes forbid intentional discrimination against students, as does the Constitution's equal protection clause, but federal regulations go further: they provide that a federal fund recipient may not "utilize criteria or methods of administration which have the effect of subjecting individuals to discrimination."[11] In interpreting this Title VI regulation and similar regulations under Title IX, courts have drawn on interpretations of a federal employment discrimination statute, Title VII.[12] This method of proving a legal violation is known as a disparate impact claim, and lower courts in many jurisdictions have recognized a three-part legal test for judging such claims (Debra P. v. Turlington, 1981; Larry P. v. Riles, 1984; American Association of Mexican-American Educators v. California, 1996).

[11] 34 C.F.R. section 100.3(b)(2) [emphasis added].

[12] 42 U.S.C. 2000e et seq. Courts have generally applied the standards applicable to disparate impact cases under Title VII to disparate impact cases arising under Title VI: Larry P. v. Riles, 1984; accord, New York Urban League, Inc. v. New York, 1995; Elston v. Talladega County Board of Education, 1993; Groves v. Alabama State Board of Education, 1991; Georgia State Conference of Branches of NAACP v. Georgia, 1985.

First, plaintiffs must show by a preponderance of the evidence that some policy or practice, such as the use of a test, has disproportionate adverse impact on a protected group. Whether a test's impact is disproportionate is not always easy to determine;[13] generally, it depends on a comparison of the entire pool of test takers with those the test identifies for some educational placement or treatment.[14] If statistical analysis shows that the success rate for members of a protected class is significantly lower (or the failure rate is significantly higher) than what would be expected from a random distribution, then the test has disproportionate adverse impact.[15]

[13] In some cases, the relevant question is whether the mean score for one group was lower than that for another group, or whether members of one group were misclassified at a significantly higher rate than members of another group (Georgia State Conference of Branches of NAACP v. Georgia, 1985). In other cases, courts have had to decide how to account for individuals who were discouraged from taking a test that they alleged was discriminatory: Groves v. Alabama State Board of Education, 1991. If it is impossible to determine the pool precisely, courts typically make informed estimates.

[14] In American Association of Mexican-American Educators v. California (1996), the plaintiffs argued that the appropriate pool was first-time test takers, whereas the defendants argued that cumulative, rather than first-time, pass rates should be used in determining whether the test had disproportionate adverse impact. The court ruled for the plaintiffs on this issue (American Association of Mexican-American Educators v. California, 1996:31, 38).

[15] Another common rule of thumb for assessing disparate impact is set forth in guidelines of the Equal Employment Opportunity Commission (1978); disparate impact is generally found if the success rate of a protected group is less than four-fifths, or 80 percent, of the rate at which the most highly selected group (usually whites or males) is selected (29 C.F.R. section 1607.4(D)). This standard was used in American Association of Mexican-American Educators v. California (1996), which involved a teacher certification test.

Even if the plaintiffs can establish disparate impact, the case is not over. Rather, the burden of proof shifts to the defendant to justify its policy or practice. According to the Supreme Court, the legal standard of justification is one of educational necessity (Board of Education of New York v. Harris, 1979:151).[16] Federal regulations do not define the term "educational necessity"; some lower courts interpret it to mean that defendants must show "a substantial legitimate justification" for the challenged policy or practice (New York Urban League, Inc. v. New York, 1995; American Association of Mexican-American Educators v. California, 1996), whereas others require proof of a "manifest relationship" between the policy or practice and the defendants' educational objectives (Larry P. v. Riles, 1984; Sharif v. New York State Education Department, 1989).

[16] This requirement is analogous to the "business necessity" requirement under Title VII that employers must show when tests for hiring or promotion have adverse impact.

In the testing context, defendants can usually meet their burden of proof by showing that the test in question meets professional standards that apply given the purpose for which the test is being used. Thus, psychometric standards, both those that apply generally and those that apply to particular test uses (see American Educational Research Association et al., 1985, 1998; Joint Committee on Testing Practices, 1988), are also relevant in the legal context. Courts have invoked such standards in upholding or invalidating particular test uses having disproportionate adverse impact. In a Title IX sex discrimination case, for example, a court invalidated New York's use of the Scholastic Assessment Test (SAT) as a measure of high school achievement, finding that "the SAT was not designed to measure achievement in high school and has never been validated for that purpose" (Sharif v. New York State Education Department, 1989:362). Similarly, a California court upheld the use of a test as part of the teacher certification process once it concluded that the test in question was a "valid, job-related test for the teaching and non-teaching positions in the public schools for which it is a requirement" and that cutoffs or cutscores had been set properly despite the disproportionate impact they produced (American Association of Mexican-American Educators v. California, 1996:1403).[17] Applying similar standards in a different context, a court struck down Alabama's use of a fixed cutoff score on the American College Test (ACT) for admission to undergraduate teacher education programs. The court found both that the ACT was not valid for the purpose for which it was being used and that cutscores had been set arbitrarily rather than on the basis of professionally accepted norms (Groves v. Alabama State Board of Education, 1991).[18]

[17] This was based in part on the court's finding that the defendants (1) had reviewed with teacher educators and content experts items to be included on the test; (2) had conducted several content validation studies and job analysis surveys, revising the test to eliminate items that were found not to be job-related; and (3) had set cutoff scores using acceptable standards and procedures (American Association of Mexican-American Educators v. California, 1996:1416–1417, 1420–21).

[18] "There is no rational basis, let alone any professional research or study … from which to infer that [students] scoring at or above this level will be competent to teach … while those failing to achieve a 16 will not …" (Groves v. Alabama State Board of Education, 1991:1531).

Thus, under a disparate impact standard, legal liability may depend in part on whether the test raises problems of measurement, which may be the case if the test has not been validated for the particular purpose for which it is being used or has not been validated for all parts of the test-taking population (American Educational Research Association et al., 1998:12;[19] Larry P. v. Riles, 1984). It may also depend in part on whether test users make high-stakes decisions about students based on one test score or on multiple factors; in United States v. Fordice, for example, the Supreme Court rejected Mississippi's exclusive reliance on ACT composite scores in making college admissions decisions because the ACT User's Manual called instead for admissions standards based on ACT subtest scores, self-reported high school grades, and other factors (United States v. Fordice, 1992; see also American Educational Research Association et al., 1985:Standard 8.12; 1998:Draft Standard 13.6).

[19] Standard 7.1 of the draft standards states that "[w]hen previous research has established a substantial prior probability that test scores may differ in meaning across examinee subgroups, then to the extent feasible, the same forms of validity evidence collected for the examinee population as a whole should also be collected for each relevant subgroup…." Draft Standard 7.2 states that "[w]hen the evidence indicates that the test does not measure the intended construct with equal fidelity across subgroups of test takers, the test should only be used for those subgroups for which the intended construct is reasonably well measured."

Similarly, whether a particular test use is proper depends in part on making attributions of cause: "It is imperative to account for various 'plausible rival interpretations of low test performance [such as] anxiety, inattention, low motivation, fatigue, limited English proficiency, or certain sensory handicaps' other than low ability" (National Research Council, 1996a:4, quoting Messick, 1989; American Educational Research Association et al., 1998:Draft Standard 16).[20] Thus, for example, "if students with limited English proficiency are tested in English—in areas other than language arts—and then classified on the basis of their test scores … [t]his constitutes discrimination under Title VI" (National Research Council, 1996a:4, citing Diana v. State Board of Education, 1970).

[20] Draft Standard 7.10 states that "[w]hen the use of a test results in outcomes that affect the life chances or educational opportunities of examinees, evidence of mean test score differences between relevant subgroups of examinees should be examined. Where mean differences are found an investigation should be undertaken to determine that such differences are not attributable to a source of construct-underrepresentation or construct irrelevant variance. In educational settings, potential differences in opportunity to learn should be investigated as a source of mean differences." See Chapters 4 to 7 for discussion of how these standards can be met in practice.

Finally, the likelihood of an adverse court ruling increases if the consequence of test use is a low-quality program or placement rather than one that is "educationally necessary." For example, using a Title VI disparate impact analysis in Larry P., the district court ruled that, although tests having predictive validity may be the basis for denying a job, "if tests suggest that a young child is probably going to be a poor student, the school cannot on that basis alone deny the child the ability to improve and develop the academic skills necessary to success in our society" (Larry P. v. Riles, 1979:969). Whether a particular educational placement or treatment is beneficial or harmful depends on empirical evidence about that program and, in court, on a judge's interpretation of that evidence.

Even if a state or school district can establish that its use of a test is educationally necessary, plaintiffs may nonetheless prevail by showing that there exists "an equally effective alternative that would result in less disproportionality" (Georgia State Conference of Branches of NAACP v. Georgia, 1985:1403). In the testing context, such showings have been infrequent.[21]

English-Language Learners

Title VI covers situations in which educational tests have a disproportionate adverse effect on English-language learners (Lau v. Nichols, 1974). Therefore the general legal principles discussed above apply to them as well. There are complexities, however, and Chapter 9 describes many issues of test validity that can arise when English-language tests are used to assess students whose native language is not English.[22] These include norming bias, content bias, linguistic and cultural biases, and the great difficulty of determining what bilingual students know in their native languages. The challenges involved in validating high-stakes tests for English-language learners raise special concerns about compliance with Title VI.

Such difficulties may also raise questions of compliance with two other federal statutes. One is the Improving America's Schools Act of

[21] In the Georgia case, the court rejected the argument that heterogeneous grouping was an equally effective alternative to tracking that would result in less disproportionality.
The court relied on testimony to the effect (1) that heterogenous grouping would be harmful to higher-achieving students and (2) that "intraclass grouping is not as beneficial as interclass grouping" (Georgia State Conference of Branches of NAACP v. Georgia, 1985:1420). In Sharif v. New York State Education Department (1989), however, the court declared that New York's exclusive reliance on SAT scores in awarding scholarships for high school achievement was illegal, partly because a combination of students' grades and scores had less disparate impact on the basis of sex. 22   See also American Educational Research Association et al. (1998:Draft Standards, section 9).

OCR for page 50
--> 1994, which, as discussed more fully below, amends Title I of the Elementary and Secondary Education Act of 1965. A second federal statute that may be relevant is the Equal Educational Opportunities Act of 1974, which provides, in part, that No State shall deny equal educational opportunity to an individual on account of his or her race, color, sex, or national origin by … (f) the failure by an educational agency to take appropriate steps to overcome language barriers that impede equal participation by its students in its instructional programs. There is no reported decision in which a court has invalidated a high-stakes test use under this statute. Nonetheless, given the difficulties involved in assessing English-language learners, such a claim could be available if tests of questionable validity were used as the basis for making placement or promotion decisions for such students, or if the resulting educational settings were of questionable educational value. Chapter 9 discusses more fully both (1) the challenges of assessing English-language learners validly, particularly when tests have high-stakes consequences for students and (2) what is known about accommodations that may increase the validity of such tests. Due Process Challenges to High-Stakes Tests High-stakes tests may be illegal even if they are not discriminatory. For example, high school graduation tests have been challenged successfully under the due process provisions of the U.S. Constitution (Fifth and Fourteenth Amendments). Such claims usually hinge either on whether students have received sufficient advance notice of high-stakes test requirements or on whether they have been taught the knowledge and skills that a high-stakes test measures. These claims rest on the proposition that students have a constitutionally protected property interest in receiving diplomas (Debra P. v. Turlington, 1981). 
Adequate Notice

One concern, first raised in the context of high-stakes graduation tests, is that school officials must ensure fairness by giving students prior notice of a new high-stakes assessment requirement. In Debra P. v. Turlington (1981), the court found that four years constituted sufficient notice; courts in Georgia and New York have found that two years did not constitute adequate notice (Phillips, 1996a). As to the content of such notice (p. 6):

[I]t is probably not necessary to communicate specific passing scores ahead of time, [but] students and school personnel should be provided with clear indications of the specific content … and performance for which they will be held accountable. General scoring guidelines and examples that demonstrate attainment of the standards should also be disseminated. Curricular frameworks, assessment specifications, sample tasks, and model answers may also be helpful in communicating expectations.

And although the issue has not been litigated to date, similar notice may be called for when states or school districts are adopting new high-stakes tests for promotion.

Curricular Validity

A second due process requirement concerns what the Debra P. court referred to as "curricular validity": "a state may condition the receipt of a public school diploma on the passing of a test so long as it is a fair test of that which was taught" (Debra P. v. Turlington, 1981:406).²³ There have been disagreements over how educational entities can demonstrate that a test measures what students have been taught. Some argue that it is sufficient for a state or school district to show that the formal written curriculum mentions the knowledge and skills that the test is designed to measure. Others assert that what matters most is not the formal written curriculum but the actual curriculum and instruction in each classroom (Madaus, 1983); on this view, instructional rather than curricular validity is required. The Debra P. court accepted something in between: evidence that the test measured skills included in the official curriculum coupled with a showing that most teachers considered the skills to be ones they should teach (Debra P., 1983:186). Similar evidence may be called for when a high-stakes test for promotion is involved.
23   Conceptually, a claim that students have not been taught what the test measures is similar to a claim that students have been denied a fair opportunity to learn.

The matter of curricular or instructional validity has several important implications for high-stakes testing of individual students. First, to the extent that new assessments designed to induce changes in curriculum and instruction are used for high-stakes purposes, there is a danger that the new instruments will lack the curricular or instructional validity that the Constitution requires. This is an important point, of which educators and policymakers must be aware as they design and implement new assessments. Use of the proposed voluntary national test for high-stakes purposes, although not recommended by the U.S. Department of Education, would almost certainly raise questions of this kind, if only because it would take time for states and school districts to align their curricula and their teaching with the requirements of a national test. Policymakers who wish to use tests for high-stakes purposes must therefore allow enough time for such alignment to occur. The time needed, probably several years, would in practice depend on several factors, including the extent of the initial discrepancy and the availability of resources needed to bring curriculum and instruction into alignment with the new standards.

A second concern, potentially at odds with the first, is that administrators and teachers, wishing to ensure curricular and instructional validity, may teach students the very material that is on the test. "[I]f test exercises are used in instruction, the usefulness of the test as an instrument for measuring student achievement is destroyed … [and if] there is too close a match between the instructional materials and the test, 'the capacity to measure such important constructs as the understanding of a topic may be lost'" (Linn, 1983:127). According to Linn, "the challenge [is] to convince the courts that knowledge was taught – without precluding the possibility of measuring it" (Linn, 1983:129). The fine line between what is required and what is impermissible, coupled with the existence of incentives to boost student scores by "teaching to the test," suggests the need for careful policymaking, teacher training, and test security measures.
Requirements of the Improving America's Schools Act

The Improving America's Schools Act of 1994 made major changes in Title I, which serves millions of low-achieving students, chiefly though not entirely at the elementary level. Among the most important modifications are new requirements relating to testing and accountability. "[S]tates will need to develop their own assessments for Title I and ensure that they are aligned with challenging state standards for content and performance linked to state reforms affecting all students" (National Research Council, 1996b:vii). The stated purpose of the change in federal law is "to enable schools to provide opportunities for children served to acquire the knowledge and skills contained in the challenging content standards and to meet challenging state performance standards for all children" (Improving America's Schools Act, 20 U.S.C. section 6301(d), 1994). It requires that Title I students receive "accelerated," "enriched," and "high-quality" curricula, "effective instructional strategies," "highly qualified instructional staff," and "high-quality" staff development²⁴ (Weckstein, in press).

Under the new law, states had until the 1997–1998 school year to set content and performance standards, and they still have until the 2000–2001 school year to adopt new systems of assessment. There is wide recognition that "creating a new Title I testing system is one of the most demanding aspects of the new law" (National Research Council, 1996b:vii). The assessments (pp. 1–2):

must be administered at some time during grades 3 through 5, 6 through 9, and 10 through 12 . . . . Moreover, such assessments must also (1) be used only for purposes for which they are valid and reliable; (2) be consistent with nationally recognized professional and technical standards; (3) to the extent practicable, assess limited-English proficient children in the language and form most likely to yield accurate and reliable information on what such students know and can do; [and] (4) make reasonable adaptations for students with diverse learning needs.

Moreover, states will have to define what constitutes acceptable yearly progress for Title I students and work with school districts to take corrective action regarding schools and teachers (not students) when progress is insufficient. It remains to be seen how states will satisfy the many objectives of Title I testing.

The tests used to assess Title I students may become the subject of legal challenges if they do not meet the requirements of the Improving America's Schools Act. Questions have arisen, for example, about whether the proposed voluntary national tests will satisfy requirements governing assessment of Title I students who are English-language learners or students with disabilities (Hoff, 1997).

24   Improving America's Schools Act, 20 U.S.C. sections 6314(b)(1), (a), 6315(c)(1), 6320(a)(1), 1994.

Conclusion and Implications

In reviewing the circumstances under which the law may define certain high-stakes test uses as discriminatory or inappropriate, we have relied in part on psychometric definitions of appropriate test use. Subsequent chapters of this report discuss more fully what constitutes psychometrically appropriate use of tests for student tracking, promotion, and graduation.

Because psychometric issues can play an important role in the determination of test legality under federal civil rights law, it is appropriate to consider here the advice that the National Research Council's Board on Testing and Assessment (BOTA), through a letter report, has offered the U.S. Department of Education's Office for Civil Rights as it drafts its own standards on fairness in testing. The letter report notes that "establishing the validity of test scores as a basis for classifying students and placing them in different educational programs poses [a] … formidable challenge" (National Research Council, 1996a:3). It goes on to point out that "[t]he inferences regarding specific test uses are validated, not the test itself . . . . Because of the importance of linking test design to specific test uses, validation must be designed to provide evidence that test results provide a sound basis for inferences and action. Test validation is often costly, but it is a critical undertaking" (National Research Council, 1996a:5, citing Office of Technology Assessment, 1992). More generally, the letter report states that, in reviewing the use of tests with disparate impact to classify students, the Office for Civil Rights "should make a determination not only about the test itself but about whether the entire process for classifying students is fair and nondiscriminatory and whether students are being provided an equal opportunity to learn" (National Research Council, 1996a:3).²⁵

In the committee's view, although litigation over test use has not been common, the increasing reliance on high-stakes tests as an instrument of school reform could lead to new legal challenges by individuals and groups who are adversely affected by test outcomes. Given the need to establish the validity of these high-stakes uses, including the need to show that such tests are a fair measure of what has been taught, it is essential that educators and policymakers alike be aware of both the letter of the laws and their implications for test takers and test users.

25   This is consistent with Draft Standard 7.10 (1998:16), which states that "[i]n educational settings, potential differences in opportunity to learn should be investigated as a source of mean differences."

References

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education
1985 Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.
1998 Draft Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.

Hoff, D.
1997 National tests, Title I at odds on language: Some experts see mismatch in policy. Education Week, October 22, 1997:1,15.

Joint Committee on Testing Practices
1988 Code of Fair Testing Practices in Education. Washington, DC: National Council on Measurement in Education.

Linn, R.L.
1983 Curricular validity: Convincing the court that it was taught without precluding the possibility of measuring it. Pp. 115–132 in The Courts, Validity, and Minimum Competency Testing, G. Madaus, ed. Boston: Kluwer-Nijhoff Publishing.

Madaus, G., ed.
1983 The Courts, Validity, and Minimum Competency Testing. Boston: Kluwer-Nijhoff Publishing.

Messick, S.
1989 Validity. Pp. 13–102 in Educational Measurement, 3rd Edition, R.L. Linn, ed. New York: American Council on Education and Macmillan Publishing Company.

National Research Council
1982 Placing Children in Special Education: A Strategy for Equity, K.A. Heller, W.H. Holtzman, and S. Messick, eds. Committee on Child Development Research and Public Policy, National Research Council. Washington, DC: National Academy Press.
1996a Letter Report from Richard Shavelson, Chair, Board on Testing and Assessment, to Norma Cantu, Assistant Secretary of Education for Civil Rights (June 10, 1996). Washington, DC: National Academy of Sciences.
1996b Title I Testing and Assessment: Challenging Standards for Disadvantaged Children, J. Kober and M. Feuer, eds. Board on Testing and Assessment. Washington, DC: National Academy Press.

Office of Technology Assessment
1992 Testing in American Schools: Asking the Right Questions. OTA-SET-519. Washington, DC: U.S. Government Printing Office.

Phillips, S.E.
1991 Diploma sanction tests revisited: New problems from old solutions. Journal of Law and Education 20(2):175–199.
1993a Legal Implications of High-Stakes Assessment: What States Should Know. Oak Brook, IL: North Central Regional Educational Laboratory.
1993b Legal issues in performance assessment. Education Law Reporter 79:709–738.
1996a Legal defensibility of standards: Issues and policy perspectives. Educational Measurement and Practice 5, Summer 1996:5–14.
1996b Legal defensibility of standards: Issues and policy perspectives. Proceedings of the Joint Conference on Standard Setting for Large-Scale Assessments, September 1996:379–398.

Weckstein, P.
in press School reform and enforceable rights to an adequate education. In Law and School Reform: Six Strategies for Promoting Educational Equity, J. Heubert, ed. New Haven: Yale University Press.

Welner, K., and J. Oakes
1996 (Li)Ability grouping: The new susceptibility of school tracking systems to legal challenges. Harvard Educational Review 66(3):451–470.

Legal References

American Association of Mexican-American Educators v. California, 937 F. Supp. 1397 (N.D. Cal. 1996).
Anderson v. Banks, 540 F. Supp. 472 (S.D. Ga. 1981).
Board of Education of New York v. Harris, 444 U.S. 130 (1979).
Crawford v. Honig, 37 F.3d 485 (9th Cir. 1994).
Debra P. v. Turlington, 474 F. Supp. 244 (M.D. Fla. 1979); aff'd in part and rev'd in part, 644 F.2d 397 (5th Cir. 1981); rem'd, 564 F. Supp. 177 (M.D. Fla. 1983); aff'd, 730 F.2d 1405 (11th Cir. 1984).
Diana v. State Board of Education, No. C-70-37 RFP, Consent Decree (N.D. Cal. June 18, 1973).
Equal Educational Opportunities Act of 1974, 20 U.S.C. sections 1703 et seq.
Equal Employment Opportunity Commission, Uniform Guidelines on Employment Selection Procedures, 29 C.F.R. sections 1607 et seq. (1978).
Elston v. Talladega County Board of Education, 997 F.2d 1394 (11th Cir. 1993).
Georgia State Conference of Branches of NAACP v. Georgia, 775 F.2d 1403 (11th Cir. 1985).
G.I. Forum v. Texas Education Agency, C.A. No. SA97CA1278, Complaint (W.D. Tex. October 14, 1997).
Groves v. Alabama State Board of Education, 776 F. Supp. 1518 (M.D. Ala. 1991).
Hobson v. Hansen, 269 F. Supp. 401 (D.D.C. 1967), aff'd sub nom. Smuck v. Hansen, 408 F.2d 175 (D.C. Cir. 1969) (en banc).
Improving America's Schools Act of 1994.
Individuals with Disabilities Education Act, 20 U.S.C. section 1401 et seq.
Larry P. v. Riles, 495 F. Supp. 926 (N.D. Cal. 1979); aff'd, 793 F.2d 969 (9th Cir. 1984).
Lau v. Nichols, 414 U.S. 563 (1974).

McNeal v. Tate County School District, 508 F.2d 1017 (5th Cir. 1975).
New York Urban League, Inc. v. New York, 71 F.3d 1031 (2d Cir. 1995).
Parents in Action on Special Education (PASE) v. Hannon, 506 F. Supp. 831 (N.D. Ill. 1980).
People Who Care v. Rockford Board of Education, 111 F.3d 528 (7th Cir. 1997).
Personnel Administrator v. Feeney, 442 U.S. 256 (1979).
Quarles v. Oxford Municipal Separate School Dist., 868 F.2d 750 (5th Cir. 1989).
Section 504 of the Rehabilitation Act of 1973, 29 U.S.C. sections 794 et seq.
Sharif v. New York State Education Department, 709 F. Supp. 345 (S.D.N.Y. 1989).
Simmons on Behalf of Simmons v. Hooks, 843 F. Supp. 1296 (E.D. Ark. 1994).
Singleton v. Jackson Municipal Separate School District, 419 F.2d 1211 (5th Cir.) (en banc), rev'd per curiam on other grounds sub nom. Carter v. West Feliciana Parish School Board, 396 U.S. 290 (1970).
Title I, Elementary and Secondary Education Act, 20 U.S.C. sections 6301 et seq.
Title II, Americans with Disabilities Act of 1990, 42 U.S.C. sections 12131 et seq.
Title VI, Civil Rights Act of 1964, 42 U.S.C. sections 2000(d) et seq.
Title VI Regulations, 34 C.F.R. sections 100 et seq.
Title VII, Civil Rights Act of 1964, 42 U.S.C. sections 2000(e) et seq.
Title IX, Education Amendments of 1972, 20 U.S.C. sections 1681 et seq.
U.S. Constitution, Amendments V and XIV.
United States v. Fordice, 505 U.S. 717 (1992).
United States v. Gadsden County School District, 572 F.2d 1049 (5th Cir. 1978).
Village of Arlington Heights v. Metropolitan Housing Development Corp., 429 U.S. 252 (1977).
Washington v. Davis, 426 U.S. 229 (1976).