Assessment Policy and Politics
Current concerns about proper test use represent only the latest round in a continuing debate over the use of standardized assessments to advance education policy goals.1 Beginning with the introduction in the mid-19th century of written examinations given to large numbers of students, standardized tests have served as an instrument for accomplishing a variety of policy purposes, including determining the types of instruction individual students receive, shaping the content and format of that instruction, and holding schools and students accountable for their performance.
Standardized tests are believed to be one of the most powerful levers that elected officials and other policymakers have for influencing what happens in local schools and classrooms. A growing body of research suggests that tests often do in fact change school and classroom practices (Corbett and Wilson, 1991; Madaus, 1988; Herman and Golan, 1993; Smith and Rottenberg, 1991), although such changes may or may not improve student learning (Mehrens, 1998). Furthermore, compared with other interventions, standardized tests are inexpensive. Now required by all three levels of government, tests have become a central feature of American public schooling.
At the same time, some testing experts and others concerned about the effects of inappropriate test use caution against using tests to promote broader policy goals. They warn that, if test scores are used to bestow rewards or impose sanctions, there are several risks: widening the gap in educational opportunities between haves and have-nots, narrowing the curriculum, centralizing educational decision making, and deprofessionalizing teachers (Haertel, 1989; Airasian, 1987).
Two Persistent Dilemmas
The tension between the enthusiasm of policymakers and the caution of experts is symptomatic of two fundamental dilemmas posed by standardized tests when they are used as policy strategies. First, policy and public expectations of testing generally exceed the technical capacity of the tests themselves. One of the most common reasons for this gap is that policymakers, under constituent pressure to improve schools, often decide to use existing tests for purposes for which they were neither intended nor adequately validated. So, for example, tests designed to produce valid measures of performance only at the aggregate level—for schools or classrooms—are used to report on and make decisions about individual students. In such instances, serious consequences (such as retention in grade) may be unfairly imposed on individual students. That injustice is further compounded if the skills being tested do not reflect or validly measure what students have been taught.
Policymakers sometimes acknowledge these problems and the need for more research. Nevertheless, they often choose to rely on an available test because they see only a fleeting opportunity for action, or because they believe that, even with imperfect tests, more good than harm will be done. From this perspective, technical constraints are problems that should be remedied to the extent possible, but in an iterative fashion simultaneous with the implementation of the test-based policy (McDonnell, 1994).
In one recent case in point, Paul G. Vallas, the chief executive officer of the Chicago Public Schools, decided to continue use of the nationally norm-referenced Iowa Test of Basic Skills (ITBS) to identify low-performing schools and students, even though it has not been validated for that purpose. He agrees with researchers who argue that the ITBS should be replaced with a test directly linked to the city's academic standards. Vallas noted, however, that developing such a test would take three years; in the meantime, the ITBS will continue to be used for accountability (Olson, 1998).
Moreover, Philip Hansen, chief accountability officer for the Chicago Public Schools, told the committee that "we are committed to use the Iowa forever and ever." He went on to explain that, if the district were to drop the ITBS, it would lose credibility with the media and the public, who would view with suspicion any change to a new test. The new assessments, he said, would probably be used as course midterms and finals and be factored in as one component of a student's course grade.
The second dilemma stems from tensions between two motives for testing: the desire for more fairness and efficiency and the impulse to sort and classify students. Achievement testing first became a fixture of American public schools during the huge growth in mass education between 1870 and 1900, when enrollments more than doubled as waves of immigrants created a newly diverse student population. Demand grew for more efficient school management, including "the objective and efficient classification, or grading of pupils" (Tyack, 1974:44). Relying on tests was seen as fairer and more efficient than the prevailing system, in which children of varying ages and levels shared classrooms, and essay exams received widely varying grades from different teachers (Office of Technology Assessment, 1992; Haney, 1984).
The introduction of widespread intelligence testing during World War I allowed schools to begin measuring what testers believed to be students' aptitude for future learning, the IQ, in addition to using achievement tests to measure their past learning.2 As the technology of intelligence or IQ tests developed, they were quickly adopted by schools nationwide and became an entrenched component of educational administration. Fass (1980:446) explains their appeal:
The IQ grew out of the many issues and concerns facing American society in the early century. It was almost inevitable that it be adopted by the schools, which were the arena in which these problems were played out and which were also expected to solve them. The IQ established a meritocratic standard which seemed to sever ability from the confusions of a changing time and an increasingly diverse population, provided a means for the individual to continue to earn his place in society by his personal qualities, and answered the needs of a sorely strained school system to educate the mass while locating social talent.
As well-intentioned as some motivations for the IQ and other tests may have been, they were not actually measures of innate ability, and their use sometimes caused harm. In their worst manifestations, the uses were racist and xenophobic. In the early part of the century, prominent scientists argued on the basis of test results that blacks and immigrants from Southern and Eastern Europe were mentally inferior, with these pronouncements contributing to laws restricting immigration from countries assumed to be sources of inferior mental stock (Haney, 1984). Later, tests were used by Southern schools resisting desegregation, as a way to resegregate black students into lower tracks (Office of Technology Assessment, 1992).
The misuse of test data in policy debates continues today. The publication of The Bell Curve, arguing that social and economic inequality among racial and ethnic groups can be explained by differences in intelligence as measured by tests, is a recent example (Herrnstein and Murray, 1994). Despite detailed critiques of the authors' statistical analysis, their conception and measurement of intelligence, and their explanations of the causes of inequality (Fischer et al., 1996), the book fueled a highly charged, racialized debate. No policy actions can be attributed directly to inferences that the authors drew from their analysis of test data. Nevertheless, Herrnstein and Murray's argument that the sources of inequality are largely immutable has served as a rationale for those seeking to limit education and social welfare policies aimed at reducing inequalities.
Neither the gap between expectations and capacity nor past misuses of tests mean that we should give up on testing as an education policy strategy. As Fischer and his colleagues (1996) note, the history of testing shows that, although it has been used for discriminatory purposes, testing has also been a tool for equalizing educational opportunities. Wisconsin's use of standardized aptitude tests to encourage students to apply to college was mentioned in the preceding chapter. Similarly, many of the country's leading state university systems admit all students with a minimum grade-point average, but they also enable those with averages below the cutoff to apply based on their standardized admissions test scores. This type of "second pathway" is quite commonplace. As Donald Stewart, the president of the College Board, argued in a recent letter to the New York Times, "More than 50 million college applicants have taken the SAT since 1926, and most have arrived on campus, including millions of disadvantaged students who had often been excluded in the past" (May 8, 1998).
Furthermore, policymakers, who have few instruments at their disposal to affect schools directly, are unlikely to abandon a tool potentially as powerful as tests simply because people sometimes use them badly. The challenge for the policy community, then, is to make decisions about test use that allow them to pursue their broader objectives within a constrained political environment, staying mindful of both the limitations of any given test and its capacity to influence classroom behavior and students' educational opportunities. In the remainder of this chapter, we survey the range of assessment policies and describe the political context in which contemporary debates over appropriate test use are occurring.
Testing as a Policy Instrument
Current federal, state, and local policies use student assessments for seven distinct purposes, with the same test often serving multiple functions.3 The first is aiding in instructional decisions about individual students. For example, teachers may use test results in grouping students or in identifying areas in which particular students need additional or different instruction. But some form of standardized diagnostic test is typically used as one basis for deciding whether students are eligible for services provided by a variety of programs, including those related to the Individuals with Disabilities Education Act and state programs for students with disabilities, state and federal bilingual education programs for English-language learners, and the federal and state compensatory education programs for poor, underachieving students. Testing is thus used to allocate an educational benefit and to decide what form that benefit should take in a student's program.
The second purpose is providing information about the status of the education system. One such test is the National Assessment of Educational Progress (NAEP). Since its inception in 1969, NAEP has served as "the nation's report card," periodically assessing a nationally representative sample of students, ages 9, 13, and 17, in several core academic subjects, with additional subjects being tested on a rotating basis. NAEP reports on achievement trends over time and across different subgroups. Over the past decade, NAEP has also included representative samples for 44 states, so that these states can compare their students' performance to the national sample. To provide a context for interpreting information about student achievement, NAEP also surveys students, their teachers, and school administrators about their backgrounds and the teaching in their schools.4 Similarly, 46 states administer standardized assessments in three to five core subjects and publicly report the results, usually disaggregated to the school building level. The purpose of these assessments is to inform the public about how well the schools and students in their communities are performing over time and compared with those in other places.
Closely linked to assessments documenting the status of the educational system is a third function: tests as motivation for change. In a study of policymakers' expectations about the effects of student assessments, federal and state officials said they hoped test results would "shake people up." One respondent spoke of state policymakers who see assessments as a way to "embarrass people into change." Still others felt that, if assessments were tied to specific performance standards, even parents in affluent communities would be surprised to find that their children were not learning as much as they had assumed (McDonnell, 1994:9). But when policymakers in this study talked about the motivational purpose of assessments, they typically had parents, not students, in mind. Assessment results were seen as a way to influence parents to take action to improve the quality of local schools.5
Standardized tests play a fourth policy role in program evaluations. Because many educational interventions are intended to produce improved achievement, the results of standardized tests, administered to participants before and after the intervention, constitute a critical indicator of program effectiveness. The most widespread use of standardized testing in program evaluation is in the federal Title I program, first enacted as part of the Elementary and Secondary Education Act. For over 20 years, the law has required that local districts test Title I students yearly and report the results. Local districts are expected to use these data in making their programs more effective, and trend data aggregated across states and districts inform congressional deliberations each time the program is reauthorized. The evaluation requirements in Title I and other federal and state programs are a major factor in explaining the growth of local testing systems.
A fifth function of assessment is to hold schools, as public institutions, and educators accountable for student performance. Standardized tests are an integral part of this process. Providing information to the public about school performance is one aspect of accountability. But 23 states now attach consequences at the school level to assessment results, such as funding gains and losses, warnings, assistance from outside experts, loss of accreditation, and, in a few places, the eventual state takeover of schools (Bond et al., 1996).
These five purposes offer examples of low- and high-stakes tests that represent two fundamentally different ways of using testing in the service of policy goals. A low-stakes test has no significant, tangible, or direct consequences attached to the results, with information alone assumed to be a sufficient incentive for people to act. The theory behind this policy is that a standardized test can reliably and validly measure student achievement; that politicians, educators, parents, and the public will then act on the information generated by the test; and that actions based on test results will improve educational quality and student achievement. In contrast, high-stakes policies assume that information alone is insufficient to motivate educators to teach well and students to perform to high standards. Hence, it is assumed, the promise of rewards or the threat of sanctions is needed to ensure change. Rewards in the form of financial bonuses may be allocated to schools or teachers; sanctions may be imposed through external oversight or takeover by higher-level authorities.
students are ones used to certify that students have attained particular levels of mastery and that have personal consequences attached. We discuss this type of testing as a final policy purpose of assessment.
In a sixth policy use, testing acts as a lever to change classroom instruction and may be implemented with either a high- or a low-stakes assessment. Although standardized tests have long been used as an education reform strategy for changing classroom instruction, this use has become more central with the advent of the standards-based reforms now promoted by states and the federal government. This movement seeks to improve educational quality by setting high content standards that define the knowledge and skills that teachers should teach and students should learn, and by holding educators and students accountable for meeting performance standards that set the expectations for proficiency. It assumes that educators and the public can agree on a set of curricular values; that those values can be translated into a set of standards; and that assessments can measure how well students perform on the standards (National Research Council, 1997).
About half the states have revamped their assessment systems over the past decade to align them more closely with specific content and performance standards, and most of the rest are in the alignment process or planning to begin it. In an effort to increase the authenticity of tasks on assessments, many states have also diversified their testing format beyond a sole reliance on multiple-choice items; 34 now require writing samples of tested students, and 10 include constructed, open-response items (Bond et al., 1996).
Most standards-based assessments have only recently been implemented or are still being developed. Consequently, it is too early to determine whether they will produce the intended effects on classroom instruction. A recent review of the available research evidence by Mehrens (1998) reaches several interim conclusions. Drawing on eight studies that used teacher surveys, classroom observations, or analysis of teachers' classroom assignments to examine the implementation of new assessments in six states, he found that, if stakes are high enough and if teachers deem the content appropriate, curriculum and instruction are likely to change to reflect more closely the content sampled by the test. If the stakes are low, if teachers believe that the test is measuring developmentally inappropriate content, or if teaching consistent with the assessment would reduce the amount of time teachers could spend on what they consider to be more important content, then the assessment apparently has less impact on teaching and curriculum.
The effects of standards-based assessments on practice depend not only on teachers' willingness to teach what is being tested but also on their capacity to do so. The curriculum standards now being adopted by many states, however, expect teachers to teach very different material in ways that are fundamentally different from their accustomed practice. In most cases, teachers have not been adequately prepared for the reforms. Therefore, even those who are willing to change may lack a sufficient understanding of what the reforms require and may filter their teaching innovations through traditional approaches to instruction (Cohen and Peterson, 1990). As a result, tests are likely to produce only modest effects as incentives for curricular change without a considerable investment in teacher training.
The seventh policy use for standardized tests is certifying individual students as having attained specified levels of achievement or mastery. These are high-stakes uses with rewards to individual students: special diplomas, graduation from high school, or promotion to the next grade. Sanctions typically consist of the withholding of those rewards or benefits. Currently, 18 states require that students pass an exit examination before they graduate from high school, and 4 offer honors diplomas on the basis of examination results (Bond et al., 1996). The requirement that students pass a test as a condition for high school graduation is typically imposed by states, but some local districts use tests to decide whether students should be promoted to the next grade. Although there are no national data summarizing how local districts use standardized tests in certifying students, we do know that several of the largest school systems have begun to use test scores in determining grade-to-grade promotion (Chicago) or are considering doing so (New York City, Boston). In addition, in a survey of 85 of the largest school systems, the American Federation of Teachers (AFT) found that, at the elementary level, 39 percent of the districts use standardized test scores, usually in combination with other information, in deciding whether to retain a student in grade. But both teacher-assigned grades and teacher recommendations were reportedly used at the elementary level in a higher proportion of districts: 48 percent. These other factors become even more significant in the higher grades, so that in high school, teacher-assigned grades are used in 65 percent of the districts and standardized tests in only 24 percent (American Federation of Teachers, 1997). With only a few notable exceptions, such as Chicago, districts typically use multiple indicators in making promotion decisions.
Current testing of students for high-stakes decisions is the latest version of a policy strategy that began with the state minimum competency tests implemented between 1975 and 1985. These tests were a response to public concerns about students leaving school without basic reading and mathematics skills, combined with a widespread perception that educational quality had declined (Office of Technology Assessment, 1992). Minimum competency tests, which coincided with the "back to basics" movement of the 1970s, typically tested students on basic literacy and numeracy skills considered essential in life. The tests were often calibrated to measure the skills expected of an 8th grader, and they required that students attain a specific score to pass.
Many of the legal challenges to testing that have arisen over the past 20 years were prompted by the minimum competency movement. These tests raise due process and equal protection issues when they serve a gatekeeping function, such as determining whether students can graduate from high school. One of the most important cases posing these questions grew out of a challenge to Florida's minimum competency test. In the case of Debra P. v. Turlington (1981), a U.S. court of appeals ruled that if a high school graduation test covers material not taught to the students, then it is unfair and violates the Fourteenth Amendment to the U.S. Constitution. These and other legal issues related to the use of standardized tests are discussed in Chapters 3, 8, and 9.
Research has found the effects of minimum competency tests to be mixed (Chapter 7). Newer assessments aligned with more rigorous content standards are in part a response to the shortcomings of these earlier tests. States can now usually point to test score gains, particularly in the proportion of students passing high school graduation tests between the time they are first tested in grade 8 or 9 and when they must finally pass the test in grade 12. Scholars (e.g., Koretz et al., 1991) have questioned these gains, however, observing that score increases could be due in part to "teaching to the test," whereby students are drilled on questions that mirror those on the actual test (Mehrens, 1998).
Much has been written about the narrowing effect of minimum competency tests on the curriculum and about the drill-and-practice instruction that they encourage. These outcomes result from a combination of the low-level skills tested and the policy assumption that a student's failure on the test is the school's responsibility to remedy (Office of Technology Assessment, 1992). In states in which policymakers believe that schools should certify student mastery of required skills and that tests can adequately gauge such mastery, the response to the shortcomings of minimum competency tests has been to design tests that measure higher-order analytical skills. Although most high school exit exams still measure basic skills, the trend is toward more difficult and sophisticated tests. Some states, including Maryland and New York, are now implementing high school graduation assessments tied to demanding state standards and requiring greater mastery of more complex skills. A few other states, including North Carolina, are moving toward requiring high school students to pass standardized, statewide end-of-course exams in all the subjects (algebra, English, U.S. history, and so on) needed for high school graduation, rather than passing a single exit exam. Some states have postponed implementation of tests carrying high stakes for students until they have put in place systems of accountability for schools and educators.
The close links between education policy and testing are clear. Although standardized tests are merely measurement tools to obtain information about student and school performance, they have come to function also as symbols. So, for example, assessments are often portrayed as synonymous with accountability policies or with high school graduation requirements, even though the imposition of rewards and sanctions constitutes the core of these policies, and the test results merely inform decisions about their allocation.
Precisely because of the tight connection between testing and policy, standards for proper test use are essential. All seven policy uses require that assessments measure student performance consistently across tasks (reliability), that the scores are meaningful and reflect the domains being measured (validity), and that the meaning of the test scores does not differ across individuals, groups, or settings (fairness). These standards are explored in Chapters 4 through 9.
Meeting these standards is both more important and more problematic when a test is used for high-stakes purposes, particularly if it involves consequences for individuals rather than institutions. If students are not afforded the opportunity to learn the content on which they are tested (a growing possibility as curriculum and performance standards are raised), or if tests are not interpreted consistently from one locale to another (as is often the case in decisions about special education placement), then testing can create new inequalities or exacerbate existing ones.
Avoiding this outcome may be difficult, however. Politicians are elected to solve problems, and often that means acting with the tools available under severe time pressures and fiscal and political constraints.
The result is that tests are used for purposes for which they were not intended. In such cases, the outcome for individuals may be unfair. Moreover, the tests themselves may be corrupted as valid and reliable measuring devices (Linn, 1998).
Current Policy Landscape
As student assessment becomes a more prominent part of education reform strategies, several trends stand out as having significant implications. One is the goal of including most, if not all, students in assessment systems. A variety of recent policy initiatives aims to test even those students who were previously exempted from common assessments or who were tested with alternative instruments. If all students are included in the same assessment system, it is assumed, system accountability will be greater, particularly for students who have often been shortchanged in their schooling. Including more students in large-scale assessments, however, does not necessarily mean that all students will be subject to the high stakes that some states and school districts attach to scores on such tests.
At the federal level, Title I of the Elementary and Secondary Education Act and the Individuals with Disabilities Education Act of 1997 are being used as levers to ensure that students participating in these programs take tests that incorporate the same content and performance standards that apply to other students. They are also to be included in state assessment systems, and the states are to determine whether local districts and schools are helping these students make adequate yearly progress toward meeting the common standards. This strategy, in effect, combines several policy purposes of assessment: program evaluation, school-level accountability, and changing classroom instruction. Federal law does not, however, require that all students be subject to high-stakes test requirements.
Policy discussions have focused mainly on whether a standards-based strategy will work for all students, what testing accommodations are needed, and how test scores should be reported. The question of how tests are used is likely to become especially salient in this context, because many of the students who will be included in expanded assessment systems are English-language learners or students with disabilities. For these students, it is important to ensure that the tests truly measure their achievement and are not corrupted by language barriers or lack of appropriate modifications. Appropriate test use for these students, as for all students, requires that their scores not lead to decisions or placements that are educationally detrimental.
The connection between assessment as a reform strategy and appropriate test use has been joined in the debate over voluntary national tests (VNTs). The Clinton administration, in proposing the development of national assessments to test 4th graders in reading and 8th graders in mathematics, argued that by testing students in two critical subjects at two critical grades, using national standards, parents would know how their own children were doing, and policymakers, educators, and the public would know how well their schools were performing. The underlying assumption was that those concerned about education could act more effectively to raise standards and improve instruction if better information were available.
Critics of the VNTs have charged, among other things, that national testing is unnecessary, that it will lead to more centralized control of education, and that it will usurp the prerogatives of states and local communities. But the criticism most relevant to our charge comes from civil rights groups: that implementing national tests could harm poor students and minority students if test scores are linked to high-stakes consequences for individual students, unless there are protections to ensure that all students receive access to high-quality curriculum and instruction. From this perspective, the VNTs are surely problematic: under the proposed arrangement, in which the test would be licensed to private test publishers, the federal government would be unable to regulate how states and local districts would use the test results.6
The education policy landscape is also dominated at present by efforts to end social promotion, in part through testing. In his 1998 State of the Union speech, President Clinton asserted that "when we promote a child from grade to grade who hasn't mastered the work, we don't do that child any favors. It is time to end social promotion in America's schools." The president thus joined a host of other political leaders, from the Democratic mayor of Chicago to the Republican governor of Texas, all calling for an end to the promotion of students whose achievement does not meet expectations for that grade level.
Advocates have argued that ending social promotion does not necessarily mean retaining students in grade. They maintain, in fact, that one can be equally opposed both to social promotion and to retention in grade. Indeed, Michael Cohen, the president's special assistant for education, told the committee that social promotion versus retention was a "false choice" because "we know that [retention] doesn't do them a lot of good either." The answer, he said, was "to find a sort of a middle ground, where you're actually starting early to provide kids who need it with extra help, putting effective practices in place, giving them extended opportunities, and in [the] process assuming that all kids you're dealing with can meet standards if you give them the right opportunities."
The Clinton administration recommends that schools use specific grade-by-grade standards and a challenging curriculum aligned with those standards, smaller classes, well-prepared teachers, and after-school and summer-school programs for those students who need them (Clinton, 1998). Similarly, in its report on district promotion policies and practices, the AFT (1997:21) notes:
Policy alternatives must ensure that students learn what they need to know to be successful in the next grade, and ultimately, in life. Ignoring the problem of failure (social promotion) and doing again what failed to work the first time (simple retention) is not the answer. Policy changes must address the underlying problem of why children do not achieve and what changes in school organization, curriculum, instruction, and educational programs are necessary if children are to succeed.
Despite these assurances and suggested alternatives to retaining students in grade, a number of researchers and advocacy groups have argued that, even though districts may rely on a variety of interventions for ending social promotion, many will also retain students in the same grade for an additional year. Yet most research on retention shows that retained students are generally worse off than their promoted counterparts on both personal adjustment and academic outcomes (Shepard and Smith, 1989).
Chicago has become a focal point for the social promotion debate. It was the first large district to announce its intention to end the practice, basing promotion decisions solely on a test that its developers maintain was not designed for that purpose. In other districts that are moving to end social promotion, debate has focused more on the merits of the policy, because test scores are only one criterion for decisions. For example, under a proposal now being considered in New York City, 4th and 7th graders' readiness to advance to the next grade would be measured by a new state reading test as well as by a comprehensive evaluation of their course work and a review of their attendance records (Steinberg, 1998).
In short, testing policy has become a focal point for political debates over schooling. Its role as an electoral campaign issue, the position of major interest groups on assessment questions, and public attitudes toward testing all shape the context in which policymakers decide what constitutes appropriate test use.
Politics of Assessment
With the growth of testing as a policy strategy, discussions about its use have moved more and more from the technical realm to the political world of electoral campaigns, interest groups, and public opinion. It is now quite common for those running for public office to call for greater test-based accountability, to take stands on which tests should be used for which students, and to support the use of testing for specific purposes, such as ending social promotion. Although the extent to which politicians are leading public opinion or following it is unclear, their focus on testing has certainly tapped a strong vein of support among the American public.
In a variety of national and state public opinion polls, large majorities of respondents favor using tests to identify student and teacher weaknesses, to decide who is promoted, and to rank schools. For example, requiring students to pass tests for grade-to-grade promotion (70 percent) and for high school graduation (80 percent) was strongly supported in a 1994 Public Agenda survey (Johnson and Immerwahr, 1994). In the 1995 Phi Delta Kappan/Gallup poll, 65 percent of the respondents supported requiring students in their local communities to pass standardized tests for promotion from one grade to another, a proportion that has remained constant over the four times since 1978 that the question has been asked (Hochschild and Scott, 1998). In the same poll, 60 percent reported believing that raising standards would encourage students from low-income backgrounds to do better in school (with no statistically significant differences by race of respondent).
The public also seems willing to accept some of the negative consequences associated with this kind of high-stakes testing: 65 percent of those queried in the 1995 poll favored stricter requirements for high school graduation even if fewer students graduate, again with no differences by race (Elam and Rose, 1995). In a March 1997 NBC/Wall Street Journal poll, 70 percent of the respondents reported that requiring students to pass standardized tests in order to move on to the next grade would represent a "big improvement" (Ferguson, 1997).
Recent state polls show similar results. In a 1997 poll of Massachusetts residents (Mass Insight, 1997), 61 percent of the respondents supported passing a 10th grade competency test as a condition of high school graduation. About half of those with an opinion thought that no more than 10 percent of the students in their own communities would fail the exam, and 25 percent thought more than 20 percent would fail. But 61 percent of the respondents said that, even if 25 percent of their hometown students failed the exam, they would still require students to pass it. In the same poll, about the same proportion (65 percent) approved of students not being promoted to the next grade until they pass a required test.
Similarly, a 1998 PACE/Field Institute poll of Californians found that 62 percent favored setting uniform student promotion requirements "based on students passing an achievement test, rather than leaving this up to teachers" (Fuller et al., 1998). Likewise, 82 percent of the general public and 67 percent of teachers surveyed in South Carolina in 1997 felt that standards for promotion from elementary to middle school should be raised and that students should be allowed to move ahead only if they pass a test showing that they have reached those standards (Immerwahr, 1997).
Poll data present a consistent picture of strong public support for the use of tests for high-stakes decisions about individual students.7 Despite some evidence that the public would accept some of the potential tradeoffs, it seems reasonable to assume that most people are unaware of the full range of negative consequences related to this kind of high-stakes test use. Moreover, it seems certain that few people are aware of limits on the information that tests can provide. No survey questions, for example, have asked how much measurement error is acceptable when tests are used to make high-stakes decisions about individual students. The support for testing expressed in polls might decline if the public understood these things. Nevertheless, public opinion continues to play an important legitimating function in support of these high-stakes uses of tests.
Interest groups with a direct stake in the educational enterprise express a wider range of views than the public's broad support of high-stakes testing. We learned, for example, that the AFT supports the high-stakes use of tests for making promotion and graduation decisions, arguing that, unless there are consequences, the rigorous content standards it espouses will not be real to students. The organization also sees high school exit exams, based on high standards, as a way to avoid the costly remediation now being undertaken by postsecondary institutions and business. At the same time, it believes that decisions about promotion or graduation should not be based solely on a single test score and that students who do not meet the standards should receive remedial education that would enable them to do so.
Although the National Education Association (NEA) takes no official position on the desirability of using tests for high-stakes decisions about individual students, it opposes their use "as a single criterion for high-stakes decision making," or when "they do not match the developmental levels or language proficiency of the student" (NEA 1997 Resolutions—B-55, Standardized Testing of Students). The national Parent-Teachers Association (PTA), which opposes federal legislation or regulations that mandate standardized testing or that would lead to such testing, takes a similar position on test use: "valid assessment does not consist of only a single test score, and … at no time should a single test be considered the sole determinant of a student's academic or work future, e.g., high school graduation, scholarship aid, honors programs, or college admissions" (PTA position statement, 1996).
Several civil rights organizations strongly oppose the high-stakes use of standardized tests, at least when test scores are the sole factor used in making high-stakes decisions for students or when students do not have equal access to high-quality instruction. For more than 20 years, the National Association for the Advancement of Colored People (NAACP) has called the use of testing as a sole criterion for the nonpromotion of students and the use of competency testing for high school graduation "another way of blaming the student victim." Rhonda Boozer, the NAACP education coordinator, reports that the organization is on record as opposing "the use of testing results in an adverse fashion and any movement geared to [the] use of scores on a national test as [a] prerequisite for high school graduation" (personal communication).
The Mexican American Legal Defense and Educational Fund has filed suit against the state of Texas for its use of the Texas Assessment of Academic Skills as an exit test for high school graduation. It argues that the test denies diplomas to students without sufficient proof that the policy will enhance students' education or life opportunities, and that the test does not correspond to what is actually taught in schools in many minority communities. The National Association for Bilingual Education has more specific concerns about the nature of standardized tests: students should be assessed with appropriate, performance-based tests, and English-language learners should not be assessed with tests that are inappropriate at their level of language competency.8
The contrast between strong public support for high-stakes testing of individual students and the more qualified positions of major education interest groups suggests a significant disjuncture between these organizations and their constituencies.9 Whether this gap reflects incomplete information on the public's part or true differences between organizational policymakers and the public is unknown. In either case, test use will surely continue to be a highly politicized issue. If elected officials decide to pursue high-stakes strategies, they will be able to draw on latent public support, but they may also face considerable opposition from some quarters.
In a policy memo prepared for the committee, University of Wisconsin political scientist Donald Kettl argued that "performance measures and educational tests are not really about measurement. They are about political communication." Although he may have overstated the case, Kettl makes a telling point. Whether tests are used for high- or low-stakes purposes, the information they provide will feed public debate about educational goals and curricula and about whether schools are accomplishing their mission.
When serious personal consequences are attached to test results, test use enters the political realm in yet another way. Fundamental questions about what constitutes equal treatment and who should receive valued societal benefits come to the forefront. Moreover, high-stakes test uses force us to confront trade-offs between potential societal benefits, such as a better-trained workforce and a more informed citizenry, and potential costs to individuals who do not meet the common performance standards as measured by an assessment.
The technical standards for appropriate test use outlined in Chapter 4 should inform the search for answers to those questions. In the end, however, decisions about how we choose to use tests rest largely with political institutions—with legislatures, courts, and school boards. The resulting policies will be interpreted and implemented by technical experts and professional educators, but their underlying intent will be the result of political choices.
Airasian, P.W. 1987 State mandated testing and educational reform: Context and consequences. American Journal of Education 95(3):393–412.
American Federation of Teachers 1997 Passing on Failure: District Promotion Policies and Practices. Washington, DC: American Federation of Teachers.
Bond, L.A., D. Braskamp, and E. Roeber 1996 The Status Report of the Assessment Programs in the United States. Washington, DC: The Council of Chief State School Officers and Oak Brook, IL: North Central Regional Educational Laboratory.
Clinton, W.J. 1998 Public Papers of the Presidents of the United States. Washington, DC: Government Printing Office.
Cohen, D.K., and P.L. Peterson 1990 Special issue of Educational Evaluation and Policy Analysis 12(3):233–353.
Corbett, H.D., and B.L. Wilson 1991 Testing, Reform, and Rebellion. Norwood, NJ: Ablex Publishing.
Elam, S.M., and L.C. Rose 1995 The 27th annual Phi Delta Kappan/Gallup Poll of the public's attitudes toward the public schools. Phi Delta Kappan 77(1):41–56.
Fass, P.S. 1980 The IQ: A cultural and historical framework. American Journal of Education 88(4):431–458.
Ferguson, G.A. 1997 Searching for consensus in education reform. The Public Perspective 8(4):49–52.
Fischer, C.S., M. Hout, M.S. Jankowski, S.R. Lucas, A. Swidler, and K. Voss 1996 Inequality by Design: Cracking the Bell Curve Myth. Princeton, NJ: Princeton University Press.
Fuller, B., G. Hayward, and M. Kirst 1998 Californians Speak on Education and Reform Options. PACE/Field Institute School Reform Poll. Berkeley, CA: Policy Analysis for California Education.
Haertel, E. 1989 Student achievement tests as tools of educational policy: Practices and consequences. Pp. 25–50 in Test Policy and Test Performance: Education, Language and Culture, B.R. Gifford, ed. Boston: Kluwer Academic Publishers.
Haney, W. 1984 Testing reasoning and reasoning about testing. Review of Educational Research 54(4):597–654.
Herman, J.L., and S. Golan 1993 The effects of standardized testing on teaching and schools. Educational Measurement: Issues and Practice 12(4):20–25, 41–42.
Herrnstein, R.J., and C. Murray 1994 The Bell Curve: Intelligence and Class Structure in American Life. New York: Free Press.
Hochschild, J., and B. Scott 1998 Trends: Governance and reform of public education in the United States. Public Opinion Quarterly 62(1):79–120.
Immerwahr, J. 1997 What Our Children Need: South Carolinians Look at Education. New York: Public Agenda.
Johnson, J., and J. Immerwahr 1994 First Things First: What Americans Expect from the Public Schools. New York: Public Agenda.
Kettl, D.F. 1998 Uses of Educational Tests. Memorandum to the Board on Testing and Assessment.
Koretz, D.M., R.L. Linn, S.B. Dunbar, and L.A. Shepard 1991 The Effects of High-Stakes Testing on Achievement: Preliminary Findings About Generalization Across Tests. Paper presented at the annual meeting of the American Educational Research Association and the National Council on Measurement in Education, Chicago, IL.
Linn, R.L. 1998 Assessments and Accountability. Paper presented at the annual meeting, American Educational Research Association, San Diego, CA.
Madaus, G.F. 1988 The influence of testing on the curriculum. Pp. 83–121 in Critical Issues in Curriculum, Eighty-Seventh Yearbook of the National Society for the Study of Education, L.N. Tanner, ed. Chicago: University of Chicago Press.
Mass Insight 1997 The Public's View of Standards and Tests: Executive Summary. Cambridge, MA: Mass Insight.
McDonnell, L.M. 1994 Policymakers' Views of Student Assessment. Santa Monica, CA: RAND.
Mehrens, W.A. 1998 Consequences of Assessment: What Is the Evidence? Vice Presidential Address for Division D, annual meeting of the American Educational Research Association, San Diego.
National Research Council 1997 Educating One and All: Students with Disabilities and Standards-Based Reform, L.M. McDonnell, M.L. McLaughlin, and P. Morison, eds. Board on Testing and Assessment. Washington, DC: National Academy Press.
Office of Technology Assessment 1992 Testing in American Schools: Asking the Right Questions. OTA-SET-519. Washington, DC: U.S. Government Printing Office.
Olson, L. 1998 Study warns against reliance on testing data. Education Week (March 25):10.
Shepard, L.A., and M.L. Smith 1989 Flunking Grades: Research and Policies on Retention. Philadelphia: Falmer Press.
Smith, M.L., and C. Rottenberg 1991 Unintended consequences of external testing in elementary schools. Educational Measurement: Issues and Practice 10(4):7–11.
Steinberg, J. 1998 New York's chancellor vows to end routine promotions to the next grade. New York Times April 21:A23.
Tyack, D.B. 1974 The One Best System. Cambridge: Harvard University Press.
Debra P. v. Turlington, 644 F.2d 397 (5th Cir. 1981)
Individuals with Disabilities Education Act, 20 U.S.C. section 1401 et seq.
Title I, Elementary and Secondary Education Act, 20 U.S.C. sections 6301 et seq.