5
Accountability and Assessment

Public accountability has always been a hallmark of public schooling in the United States, although it takes a variety of forms. For example, in casting its votes for school board and state legislative candidates, the public holds elected officials accountable for educational quality. Policy makers, in turn, hold professional educators accountable when they decide under what conditions schools will be funded, how curriculum and instruction will be regulated, and how high performance will be rewarded and low performance sanctioned. The assumption in all these transactions is that a social contract exists between communities and their schools: the public supports and legitimates the schools and, in exchange, the schools meet the community's expectations for educating its children.

No matter what type of accountability mechanisms are imposed on schools, information about performance lies at the core. Only with public reporting on performance can policy makers and the public make informed decisions, and only with reliable and useful data do educators have the information necessary to improve their work. Data on school performance are varied and include revenue and expenditure reports, descriptions of school curricula, and student attendance records. But assessments of student achievement are the most significant indicator for accountability purposes. In fact, over the past 20 years, student scores on standardized tests have become synonymous with the notion of educational accountability.

The accountability system for general education differs in two major ways from that for special education: it is public, and it typically focuses on aggregate student performance. In contrast, for special education, accountability is centered on the individualized education program (IEP), an essentially private document that structures the educational goals and curriculum of an individual student and then serves as a device for monitoring his or her progress.



The accountability mechanisms for general and special education are not inconsistent with one another and, for students with disabilities, the IEP serves as the major vehicle for defining their participation in the common, aggregated accountability system. Nevertheless, if students with disabilities are to participate in standards-based reform, their individualized educational goals must be reconciled with the requirements of large-scale, highly standardized student assessments.

The education standards movement has emphasized assessment as a lever for changing curriculum and instruction, at the same time continuing and even amplifying its accountability purposes. Indeed, assessment has often been the most clearly articulated and well-publicized component of standards-based reform. The appeal of assessment to policy makers who advocate education reform is understandable. Compared with other aspects of education reform, such as finding ways to implement and fund increased instructional time; improve recruitment, professional development, and retention of the most able teachers; and reduce class size, assessments are relatively inexpensive, can be externally mandated and implemented quickly, and provide visible results that can be reported to the press (Linn, 1995).

The preeminent role of assessment in standards-based reform has also attracted considerable controversy. Some observers have cautioned that a heavy reliance on test-based accountability could produce unintended effects on instruction. These include "teaching to the test" (teachers giving students practice exercises that closely resemble assessment tasks or drilling them on test-taking skills) and narrowing instruction to emphasize only those skills assessed rather than the full range of the curriculum. Indeed, research suggests that raising assessment stakes may produce spurious score gains that are not corroborated by similar increases on other tests and do not reflect actual improvements in classroom achievement (Koretz et al., 1991, 1992; Shepard and Dougherty, 1991; Shepard, 1988, 1990).

Analysts have also questioned the potential effects of assessment-based accountability on low-achieving students. Will schools choose to focus their efforts on students closest to meeting acceptable performance levels? What happens to students who fail to meet performance standards? Observers have questioned whether the same assessments can fulfill both their intended roles of measuring performance and inducing instructional change. Researchers have also raised concerns about the technical difficulties of designing and implementing new forms of assessment (Hambleton et al., 1995; Koretz et al., 1996a).

These potential effects do not appear to have dampened enthusiasm for assessment as a lever for reform; the basic purposes and uses of assessment in standards-based reform are proceeding unchanged. Many students with disabilities, however, are exempted from taking common assessments for a variety of reasons, including confusion about the kinds of testing accommodations that are available or allowable, local concerns about the impact of lower scores on average performance, concerns about the impact of stressful testing on children, and difficulties in administering certain tests to students with severe disabilities.

But regardless of the reason, many students with disabilities who are exempted from assessments are not considered full participants in other aspects of the general curriculum. And if the performance of these students does not count for accountability purposes, then there may be less incentive for educational agencies to try to enhance their educational offerings and improve their performance. Eliminating these assessment barriers is therefore an important component of efforts to include more students with disabilities in standards-based reform.

Efforts to increase participation of students with disabilities in assessment programs reflect two distinct goals. One goal is to improve the quality of the educational opportunities afforded students with disabilities. For example, some reformers maintain that holding educators accountable for the assessment scores of students with disabilities will increase their access to the general education curriculum. A second goal is to provide meaningful and useful information about the performance of students with disabilities and about the schools that educate them. Although they recognize that student test scores alone cannot be used to judge the quality of a particular school's program, reform advocates assume that school-wide trends in assessment scores and the distribution of those scores across student groups, such as those with disabilities, can inform parents and the public generally about how well a school is educating its students. Ideally, an assessment program should achieve both goals.

With efforts to include increasing numbers of students with disabilities in standards-based reform, questions about assessment remain pivotal. For example, are assessments associated with existing standards-based reform programs appropriate for students with disabilities? The answer to this question may well depend on the nature of a student's disability, the nature of the assessment program, whether accommodations (i.e., modified testing conditions) are provided, and whether accountability rests at the student, school district, or state level. If accommodations are provided, what are their effects on the validity of the assessment? Should scores earned with accommodations be identified by a special notation in score reports? Many students with disabilities spend part of their school day working on basic skills, reducing their opportunity to learn the content tested by standards-based assessments. Is it fair, then, to hold them to standards of performance comparable to their peers without disabilities?

In the remainder of this chapter, we first provide an overview of accountability systems in standards-based reform. We then consider the role of assessment systems in standards-based reform. The next section describes the current participation of students with disabilities in state assessment programs. The fourth, and longest, section focuses on the necessary conditions for increasing their participation in large-scale assessments, with particular attention to reliability and validity considerations, the design of accommodations, test score reporting, the legal framework, and resource implications. The following section discusses implications of increased participation, and a final section presents the committee's conclusions.

Our focus on the assessment of students with disabilities in the context of standards-based reform has precluded consideration of a number of more general issues concerning assessment of children with disabilities. Examples of key issues that are not addressed include proposed changes in the IQ-achievement discrepancy criterion used to identify students with learning disabilities (see Morison et al., 1996) and other issues related to assessment for program eligibility purposes and preparation of the IEP.

OVERVIEW OF ACCOUNTABILITY SYSTEMS

Accountability systems are intended to provide information to families, elected officials, and the public on the educational performance of students, teachers, schools, and school districts, to assure them that public funds are being used legitimately and productively. In addition, some accountability systems are intended to provide direct or indirect incentives to improve educational outcomes. Assessment results are usually the centerpiece of educational accountability systems. The intended purpose and the design of accountability systems affect the type of assessments that are used, how the assessment data are collected, how they are reported, and the validity standard to which assessment results are held. The different purposes of accountability systems lead to distinctions that result in quite different assessment system designs.

The first critical factor is the unit to which accountability is directed. Although some systems are geared to provide state-level accountability, these systems build on data collected about districts, schools, and individuals. Most standards-based reforms rest accountability at the district and school levels. Some systems, such as that of Tennessee, focus on classrooms. In addition, some reform programs seek to provide individual-level accountability by giving parents explicit information about the current status, progress, and relative educational performance of their children. This latter kind of accountability is particularly relevant for students with disabilities.

The second important distinction is the relevant comparison group in the accountability system. There are three common alternatives. The most basic system provides information that simply allows comparisons among similar units (districts, schools, teachers, or individuals). A more elaborate system also includes comparisons among subgroups, either at the system level or within units. For example, one may wish to compare performance indicators broken down by gender, income, or racial/ethnic groups. Comparisons could also be made between students with and without disabilities or among types of disabilities.

Finally, the appropriate time frame for accountability information is an issue. Variables to be decided include how often single-period information is collected, whether multiple years are employed, and whether accountability relies on measures of individual student progress over time.

These distinctions yield considerable variation in assessment and accountability systems across states. For example, Tennessee has implemented a "value-added" assessment system that measures changes in classroom-level achievement over time. The system also has the unique characteristic of holding teachers accountable not only for the year they teach the students tested, but also for three subsequent years of student performance after students leave their classrooms. Most state accountability systems, however, hold schools responsible for student performance only in the grades in which state assessments are administered, with comparisons made among grade cohorts (e.g., fourth graders) in different years, rather than of the same students over time.

According to a recent survey of state assessment programs, nearly every state and many school districts and schools now have some kind of assessment-based accountability framework in place (Bond et al., 1996). In 1994–95, 45 states had active statewide assessment programs. Most of the remaining states were in some stage of developing or revising their assessment programs. Two of the states without active assessments (Colorado and Massachusetts) suspended them while they were being revised. Nebraska is developing its first assessment program. Two states had no plans to implement a statewide assessment program of any kind (Iowa and Wyoming).

The assessments that form the basis of these statewide accountability programs are extremely diverse in the content covered, the grades assessed, testing format, and purpose. In general, students are assessed most often at grades 4, 8, and 11; five subjects (mathematics, language arts, writing, science, and social studies) are likely to be assessed. Most states use their assessments for multiple purposes, with the most common based on school- or program-level data: "improving instruction and curriculum" (n = 44), "program evaluation" (n = 39), and "school performance reporting" (n = 35). Twenty-three states report that they attach consequences at the school level to assessment results; these consequences include funding gains and losses, loss of accreditation status, warnings, and eventual state takeover of schools. Thirty states report that they use individual students' assessment results to determine high school graduation (18 states), grade promotion decisions (5), or awards or recognition (12) (Bond et al., 1996).

ASSESSMENT IN STANDARDS-BASED REFORM

Because standards-based assessments are diverse, it is difficult to generalize about them. Nonetheless, some common themes are discernible.

Dual Purposes

In standards-based reform, large-scale assessment programs usually have two primary, sometimes competing purposes. First, they are expected to provide a primary basis for measuring the success of schools, educators, and students in meeting performance expectations. Second, they are also expected to exert powerful pressure on educators to change instruction and other aspects of educational practice.

In this respect, many current standards-based reforms echo the themes of "measurement-driven instruction" (Popham et al., 1985) that shaped state testing programs during the minimum-competency testing movement of the 1970s and the education reform movement of the 1980s (Koretz, 1992). Current assessments differ from those of previous reform movements, however, in their emphasis on higher standards, more complex types of performance, and systemic educational change.

Externally Designed and Operated

The assessments that are most central to the standards-based reform movement are external testing programs; that is, they are designed and operated by authorities above the level of individual schools, often by state education agencies. Internal assessments (those designed by individual teachers and school faculties) also play an important role in many standards-based reforms; indeed, one explicit goal of some standards-based reforms is to encourage changes in internal assessments. External assessments, however, are typically considered the critical instrument for encouraging changes in practice, including changes in teachers' internal assessments.

Use for Individual or Group Accountability

Many large-scale external assessments are used for accountability, although the means of doing so vary greatly. Some assessments carry high-stakes accountability for individuals, meaning that individual students' results are used to determine whether a student will graduate from high school, be promoted to the next grade, or be eligible for special programs or recognition. Examples are the recently announced high school assessments in Maryland, which will be required for graduation. Other assessments impose serious accountability consequences for educators, schools, or districts but not for students. For example, schools whose aggregated student results show sufficiently improved performance on the Kentucky Instructional Results Information System (KIRIS) assessments receive cash rewards, and, beginning in 1997, schools that fail to show improvement will be subject to sanctions. In yet other instances, the publicity from school-by-school reporting of assessment results is the sole or primary mechanism for exerting pressure. As we discuss later in this chapter, the method used to enforce accountability, in particular whether consequences are attached to group performance (schools or classrooms) or to individual students, has important implications for the participation of students with disabilities.

Infrequently Administered

In many standards-based systems, the external assessments used for accountability are administered infrequently. For example, Maryland's School Performance Assessment Program (MSPAP) is administered in only three grades (third, fifth, and eighth). Kentucky's KIRIS was originally administered in three grades (fourth, eighth, and twelfth); in the last several years, the assessments have been broken into components that are administered in more grades, but a given component, such as writing portfolios, is still administered in only three grades. These assessments are intended to assess a broad range of skills and knowledge that students are expected to have mastered by the grades at which they are administered.

In this respect, they differ from course-based examinations, such as the College Board advanced placement tests and the former New York Regents examinations, and they contrast even more sharply with the various types of assessments given throughout the school year to assess individual progress.

Reporting by Broad Performance Levels

In keeping with the central focus of standards-based reform, these assessments typically employ standards-based rather than normative reporting. That is, student results are reported in terms of how they compare against predetermined standards of what constitutes adequate and exemplary performance, rather than how they compare with the performance of other students in the nation or other distributions of performance. Moreover, the systems typically employ only a few performance standards. For example, Kentucky bases rewards and sanctions primarily on the percentages of students in each school reaching four performance standards (novice, apprentice, proficient, and distinguished) on the KIRIS assessments; Maryland publishes the percentages of students in schools and districts reaching the satisfactory level. In these systems, gradations in performance within one level (that is, between one standard and the next) are not reported. In Kentucky, for example, variations among students who have reached the apprentice level but not the proficient level are not reported. Reporting of results in normative terms, such as national percentile ranks, is downplayed, although it is not always abandoned altogether. For example, the Kentucky Education Reform Act required that the results of the proposed assessment that has become KIRIS be linked to the National Assessment of Educational Progress to provide a national standard of comparison, and the most recent version of KIRIS will include some use of commercial tests, the results of which are reported in terms of national norms.

Performance Assessment

The standards-based reform movement has been accompanied by changes in the character of assessments to reflect the changing goals of instruction, as discussed in Chapter 4. In an effort to better measure higher-order skills, writing skills, and the ability to perform complex tasks, large-scale assessments increasingly include various forms of performance assessment, either in addition to or in lieu of traditional multiple-choice testing. The term performance assessment encompasses a wide variety of formats that require students to construct answers rather than choose responses; these include conventional direct assessments of writing, written tasks in other subject areas (such as explaining the solution to a mathematical problem), hands-on tasks (such as science tasks that require the use of laboratory equipment), multidisciplinary tasks, small-group tasks, and portfolios of student work. In some instances, the specific skills or bits of knowledge that would have been assessed by short, specific items in traditional tests are instead embedded in complex tasks that take students a longer time to complete.

CURRENT PARTICIPATION OF STUDENTS WITH DISABILITIES IN ACCOUNTABILITY AND ASSESSMENT SYSTEMS

Although several studies have documented that the participation of students with disabilities in statewide assessments generally has been minimal, it is also extremely variable from one state to another, ranging from 0 percent to 100 percent (Erickson et al., 1995; McGrew et al., 1992; Shriner and Thurlow, 1992). Inconsistent data collection policies make it difficult to compare participation rates from place to place or to calculate a rate that has the same meaning across various locations; in addition, states tend to use methods that inflate the rates (Erickson et al., 1996).[1]

Forty-three states have written guidelines about the participation of students with disabilities in state assessments. Most states rely to some extent on the IEP team to make the decision, but only about half the states with guidelines require that participation decisions be documented in the IEP (Erickson and Thurlow, 1996). A number of other factors also affect (and often complicate) these decisions, including vague guidelines that can be interpreted in a variety of ways, criteria that focus on superficial factors rather than on student educational goals and learning characteristics, and concerns about the potentially negative emotional impact of participation on the student (Ysseldyke et al., 1994). In addition, anecdotal evidence suggests other influences, such as pressures to keep certain students out of accountability frameworks because of fears that these students will pull down scores.

In most states, nonparticipation in the assessment means that students are also excluded from the accountability system (Thurlow et al., 1995b). Indeed, in many states, even some students who participate in a statewide assessment may still be excluded from "counting" in the accountability framework. Sometimes states or school districts simply decide to exclude from aggregated test scores any students who are receiving special education services (see Thurlow et al., 1995b). For example, in one state the scores of students with IEPs who have taken the statewide accountability assessment are flagged and removed when aggregate scores are calculated for reporting back to districts and to the media; the scores of the students with IEPs are totaled separately and given to principals, who then can do with them what they wish (which frequently means discarding them). Unfortunately, these practices, if unmonitored, may lead to higher rates of exclusion of students with disabilities from accountability frameworks, particularly when incentives encourage exclusion (e.g., if high stakes are associated with aggregated test scores without regard to rates of exclusion). In fact, researchers (Allington and McGill-Franzen, 1992) have demonstrated that the exclusion of students with disabilities from high-stakes assessments in New York has led to increased referrals to special education, in part to remove from accountability decisions students who are perceived to be performing at low levels.

[1] These authors suggest that participation rates should be calculated by dividing the number of test-takers with IEPs by the total number of students with disabilities at the relevant age or grade level. Interpretations vary about how to define test-takers and how to define the eligible population in the denominator.
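As a rough illustration of the rate described in footnote 1, the sketch below computes a participation rate with the numerator and denominator made explicit; the function and variable names and the example figures are hypothetical and are not drawn from the report or from any state's data.

```python
def participation_rate(test_takers_with_ieps: int, eligible_students_with_disabilities: int) -> float:
    """Rate suggested in footnote 1: test-takers with IEPs divided by all
    students with disabilities at the relevant age or grade level.

    Because states define both counts differently (e.g., whether partially
    tested students count as test-takers), the definitions used should be
    reported alongside the rate itself.
    """
    if eligible_students_with_disabilities <= 0:
        raise ValueError("eligible population must be greater than zero")
    return test_takers_with_ieps / eligible_students_with_disabilities

# Hypothetical example: 38 of 50 fourth graders with IEPs took the state test.
print(f"{participation_rate(38, 50):.0%}")  # 76%
```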

One of the avenues for increasing participation of students with disabilities in assessments is allowing accommodations. Accommodations currently in use fall into four broad categories (Thurlow et al., 1993). Changes in presentation include, for example, braille forms for visually impaired students and taped versions for students with reading disabilities. Changes in response mode include use of a scribe or amanuensis (an individual who writes answers for the examinee) or computer-assisted responses in assessments that are not otherwise administered by computer. Changes in timing include extra time within a given testing session and the division of a session into smaller time blocks. Changes in setting include administration in small groups or alone, in a separate room. In addition, some students with disabilities may be administered an assessment in a standard setting with some form of physical accommodation (e.g., a special desk) but with no other alteration.

Within the past five years, increasing numbers of states have written guidelines outlining their policies on the use of accommodations. In 1992, 21 states indicated they had written guidelines on the use of accommodations by students with disabilities in their statewide assessments; by early 1995, 39 states had such written guidelines (Thurlow et al., 1995a).

An analysis of these state accommodations guidelines found a great deal of variation in their format and substance (Thurlow et al., 1995a). Some are one sentence long, and others take up numerous pages. States use diverse terms (e.g., nonstandard administration, mediation, modification, alteration, adaptation, accommodation), sometimes interchangeably. Some states vary their guidelines depending on the type or purpose of the assessment, and others use the same guidelines for all purposes. States also classify accommodations in different ways: by category of disability, by the specific test being administered, or by whether accommodations are appropriate for score aggregation. Perhaps most important, states take different approaches regarding which accommodations they allow or prohibit and how they treat the scores of students with disabilities who use accommodations. An accommodation that is explicitly permitted in one state might be excluded in another (Thurlow et al., 1995a). This variation is not surprising, given that little research exists on the impact of specific accommodations on the validity of various elementary and secondary achievement tests (Thurlow et al., 1995d).

States also have divergent policies about whether to include the scores of students with disabilities who used accommodations in assessment-based accountability frameworks. Some states exclude these scores because of concerns about their validity (Thurlow et al., 1995a).

Despite the variability of state guidelines on accommodations, some generalizations can be made (Thurlow et al., 1995a). First, the majority of states with guidelines (n = 22) recognize the importance of the IEP and the IEP team in making decisions about accommodations for individual students. Second, many states (n = 14) specifically refer to a link between accommodations used during assessment and those that are used during instruction. Third, relatively few states (n = 4) require written documentation about assessment accommodations beyond what is written in the IEP. Even without such a requirement, however, many state assessment directors still document the use of assessment accommodations. A 1995 survey found that 17 of the 21 states that collect data on individual students with disabilities in their statewide assessment database also document whether an individual student used an accommodation. Not all of these states, however, can identify exactly which accommodations a student used (Elliott et al., 1996a; Erickson and Thurlow, 1996).

In most states, the net effect of policies on exclusion and accommodation is to keep at least some students with disabilities out of the accountability framework. However, most states are now reviewing the participation of students with disabilities in their assessment and accountability systems and the use of accommodations.[2] (We examine the design of assessment accommodations in the next section of this chapter.)

The large-scale assessments that typify standards-based reform are in many ways unlike those typically used in special education. Although including students with disabilities in these assessments may benefit them, the assessments themselves are not designed to manage the instruction delivered to individual students with disabilities. Large-scale assessments are not intended to track the progress of individual students. Assessments are infrequent; often they are administered late in the school year, so that teachers do not see results until the following school year. They are not designed to provide longitudinal information about the progress of individual students, and in many cases, they do not place results from different grades on a single scale to allow measurement of growth over time. In fact, some large-scale assessments used in standards-based reform are also not designed to provide high-quality measurement of individual performance; measurement quality for individuals was deliberately sacrificed in the pursuit of other goals, such as broadening the content coverage of the assessment for schools and incorporating more time-consuming tasks.

Moreover, unlike some assessments used to manage special education, the large-scale assessments in standards-based reform focus on high performance standards that are applied without distinction or differentiation to most students, including low-achieving students and students with disabilities. In contrast, a fundamental tenet of the education of students with disabilities is individualization and differentiation, as reflected in the IEP. The educational goals for each student with disabilities are required to reflect his or her capabilities and needs, as should the instructional plan and assessments.

[2] Both the Office of Special Education Programs and the Office of Educational Research and Improvement in the U.S. Department of Education are now supporting research on these issues.

For example, one IEP may call for a sign language interpreter to assist a student in the advanced study of history; another IEP may call for training in basic functional skills, such as telling time. Standards-based reform calls for uniformity in outcomes, allowing educators variation only in the path to those ends.

Many of the new large-scale assessments deliberately mix the activities and modes of presentation required by individual tasks to mirror real-world work better and to encourage good instruction. A task may require substantial reading and writing as well as mathematical work, group work as well as individual work, or hands-on activities as well as written work. This mixture of modes is a reaction against the deliberately isolated testing of skills and knowledge found in traditional tests. But the instructional programs of many students with disabilities focus on developing very specific skills, which are tested most effectively with narrowly focused tasks.

The methods used to report assessment results may also limit their utility for tracking the progress of some students with disabilities. New large-scale assessments typically use only a few performance levels, and the lowest level is high relative to the overall distribution of performance. Consequently, no information is provided about modest gains by the lowest-performing students, including some students with disabilities, and the reporting rubric signals that modest improvements are not important unless they bring students above the performance standard. Ideally, the tests used in special education should track the kinds of modest improvements that a student can reasonably achieve in the periods between measurements.

INCREASING THE PARTICIPATION OF STUDENTS WITH DISABILITIES IN LARGE-SCALE ASSESSMENTS

Including more students with disabilities in large-scale assessments in a way that provides meaningful and useful information will require confronting numerous technical issues. In addition, these assessments must be designed and implemented within the legal framework that defines the educational rights of students with disabilities and with consideration of the resources that new assessments will require for development, training, and administration. We address these technical and political issues in this section.

Assessment Design

Assessment programs associated with standards-based reform should satisfy basic principles of measurement, regardless of whether the assessment is traditional, performance-based, or a combination. Performance assessments, which comprise the bulk of standards-based assessments, are relatively new, and empirical evidence on their quality, although growing, is limited. Nonetheless, mea-

district, school, classroom, individual student), the frequency of testing and grade levels tested, and the uses of assessment data.

The design of reporting mechanisms always involves critical choices because schools and groups of students are typically compared with each other, with themselves over time, or against a set of performance standards. In some cases, these comparisons may also result in rewards and sanctions. Consequently, ensuring fair comparisons becomes a major issue. The public's right to know and to have accountable schools must be balanced against individual student rights and the disparate resources and learning opportunities available to different schools and students. Creating a fair and responsible reporting mechanism is one of the major challenges associated with expanding the participation of students with disabilities in large-scale assessments and public accountability systems.

In this section, we examine two issues that must be considered in reporting on the performance of students with disabilities. One issue pertains to flagging: making a notation on the student's score report that identifies scores as having been obtained with accommodations or under other nonstandard conditions. A second issue relates to disaggregation: the separate reporting of scores for groups such as students with disabilities. In part, the resolution of these issues hinges on the uses to which scores are put, such as whether scores are reported at the aggregate or individual level. However, in many instances there is no unambiguous resolution; the research base that might guide decisions is limited and, perhaps more important, an emphasis on different values leads to different conclusions about the best resolution.

Flagging

Flagging is a concern when a nonstandard administration of an assessment (for example, one providing accommodations such as extra time or a reader) calls into question the validity of inferences (i.e., the meaning) based on the student's score. Flagging warns the user that the meaning of the score is uncertain. The earlier section on validity and accommodations identified factors that suggest uncertainty about the meaning of scores from accommodated assessments. However, since flagged scores are typically not accompanied by any descriptive detail about the individual or even about the nature of the accommodations offered, flagging may not really help users interpret scores more appropriately. It may instead confront them with a decision about whether to ignore or discount the score simply because of the possibility that accommodations have created unknown distortions. Moreover, in the case of scores reported for individual students, flagging identifies the individual as having a disability, raising concerns about confidentiality and possible stigma.

In some respects, flagging is less of a problem when scores are reported only at the level of schools or other aggregates. Concerns about confidentiality and unfair labeling are lessened. Moreover, to the extent that the population with disabilities and assessment accommodations is similar across the aggregate units being compared (say, two schools in one year, or a given school's fourth grades in two different years), flagging in theory would have little effect on the validity of inferences.

In practice, however, the characteristics of the group with disabilities may be quite different from year to year or from school to school. Moreover, decisions about accommodations and other modifications may be made inconsistently. Thus, even in the case of scores reported only for aggregates, flagging may be needed to preserve the validity of inferences. When testing technology has sufficiently advanced to ensure that accommodations do not confound the measurement of underlying constructs, score notations will be unnecessary. Until then, however, flagging should be used only with the understanding that the need to protect the public and policy makers from misleading information must be weighed against the equally important need to protect student confidentiality and prevent discriminatory uses of testing information.

Disaggregation

It is not yet clear what kinds of policies states and districts will adopt about disaggregating results for students with disabilities and other groups with special needs, but they will have to do at least some disaggregation under the Title I program. The new federal Title I legislation requires the results of Title I standards-based assessment to be disaggregated at the state, district, and school levels by race and ethnicity, gender, English proficiency, migrant status, and economic disadvantage and by comparisons of students with and without disabilities.

There are several arguments in favor of disaggregating the scores of students with and without disabilities. The first argument is one of validity: if the scores of some students with disabilities are of uncertain meaning, the validity of comparisons for the whole group would be enhanced by separating those scores. The second is about fairness: schools have varying numbers of students with disabilities from one cohort to another, and, to the extent that some of these students face additional educational burdens, disaggregation would lead to fairer comparisons. The third argument is one of accountability: separately reporting the scores of students with disabilities will increase the pressure on schools to improve the education offered to them. (Note that these same arguments apply for any group of students for whom scores are of uncertain meaning or who could benefit from a separate analysis of their performance, for example, students with limited English proficiency and Title I students.)

Whatever its merits, however, disaggregation confronts serious difficulties pertaining to the reliability of scores. One reason is simply the small number of students involved. The problem of low numbers will be most severe for school-level reporting. An elementary school that has 50 students in a tested grade, for example, is likely to have perhaps 4 to 6 students with disabilities in that grade. The unreliability of disaggregated scores is exacerbated by the ambiguous and variable identification of students as having a disability; a student identified, and hence included in the score for students with disabilities, in one school or cohort may well not be identified in another.

The diversity of these students also augments the problem of reliability; in one cohort of five students with disabilities, there might be one with autism and one with retardation, whereas another cohort might include none with autism or retardation but a highly gifted student with a visual disability. Thus, for example, a change that would appear to indicate improvement or deterioration in the education afforded students with disabilities in fact could represent nothing but differences in the composition of the small groups of students with disabilities.

The enormous diversity among those in the category of students with disabilities is the primary argument in favor of a more detailed disaggregation of scores by type of disability. In theory, detailed disaggregation could alleviate some of the distortions caused by cohort-to-cohort differences in disabilities. Moreover, it could provide more meaningful comparisons. Students whose disability is partial blindness, for example, might be more meaningfully compared with students without disabilities than with students with mental retardation or autism.

Detailed disaggregation exacerbates the problem of small numbers, however, particularly for the less common disabilities. For example, the national prevalence rate for identified visual disabilities served under the Individuals with Disabilities Education Act (IDEA) or the state-operated programs of Chapter 1 in the 6–17 age range was 0.05 percent in the 1993–94 school year (U.S. Department of Education, 1995:Table AA16). Thus, in our hypothetical example of an elementary school with 50 students in a tested grade, one can expect a student identified as visually impaired to appear in that grade, on average, once every 40 years. Although detailed disaggregation may improve the meaningfulness of results for larger groups and larger aggregates, it will not provide useful aggregate comparisons for smaller disability groups or smaller aggregates. Detailed disaggregation also would run counter to the current movement within special education to avoid formal classifications and to focus instead on individual students' functional capabilities and needs.

As with flagging, those making decisions about data disaggregation in state reporting systems should weigh the need for valid and useful information equally with consideration of any potentially adverse effects on individuals. Care must be taken so that disaggregated data do not allow identification of results for individual students. The usual approach to this problem is not to report results for any cell in a table with a sample size below a certain number of students (e.g., five).
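The arithmetic behind the once-every-40-years example and the small-cell suppression rule can be made explicit with a minimal sketch. The numbers below come from the examples in the text (a grade of 50 students, a 0.05 percent prevalence, a reporting threshold of five); the function names and the example scores are illustrative assumptions, not part of the report.

```python
def expected_students(grade_size: int, prevalence: float) -> float:
    """Expected number of students with a given disability in one tested grade."""
    return grade_size * prevalence

def report_cell(count: int, mean_score: float, min_cell_size: int = 5):
    """Suppress any reporting cell smaller than the minimum size (e.g., five)."""
    return mean_score if count >= min_cell_size else None  # None = not reported

# A grade of 50 students and a 0.05 percent prevalence of identified visual
# disabilities imply about 0.025 such students per year, or one roughly every
# 40 years.
per_year = expected_students(50, 0.0005)
print(round(per_year, 3), round(1 / per_year))  # 0.025 40

# Under a threshold of five, a cell of 4 students would not be reported.
print(report_cell(4, 212.0))   # None (hypothetical mean score suppressed)
print(report_cell(12, 205.5))  # 205.5
```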

Legal Framework for Assessing Students with Disabilities[19]

The federal statutes and regulations governing the education of students with disabilities recognize the importance of the validity of tests and assessments. The regulations implementing both the IDEA and Section 504 of the Rehabilitation Act of 1973 require that tests and other evaluation materials be validated for the specific purpose for which they are used. Both sets of regulations also require that, when a test is administered to a student with impaired sensory, manual, or speaking skills, the test results accurately reflect the child's aptitude or achievement level or whatever other factors the test purports to measure, rather than reflecting the student's disabilities.

Accommodations for disabilities in testing or assessment are also required by these federal statutes and regulations. Both Section 504 and the Americans with Disabilities Act (ADA) require that individuals with disabilities be protected against discrimination on the basis of disability and be allowed access to programs and services as effective as those received by their peers without disabilities. The ADA regulations require that public entities make "reasonable modification" in policies, practices, and procedures when "necessary to avoid discrimination on the basis of disability, unless the public entity can demonstrate that making the modifications would fundamentally alter the nature of the service, program, or activity" (28 CFR 35.130[b][7]). Alternate forms or accommodations in testing are required, but alterations of the content of what is tested are not required by law.

For purposes of analyzing potential legal claims on behalf of students with disabilities, distinctions among the various purposes and uses of assessments become critical. Assessments may, for example, be designed primarily as an accountability mechanism for schools and school systems. They may also be used as an integral part of learning, instruction, and curriculum. Or a particular test or tests may be used as a basis for making high-stakes decisions about individual students, including who is placed in the honors curriculum, who is promoted from grade to grade, and who receives a high school diploma or a certificate indicating mastery of a certain set of skills deemed relevant to the workplace. Each use raises its own set of legal issues and has different implications. As a general rule, the greater the potential harm to students, the greater the protection that must be afforded to them and the more vulnerable the assessment is to legal challenge.

One set of federal courts has already addressed the constitutional issues arising when a state links performance on a statewide test to the award of a high school diploma. A federal appellate court held unconstitutional a Florida law requiring students to pass a statewide minimum competency test in order to receive a high school diploma. The court in Debra P. v. Turlington held that the state's compulsory attendance law and statewide education program granted students a constitutionally protected expectation that they would receive a diploma if they successfully completed high school. Because students possessed this protected property interest, the court held that the state was barred under the due process clause of the federal Constitution from imposing new criteria, such as the high school graduation test, without adequate advance notice and sufficient educational opportunities to prepare for the test.

[19] This section is based on the legal analysis prepared for the committee (Ordover et al., 1996).

The court was persuaded that such notice was necessary to afford students an adequate opportunity to prepare for the test, to allow school districts time to develop and implement a remedial program, and to provide an opportunity to correct any deficiencies in the test and set a proper cut score for passing (644 F. 2d 397, 5th Cir. 1981; see also Brookhart v. Illinois State Bd. of Ed., 697 F. 2d 179, 7th Cir. 1983).[20]

The court in Debra P. further held that, in order for the state's test-based graduation requirements to be deemed constitutional, the high school test used as their basis must be valid. In the view of the court, the state had to prove that the test fairly assessed what was actually taught in school. Under this concept, which the court referred to as "curricular validity," the test items must adequately correspond to the required curriculum in which students should have been instructed before taking the test, and the test must correspond to the material that was actually taught (not just supposed to have been taught) in the state's schools. As the court in Debra P. held: "fundamental fairness requires that the state be put to the test on the issue of whether the students were tested on material they were or were not taught…. Just as a teacher in a particular class gives the final exam on what he or she has taught, so should the state give its final exam on what has been taught in its classrooms" (644 F.2d at 406). In reaching this ruling, the court specifically rejected the state's assurance that the content of the test was based on the minimum, state-established performance standards, noting that the state had failed to document such evidence and that no studies had been conducted to ensure that the skills being measured were in fact taught in the classrooms (Pullin, 1994).

The same types of issues addressed by the court in Debra P. were also assessed in federal litigation on the impact of a similar test-for-diploma requirement imposed by a local school district in Illinois. The Illinois case, Brookhart v. Illinois State Board of Education (697 F. 2d 179), specifically assessed the impact of using a minimum competency test to determine the award of high school diplomas on students with disabilities who had been in special education. The court held that students with disabilities could be held to the same graduation standards as other students, but that their "programs of instruction were not developed to meet the goal of passing the [minimum competency test]" (697 F. 2d at 187). The court found that "since plaintiffs and their parents knew of the [test] requirements only one to one-and-a-half years prior to the students' anticipated graduation, the [test] objectives could not have been specifically incorporated into the IEP's over a period of years." The court counseled that the notice or opportunity-to-learn requirement could be met if the school district could ensure that students with disabilities are sufficiently exposed to most of the material that appears on the test.

[20] It is important to note that, although Debra P. has shaped legal thinking about students' entitlement to the teaching of the content on which they will be tested, this decision and the one in Brookhart apply only within the jurisdictions of the Fifth and Seventh federal Circuit Courts, respectively. If the same questions were posed to the U.S. Supreme Court, a different decision might result.

These constitutional principles are consistent with the opportunity-to-learn requirements derived from the IDEA, Section 504, the ADA, and state constitutions. The expanded participation of students with disabilities in state assessments, coupled with the curriculum and performance standards embodied in standards-based reform, is likely to raise new legal questions and require additional interpretations of existing statutes. Nevertheless, it is clear that several legal principles will continue to govern the involvement of students with disabilities in state assessments. Chief among them are the requirements that reasonable accommodations or alternate testing forms be provided consistent with the content being measured and that, in the case of assessments with individual consequences, students be afforded the opportunity to learn the content tested.

Resource Implications

As states and school districts implement new forms of assessment, they face both development and operational costs. Performance-based assessments need to be developed, field tested, and made available to teachers and schools. Although most development costs are incurred in the first few years, item pools need to be replenished and upgraded. The cost of replenishing the pool will be driven in part by the use of, and thus the need to secure, the items.

Operational costs are ongoing. Teachers must be trained in how to administer and score new assessment formats, as well as how to integrate performance-based tasks into their daily teaching. Teachers also need to be shown how to make appropriate modifications and adaptations in assessments for students with special needs, including students with disabilities. And unlike standardized tests, which are scored externally and reported through computer-generated reports, the new assessments require that teachers be given time to score and interpret the results.

We know little about the cost of developing and implementing large-scale performance-based assessment systems, and we have no empirical data on the cost of including students with disabilities in these assessments. Estimated costs of performance-based assessment programs range from less than $2 to over $100 per student tested. This variation reflects differences in the subjects tested, how many students are tested, how they are assessed (e.g., the mix of multiple-choice questions, open-ended questions, performance tasks, and portfolios), who is involved in the development, administration, and scoring of the test (e.g., paid contractors or volunteer teachers), how much and what kind of training is provided, and the type and source of materials used in the assessment tasks. We do know, however, that compared with machine scoring of traditional tests, scoring costs for performance tasks are much greater. In addition, because of the large number of items on traditional tests, individual test items can be retained over several years. But tasks used for performance assessments must be replaced more frequently, compounding costs associated with item development and equating.

Comfort (1995, as cited in Stecher and Klein, 1997), for example, reported that the science portion of the California Learning Assessment System (CLAS)—half multiple-choice and half hands-on testing—cost the state just $1.67 per student, but much of the time needed to develop, administer, and score the science performance tasks was donated by teachers, and many of the materials used in the assessment were contributed as well. Picus (1995) found that Kentucky spent an average of $46 per student tested for each annual administration between 1991 and 1994, or about $9 per student for each of the five subjects tested. This figure also does not include any teacher or district expenditures (e.g., for training or teacher time for scoring student portfolios). In contrast, Monk (1995) projects the cost of implementing the New Standards Project assessment system at $118 per tested student; this approach, involving a consortium of states and local districts, incorporates a considerable level of professional development (about 20 percent of operating costs) and a heavy emphasis on cumulative portfolio assessment. Stecher and Klein (1997) estimate that one period of hands-on science assessment for a large student population, administered under standardized conditions, would cost approximately $34 per student, about 60 times the cost of a commercial multiple-choice science test. Although one session of performance assessment is sufficient to generate reliable school or district scores, three to four periods of performance tasks are needed to produce an individual student score as reliable as one period of multiple-choice testing, potentially raising the cost of performance assessment even higher.

Accommodations in assessment and instruction generally entail additional costs. Sometimes these costs are minimal, such as providing a student with a calculator. But often the costs are more significant and involve additional personnel, equipment, and materials; examples include providing a reader or scribe, preparing braille or large-print editions of an assessment, and providing high-tech equipment.
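To show how the per-student figures cited above combine, the short Python sketch below works through the arithmetic. The dollar amounts and the three-to-four-period requirement for reliable individual scores are taken from the estimates in this section; the variable names, the derived multiple-choice price (computed from the "60 times" ratio), and the rounding are illustrative assumptions, not figures reported in the studies.

```python
# Back-of-the-envelope cost comparison using the per-student estimates cited
# above. All figures are illustrative: actual costs depend heavily on who
# develops and scores the tasks, what is donated, and how scores will be used.

PERF_COST_PER_PERIOD = 34.00  # Stecher and Klein (1997): one period of hands-on science assessment
MC_COST_PER_TEST = PERF_COST_PER_PERIOD / 60  # described as roughly 60 times a commercial multiple-choice test

# One period suffices for reliable school- or district-level scores, but three
# to four periods are needed for a comparably reliable individual student score.
school_level_cost = 1 * PERF_COST_PER_PERIOD
individual_level_cost = (3 * PERF_COST_PER_PERIOD, 4 * PERF_COST_PER_PERIOD)

print(f"Commercial multiple-choice test, per student:  ${MC_COST_PER_TEST:.2f}")
print(f"Performance tasks, school-level reporting:     ${school_level_cost:.2f}")
print(f"Performance tasks, individual-level reporting: "
      f"${individual_level_cost[0]:.2f}-${individual_level_cost[1]:.2f}")

# Kentucky (Picus, 1995): about $46 per student per annual administration,
# spread across five subjects.
print(f"Kentucky, per subject tested:                  ${46.00 / 5:.2f}")
```

Under these assumptions, reporting reliable individual scores with performance tasks costs roughly $100 to $135 per student, compared with well under a dollar for a machine-scored multiple-choice test, which is the scale of difference the studies above describe.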

IMPLICATIONS OF INCREASED PARTICIPATION OF STUDENTS WITH DISABILITIES

As noted earlier, many people have encouraged the participation of students with disabilities in large-scale assessments in the hope that it will increase their participation in the general education curriculum and result in greater accountability for their educational performance. At this time, evidence is scarce about how the participation of students with disabilities in assessments affects their educational opportunities. Research is currently under way in a few states that have taken the lead with policies to increase participation, but it will be some time before those efforts can provide substantial information.

Greater participation of students with disabilities in large-scale assessments could have both positive and negative effects on aggregated test scores. To some degree, the effects will hinge on the extent to which valid scores can be provided for individual students with disabilities—for example, by determining which accommodations can contribute to more accurate measurement. On one hand, if rules pertaining to accommodations (or modifications) are too permissive, they may falsely inflate scores for students who should not receive the accommodation. This result could provide an escape valve, lessening the pressure on educators to bring students with disabilities up to the performance standards imposed on the general education population. On the other hand, policies that guide educators toward providing appropriate accommodations in both assessment and instruction could improve the validity of scores for students with disabilities. Linking accommodations in assessment and instruction—for example, by requiring, as Kentucky does, that accommodations be provided in the state's large-scale assessment only if they are also offered in ongoing instruction—may help limit inappropriate accommodation in assessment and encourage appropriate instructional accommodation. Evidence on the effects of these policies, however, is still lacking.

Decisions about participation and accommodations will need to be linked to decisions about reporting and, ultimately, accountability. Keeping track of who is included in the data being reported, and under what conditions, will be of central importance to ensuring fair comparisons between aggregates. Current decisions about which students with disabilities will participate in assessments are made inconsistently from place to place. This variation makes comparisons between two districts problematic if, for example, one has excluded only 2 percent of its students and the other has excluded 10 percent. In addition to making results noncomparable from place to place, high rates of exclusion create an incomplete, and possibly inaccurate, view of student performance. For example, the National Academy of Education (1996) recently studied four states with widely different exclusion rules for the 1994 NAEP reading assessment. The study found that applying a consistent rule for excluding students with low reading levels increased the number of participating students with disabilities by an average of 4.3 percent in each state; furthermore, when these students were included in the reporting, the mean fourth grade NAEP reading scores were somewhat lower. The size of the decrease varied from state to state (ranging from 1.5 to 3.1 points on the NAEP scale); predictably, the smallest decrease occurred in the state that was already including more students with disabilities.
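The direction of this effect follows from simple weighted-mean arithmetic: folding a previously excluded, lower-scoring group into the reported aggregate pulls the mean down in proportion to the group's size and its score gap. The Python sketch below illustrates the mechanics; the student counts and scale scores are hypothetical and are not the NAEP results described above.

```python
# Hypothetical illustration (not the NAEP data described above): how including
# a previously excluded, lower-scoring group of students shifts an aggregate mean.

def mean_after_inclusion(mean_reported, n_reported, mean_added, n_added):
    """Weighted mean once a newly included group is folded into the aggregate."""
    total_points = mean_reported * n_reported + mean_added * n_added
    return total_points / (n_reported + n_added)

# Assumed numbers: 100,000 students originally reported with a mean scale score
# of 215, plus 4,300 newly included students (a 4.3 percent increase) with a
# lower mean of 170.
before = 215.0
after = mean_after_inclusion(before, 100_000, 170.0, 4_300)
print(f"Mean before inclusion: {before:.1f}")
print(f"Mean after inclusion:  {after:.1f} (a drop of {before - after:.1f} points)")
```

With these assumed values the aggregate mean falls by about two points, which is the same order of magnitude as the decreases reported in the study; the actual size of any such shift depends entirely on how many students are added and how far their scores lie from the reported mean.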

Reporting participation rates of students with disabilities in a consistent and systematic manner is important if comparisons are to be made fairly. Increased participation rates could also contribute to a more accurate description of student performance. If greater participation of students with disabilities is achieved through highly permissive policies about accommodations, however, the aggregated results may not be accurate either. For example, the 1995 NAEP field test results suggested that a combination of stricter rules for exclusion and permissive rules about accommodations apparently led some schools to use accommodations for students who could have participated without them (Phillips, 1995). Although empirical evidence is limited, it has been suggested (as reviewed earlier) that some accommodations may inflate scores for some students. If accommodations are offered to students who do not really need them, their scores may be artificially inflated, offering an overly optimistic view of progress. Parents, teachers, and schools clearly need meaningful information and do not want to become falsely complacent about the progress of students with disabilities. Careful policies about which accommodations can be offered and to whom are important, as is keeping track of who has been tested with what accommodations.

CONCLUSIONS

If students with disabilities are to gain any benefits from standards-based reform, the education system must be held publicly accountable for every student's performance. Although the IEP will remain the primary accountability tool for individual students with disabilities, the quality of their learning should also count in judgments about the overall performance of the education system. Without such public accounting, schools have little incentive to expand the participation of students with disabilities in the common standards. Therefore, regardless of the different ways that students with disabilities may be assessed, they should be accounted for in data about system performance.

The presumption should be that all students will participate in assessments associated with standards-based reform. Assessments not only serve as the primary basis of accountability, but they are also likely to remain the cornerstone, and often the best-developed component, of the standards movement. The decision to exclude a student from participation in the common assessment should be made and substantiated on a case-by-case basis, rather than through blanket exclusions on the basis of categories of disability, and should be based on a comparison of the student's curriculum and educational goals with those measured by the assessment program.

Existing data are inadequate to determine participation rates for students with disabilities in extant assessments associated with standards-based reform or to track the assessment accommodations they have received. What few data do exist suggest considerable variability in participation rates among states and among local education agencies within states. Policies pertaining to assessment accommodations also vary markedly from one state to another, and there is little information indicating the consistency with which local practitioners in a given state apply those guidelines. Variability in participation rates and accommodations threatens the comparability of scores, can distort trends over time as well as comparisons among students, schools, or districts, and therefore undermines the use of scores for accountability.

Significant participation of students with disabilities in standards-based reform requires that their needs and abilities be taken into account in establishing standards, setting performance levels, and selecting appropriate assessments.
Mere participation in existing assessments falls short of providing useful information about the achievement of students with disabilities or of ensuring that schools are held accountable for their progress. Assessments associated with standards-based reform should be designed to be informative about the achievement of all students, including those with low-incidence, severe disabilities whose curriculum requires that they be assessed with an alternate testing instrument. Adhering to sound assessment practices will go a long way toward reaching this goal. In particular, task selection and scoring criteria need to accommodate varying levels of performance. However, it may also prove essential that the development of standards and assessments be informed by knowledge about students with disabilities. Representatives of students with disabilities should be included in the process of establishing standards and assessments.

Assessment accommodations should be used only to offset the impact of disabilities and should be justified on a case-by-case basis. Used appropriately, accommodations are an effort to improve the validity of scores by removing the distortions or biases caused by disabilities. In some instances, accommodations may also permit the inclusion of students who otherwise would not be able to participate in an assessment; for example, braille editions of tests permit the assessment of blind students who would otherwise be excluded. Although accommodations will often raise scores, raising scores per se is not their purpose, and it is inappropriate to use them merely to raise scores. Research on the effects of accommodations, although limited, is sufficient to raise concerns about the potential effects of excessive or poorly targeted accommodations.

The meaningful participation of students with disabilities in large-scale assessments, and compliance with the legal rights of individuals with disabilities, in some instances require steps that are beyond current knowledge and technology. For example, regulations implementing the IDEA and Section 504 require that tests and other evaluation materials be validated for the specific purpose for which they are used. Individuals with disabilities are also entitled to "reasonable" accommodations and adaptations that do not fundamentally alter the content being tested. Even in the case of traditional assessments, testing experts do not yet know how to meet these two requirements for many individuals with disabilities, particularly those with cognitive disabilities that are related to the constructs being measured. Moreover, the nature of assessments associated with standards-based reform is in flux. The validity of new forms of assessment has not yet been adequately determined for students in general, and we have even less evidence for students with disabilities, particularly when testing accommodations are provided.

A critical need exists for research and development on assessments associated with standards-based reform generally, and on the participation of students with disabilities in particular. The recent development of assessments associated with standards-based reform, combined with the existence of legal rights governing the education of students with disabilities, has required that state education agencies, local education agencies, and local school personnel design and implement assessment procedures that in some cases are beyond the realm of existing expert knowledge.
The sooner the research base can match the demands of policy, the more likely it is that students with disabilities will be able to participate meaningfully in standards-based assessments.