U.S. public schools are responsible for educating large numbers of students with disabilities and English language learners—some 20 percent of the nation’s 46 million public school students fall into one or both of these categories. Both of these populations have been increasing, and the demand for evidence of their academic progress has also grown. In response to both changing public expectations and legal mandates, the federal government, states, and districts have attempted to include more such students in educational assessments.
Testing these two groups of students, however, poses particular challenges. Many of these students have attributes—such as physical, emotional, or learning disabilities or limited fluency in English—that may prevent them from readily demonstrating what they know or can do on a test. In order to allow these students to demonstrate their knowledge and skills, testing accommodations are used. For the purpose of this report, we have defined testing accommodations by drawing from the definition in the AERA/APA/NCME Standards for Educational and Psychological Testing (American Educational Research Association et al., 1999). Our adapted definition is as follows: accommodation is used as the general term for any action taken in response to a determination that an individual’s disability or level of English language development requires a departure from established testing protocol.1
The No Child Left Behind Act of 2001 has established the goal for states of including all of their students with disabilities and English language learners in their assessments.2 At the same time, the sponsors of the National Assessment of Educational Progress (NAEP) hope to increase the participation of these groups of students in NAEP assessments. The use of accommodations provides an important means for increasing inclusion rates for these groups. In identifying appropriate accommodations, policy makers must consider the specific characteristics of the test-takers and the nature of the skills and knowledge (referred to as “constructs”) to be tested. Effective accommodations should not materially alter the nature of the task or the required response, and they should yield scores that are valid indicators of the constructs being assessed. Both state assessment programs and the sponsors of NAEP have set policies regarding the accommodations they will allow. NAEP also has policies for identifying students who cannot meaningfully participate, even with accommodations, and excluding them from the assessment.
However, the existing base of research about the effects of accommodations on test performance and the comparability of scores obtained under standard and accommodated conditions is insufficient to provide empirical support for many of the decisions that must be made regarding the testing of these students. Thus it has been difficult for both state and NAEP officials to make these decisions, and the result has been considerable variation in what is allowed, both from state to state and between NAEP and the state assessments.3 These kinds of variations in policy, combined with an insufficient research base, create significant impediments to the interpretation of assessment results for both students with disabilities and English language learners.
At the request of the U.S. Department of Education, the National Research Council formed the Committee on the Participation of Students with Disabilities and English Language Learners in NAEP and Other Large-Scale Assessments. The charge to the committee was to (1) synthesize research findings about the effects of accommodations on test performance, (2) review the procedures used for making inclusion and accommodation decisions for large-scale assessment programs, and (3) determine the implications of these findings for NAEP inclusion and accommodation policies.
The committee’s report addresses three broad areas related to its charge:
The policies and practices governing the inclusion of, and the provision of accommodations for, students with disabilities and English language learners in the National Assessment of Educational Progress and other large-scale assessments conducted by states.
The research that has been conducted to date on the effects of accommodations on test performance and the comparability of results from accommodated and standard administrations.
The validity of inferences that are made from the results of accommodated assessments.
POLICIES AND PROCEDURES FOR INCLUSION AND ACCOMMODATION
States’ policies and procedures for including students with disabilities and English language learners in large-scale assessments have evolved in recent years, and these policies remain in flux as officials strive to refine their procedures for inclusion and accommodation to comply with legislative mandates. These policies and procedures vary widely from state to state, in part because of differences among assessments and assessment systems, and state policies are different from those used for NAEP assessments.
While NAEP’s policies are in many cases different from those in place for state assessments, NAEP results are nevertheless affected by state guidelines in two ways. First, NAEP sampling is based on information from the states regarding the characteristics of all of their students. Thus, the samples used to ensure that the population assessed in NAEP is representative of the nation’s student population as a whole are dependent on state policies for classifying students as having a disability or being an English language learner, both because states’ classification policies and practices vary and because samples from different states may differ in ways that are not explicitly recognized. Second, once NAEP officials identify the sample of students to be included in the assessment, they provide the schools in which those students are enrolled with guidance as to how to administer the assessment. NAEP officials rely on school-level coordinators, who organize the administration of NAEP at schools, to make consistent and logical decisions about which of the students selected in the original sample can meaningfully participate in the assessment. NAEP officials also rely on school coordinators to make decisions about how participating students will be accommodated, on the basis of their individual needs, NAEP’s policies, and the accommodations available in that school.
This variability in policies and procedures is important for several reasons. First, NAEP results are reported separately for states so that comparisons can be
made from state to state. If there are differences across states in the characteristics of the sample and in the conditions under which students participate, then the results may not be comparable. Second, national NAEP results are based on the results for each state, so the accuracy of national results is dependent on the consistency of sampling and administration across states. Finally, the policies that govern whether students are included in or excluded from NAEP assessments differ from the policies for inclusion in state assessments. Comparisons of results from a state assessment with those from state NAEP are likely to be affected by these differences.
The accuracy of all data regarding the academic progress of students with disabilities and English language learners is dependent on the uniformity of both the criteria with which students are selected for participation in testing and the administration procedures that are used, including accommodation procedures. In order for the inferences made from assessments of these students to be justifiable, test administration procedures must be uniform. The committee addresses several aspects of this problem with recommendations regarding both policy and research. We address assessment policies first.
The goal of maximizing the participation rates of students with disabilities and English language learners in all testing is widely shared, and is certainly one that the committee endorses. At the same time, the variation in both inclusion and accommodation policies and procedures is too great and has a number of negative effects. The committee therefore makes the following recommendations:4
Recommendation 4-1: NAEP officials should
review the criteria for inclusion and accommodation of students with disabilities and English language learners in light of federal guidelines;
clarify, elaborate, and revise their criteria as needed; and
standardize the implementation of these criteria at the school level.
Recommendation 4-2: NAEP officials should work with state assessment directors to review the policies regarding inclusion and accommodation in NAEP assessments and work toward greater consistency between NAEP and state assessment procedures.
Because NAEP is intended to report on the educational progress of students in the United States, it is important to evaluate the extent to which the results fully represent the student population in each state and in the nation. To evaluate this, the committee reviewed policy materials available from NAEP, NAEP reports,
and data available from external data sources (e.g., data reported to Congress under the mandates of the Individuals with Disabilities Education Act, data available from the Office of English Language Acquisition, Language Enhancement, and Academic Achievement for Limited English Proficient Students, and U.S. census data). However, our review raised several concerns. First, NAEP policy materials contain no clear definition of the target population to which NAEP results are intended to generalize. The policy guidance supplied is not sufficiently specific for making judgments about the extent to which inclusion and exclusion decisions affect the generalizability of the results to the targeted population. Specifically, this guidance does not make clear whether it is intended that all students with disabilities and English language learners should be part of the target population or, if not, which of them are excluded. We therefore recommend the following:
Recommendation 4-3: NAEP officials should more clearly define the characteristics of the population of students to whom results are intended to generalize. This definition should serve as a guide for decision making and the formulation of regulations regarding inclusion, exclusion, and reporting.
Our review of NAEP reports also revealed that both national and state NAEP reports now indicate the percentages of the NAEP sample that are students with disabilities and English language learners. This is a recent revision to the reports and represents a first step toward making it possible to evaluate the degree to which NAEP samples conform to the definition of the NAEP population. However, the data currently available from state and federal agencies are insufficient to complete the desired comparisons. In the committee’s view, it is important to know the extent to which the percentages in the NAEP reports correspond to the percentages of students with disabilities and English language learners reported in other sources. Furthermore, the committee believes that states are undertaking additional efforts to collect such data, partly in response to the requirements of legislation such as the No Child Left Behind Act of 2001. We encourage all parties (NAEP as well as state and federal agencies) to collect and compile such data so that the desired comparisons can be made. We make two recommendations related to this point:
Recommendation 4-4: NAEP officials should evaluate the extent to which their estimates of the percentages of students with disabilities and English language learners in a state are comparable to similar data collected and reported by states, to the extent feasible given the data that are available. Differences should be investigated to determine the causes.
Recommendation 4-5: Efforts should be made to improve the availability of data about students with disabilities and English language learners. State-
level data are needed that report the total number of English language learners and students with disabilities by grade level in the state. This information should be compiled in a way that allows comparisons to be made across states and should be made readily accessible.
RESEARCH REGARDING ACCOMMODATED ASSESSMENTS
The effects of accommodations on test performance have been researched, but the findings that emerge from the existing research are inconclusive. These findings provide little guidance to those who must make decisions about which accommodations are suitable for particular kinds of students participating in particular assessments. What is lacking is research that directly examines the effects of accommodations on the validity of inferences to be made from scores. Overall, existing research does not provide definitive evidence about which procedures would produce the most valid estimates of performance. Moreover, it does not establish that scores for students with disabilities and English language learners obtained under accommodated conditions are as valid as scores for other students obtained under unaccommodated conditions.
For the most part, existing research focuses on comparisons of the scores obtained under standard and accommodated conditions. We conclude that this research design is useful for understanding the effects of accommodations and does provide evidence of differential group performance, but we also conclude that it does not directly address the validity of inferences made from accommodated assessments.
In the committee’s judgment, additional types of validity evidence should be collected. Validation studies in which evidence of criterion relatedness is collected have been conducted with the ACT and the SAT; similar studies should be conducted for NAEP and state assessments as well. We acknowledge that identification of appropriate criterion variables is more straightforward in the context of college admissions than in the K-12 context; however, we encourage efforts to identify and obtain reliable data on concurrent measures that can provide evidence of criterion validity for K-12 achievement results, such as grades, teacher ratings, or scores on other assessments of similar constructs. In addition, analyses of test content and test-takers’ cognitive processes would provide further insight into the validity of results from accommodated administrations in the K-12 context. We note that NAEP’s sponsors have initiated several studies of this kind since our committee began its investigations, and we encourage them to continue in this vein. Specifically, the committee makes the following recommendation:
Recommendation 5-1: Research should be conducted that focuses on the validation of inferences based on accommodated assessments of students with disabilities and English language learners. Further research should be guided by a conceptual argument about the way accommodations are intended
to function and the inferences the test results are intended to support. This research should include a variety of approaches and types of evidence, such as analyses of test content, test-takers’ cognitive processes, and criterion-related evidence, and other studies deemed appropriate.
THE VALIDITY OF INFERENCES REGARDING ACCOMMODATED ASSESSMENTS
In an evaluation of a testing program’s policies regarding the accommodation of students with disabilities and English language learners, the validity of interpretations of the results should be the primary consideration. A test administered with an accommodation is intended to yield results that are equivalent to the results of a standard administration of the test to a student who has no disability and is fluent in English. However, accommodations can have unintended consequences.
For example, an accommodation might not only allow the student to demonstrate his or her proficiency with regard to the construct being assessed but might also provide that student with an unwarranted advantage over other test-takers. In this case, the resulting score would be an inflated estimate, and hence a less valid indicator, of the test-taker’s proficiency.
Thus, determining which accommodation is right for particular circumstances is difficult. The accommodation must at the same time be directly related to the disability or lack of fluency for which it is to compensate and be independent of the constructs on which the student is to be tested. The appropriateness of accommodations might best be understood in terms of a conceptual framework that encompasses both the inferences a test score is designed to support (e.g., the test-taker reads at a particular skill level) and alternative inferences (e.g., the test-taker could not complete the work in the allotted time because of a disability unrelated to his or her skill level on the construct being assessed) that might actually account for the score and therefore impede the collection of the desired information about the test-taker.
Thus the validity of inferences made from the results of any accommodated assessment must be evaluated in terms of the general validation argument for the assessment. That is, there should be a clear definition of the construct the assessment is designed to measure (the targeted skills and knowledge) and the ancillary skills required to demonstrate proficiency on the targeted construct (such as the reading level required to decode the instructions and word problems on an assessment of mathematics skills). Furthermore, the inferences that test designers intend the test results to support should be specified, and evidence in support of claims about how the test results are to be interpreted should be provided.
When accommodations operate as intended, the same kinds of inferences can be drawn from accommodated results as from results based on standard administrations. Only when validation arguments are clearly articulated can the validity
of results from accommodated assessments be evaluated. For this reason, the committee examined the available documentation of the constructs to be assessed and the validity evidence laid out for NAEP assessments.
The committee concludes that the validation argument for NAEP in general is not as well articulated as it should be. NAEP officials have not explicitly described the kinds of inferences they believe their data should support, and we found insufficient evidence to support the validity of inferences made from accommodated NAEP scores. While arguments in support of the validity of accommodated administrations of NAEP are discussed in some NAEP materials, more extensive and systematic investigation of the validity of inferences made from these scores is needed. At the same time, as has been noted, existing research does not provide definitive evidence about which procedures will, in general, produce the most valid estimates of performance for students with disabilities and English language learners.
The committee presents a model for evaluating the validity of inferences made from accommodated assessments, based in part on the evidence-centered design approach that has been developed by Hansen, Mislevy, and Steinberg (Hansen and Steinberg, 2004; Hansen et al., 2003; see also Mislevy et al., 2003). This model offers a means of disentangling the potential explanations for observed performance on an assessment and using this analysis to discern the effects of accommodations on the validity of inferences to be based on the observed performance. This approach provides a first step in laying out validity arguments to be investigated through empirical research.
We make three recommendations regarding validity research on accommodations. Although these recommendations are specific to NAEP, we strongly urge the sponsors of state and other large-scale assessment programs to consider them as well.
Recommendation 6-1: NAEP officials should identify the inferences that they intend to be made from NAEP assessment results and clearly articulate the validation arguments in support of those inferences.
Recommendation 6-2: NAEP officials should embark on a research agenda that is guided by the claims and counterclaims for intended uses of results in the validation argument they have articulated. This research should apply a variety of approaches and types of evidence, such as analyses of test content, test-takers’ cognitive processes, criterion-related evidence, and other studies deemed appropriate.
Recommendation 6-3: NAEP officials should conduct empirical research to specifically evaluate the extent to which the validation argument that underlies each NAEP assessment and the inferences the assessment was designed to support are affected by the use of particular accommodations.
The difficulties related to assessing students with disabilities and English language learners are not new, but the consequences of relying on scores whose accuracy cannot be ensured have become even greater because of the provisions of the No Child Left Behind Act of 2001. Under that legislation, states are responsible for tracking the academic progress of the students with disabilities and English language learners in every school. The consequences for a school of failing to ensure that these students make progress every year toward ambitious targets of performance are serious. However, regardless of that legislation or any modifications that may be made to it, the validity of test-based inferences made about the performance of students with disabilities and English language learners will be critical for those who seek to understand the academic progress of these students, as well as for those who make policies that affect them.
Under the present circumstances, the need for test results in which users can have justifiable confidence is, if not more critical, at least more immediate. The No Child Left Behind Act requires schools and jurisdictions to take their legal obligations to assess English language learners and students with disabilities more seriously than many have done in the past. While the committee considers this renewed attention to the needs of both groups of students an important development in the effort to close persistent achievement gaps, the goal cannot be met without accurate data. Credible assessment results can play a crucial role in revealing not only where schools are failing these students, but also where they are succeeding with them. Thus it is essential that evidence of the validity of assessment results be thoroughly investigated to be sure that these results can provide useful information regarding students with disabilities and English language learners for schools, local jurisdictions, and the nation.