students, including some with disabilities. All such assessments should be designed to be informative about the achievement of all students. In particular, task selection and scoring criteria should be designed to accommodate varying levels of performance. These reliability concerns are magnified when high-stakes decisions will be based on individual test score results.

Promising Approaches in Test Design [5]

Research and development in educational testing continually explores new modes, formats, and technologies. Continued development of new forms of test construction, such as new ways of constructing test items and new uses of computers, may hold promise for accommodating the needs of students with disabilities in large-scale assessment programs.

Item response theory (IRT), which is rapidly displacing classical test theory as the basis for modern test construction, is one promising development. IRT models describe "what happens when an examinee meets an item" (Wainer and Mislevy, 1990:66). They are based on the notion that students' performance on a test should reflect one latent trait or ability and that a mathematical model should be able to predict performance on individual test items on the basis of that trait. [6]
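
As one concrete illustration (the passage does not name a specific model), the one-parameter logistic, or Rasch, model is a widely used unidimensional IRT form. The notation is assumed here rather than taken from the source: \theta_i is the latent ability of examinee i, b_j is the difficulty of item j, and X_{ij} is 1 for a correct response and 0 otherwise.

```latex
P(X_{ij} = 1 \mid \theta_i, b_j) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}
```

Under this form, an examinee whose ability equals an item's difficulty has a 0.5 probability of answering it correctly, and the probability rises as ability exceeds difficulty.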

To use IRT modeling in test construction and scoring, test items are first administered to a large sample of respondents. Based on these data, a model is derived that predicts the probability that a given item will be answered correctly by a given individual on the basis of estimates of the difficulty of the item and the skill of the individual. A good model yields information about the difficulty of items for individuals with differing levels of skill. Items for which the model does not fit (that is, for which students' estimated mastery does not predict performance well) are discarded. The item information from the retained items is later used to score tests given to actual examinees.
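
The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular testing program: it assumes the Rasch (one-parameter logistic) model, assumes the item difficulties have already been estimated from the calibration sample, and uses hypothetical function names (`p_correct`, `estimate_ability`). It omits the item-fit screening step.

```python
import math


def p_correct(theta, b):
    """Rasch model: probability that an examinee of ability theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, difficulties, iterations=25):
    """Maximum-likelihood ability estimate via Newton-Raphson.

    responses    -- list of 0/1 item scores for one examinee
    difficulties -- previously calibrated item difficulties (assumed known)
    """
    if len(responses) != len(difficulties):
        raise ValueError("responses and difficulties must have equal length")
    if all(r == 1 for r in responses) or all(r == 0 for r in responses):
        # The likelihood has no finite maximum for these patterns.
        raise ValueError("MLE is undefined for all-correct or all-incorrect patterns")

    theta = 0.0  # start at the center of the ability scale
    for _ in range(iterations):
        probs = [p_correct(theta, b) for b in difficulties]
        # First derivative of the log-likelihood: observed minus expected score.
        gradient = sum(r - p for r, p in zip(responses, probs))
        # Negative second derivative: the test information at theta.
        information = sum(p * (1.0 - p) for p in probs)
        # Clamp the step to keep the iteration stable.
        step = max(-1.0, min(1.0, gradient / information))
        theta += step
    return theta


if __name__ == "__main__":
    item_difficulties = [-1.0, 0.0, 1.0]   # illustrative values only
    print(estimate_ability([1, 1, 0], item_difficulties))
```

In this sketch, an examinee who answers more items correctly on the same set of items receives a higher ability estimate, and the estimate settles where the expected number of correct answers under the model equals the observed number.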

Item response theory offers potential for including students with disabilities

[5] This section is taken from pp. 182–183 of Educating One and All (National Research Council, 1997).

[6] Most IRT models are predicated on the notion that a test is unidimensional and that scores should therefore reflect a single latent trait. Recently, however, IRT models have been extended to multidimensional domains as well.


