A more fundamental problem is that a test can only be an effective inducement to better instruction if teachers understand its content. Indeed, one of the axioms of advocates for accountability-based performance assessments is that the assessments can serve as models of desired student performance and instruction. To serve those functions, tasks very similar to those in the performance assessment must be publicized, and when dropped, they must be replaced by very similar tasks. Thus, accountability and test security impose competing pressures on assessment design.
Another approach to avoiding the narrowing of instruction and the attendant inflation of scores is to design a broad test while administering only a portion of it to each student. In theory, if a test of a given domain were made broad enough and the specific items on it were kept secure, teaching to the test and teaching the domain would not differ. In practice, most domains are large enough that this ideal cannot be achieved within reasonable limits of expense and testing time, but it can be approximated by making a test several times as long as could be administered to any one student and giving each student a systematic sample of the test's items. In a common variant of this approach, usually called "matrix sampling," a test is broken into several different forms that are distributed randomly to students within the unit for which performance is to be reported, usually a school or district.
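The mechanics of matrix sampling can be illustrated with a minimal sketch. It is not drawn from any particular assessment program: the function names, the round-robin split of items into forms, and the simple averaging used for the unit-level estimate are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_forms(item_ids, n_forms):
    """Split the item pool into n_forms disjoint forms (hypothetical round-robin split)."""
    return [item_ids[i::n_forms] for i in range(n_forms)]


def assign_forms(student_ids, n_forms, rng):
    """Randomly distribute forms to students in one reporting unit, keeping form counts balanced."""
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    return {student: i % n_forms for i, student in enumerate(shuffled)}


def unit_estimate(responses):
    """Estimate the unit's proportion correct over the whole pool.

    responses is an iterable of (item_id, correct) pairs, with correct as 0 or 1.
    Each item's proportion correct is computed from the students who saw it,
    and the per-item proportions are then averaged.
    """
    totals = defaultdict(lambda: [0, 0])  # item_id -> [number correct, number of responses]
    for item_id, correct in responses:
        totals[item_id][0] += correct
        totals[item_id][1] += 1
    per_item = [right / seen for right, seen in totals.values()]
    return sum(per_item) / len(per_item)


# A 12-item pool split into 3 forms of 4 items, given to a school of 30 students.
items = list(range(12))
forms = build_forms(items, n_forms=3)
assignment = assign_forms(range(30), n_forms=3, rng=random.Random(0))
```

In this sketch each student answers only 4 of the 12 items, yet every item is answered by 10 students, so the school-level estimate draws on the full pool while any one student's score rests on a single short form.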
Matrix sampling and other sampling approaches, however, have an important limitation that forces a politically difficult trade-off. While sampling approaches can provide better estimates of group performance than can be obtained with a traditional test, they typically do not provide adequate assessments of individual students. Scores for individual students may be unreliable because each student takes only a short test, and a student's score may depend on the particular form he or she is given. In theory, one might separate the two functions, using a sampling-based assessment to provide aggregate estimates and a second, linked assessment to provide scores for individual students. In practice, however, it has been politically difficult to maintain an expensive and time-consuming sampling-based testing program that does not provide reliable scores for individual students. For example, Governor Wilson cited the lack of scores for individual students as one reason for terminating California's well-known assessment program, and both Kentucky and Maryland are now wrestling with the question of how to respond to pressure to report student-level scores from their matrix-sampled assessments.
The extent to which techniques such as test security, novel test content, and matrix sampling can avoid inflated test scores remains a matter of debate. There are as yet no good data on score inflation in accountability systems that rely heavily on all of these techniques. It seems likely, however, that they will provide only a partial solution for the foreseeable future.