High Stakes is a report by the National Research Council developed in response to a congressional request for such a study and for recommendations “on appropriate methods, practices, and safeguards to ensure that (a) existing and new tests that are used to assess student performance are not used in a discriminatory manner or inappropriately for student promotion, tracking, or graduation; and (b) existing and new tests adequately assess student reading and mathematics comprehension” (p. 1).
The report serves as a primer for the sensible use of high-stakes tests, capitalizing on their positive characteristics and minimizing their negative aspects. As noted in the introduction:
Most people seem to agree that America's public schools are in need of repair. How to fix them has become a favorite topic of policymakers, and for many the remedy includes increased reliance on the testing of students. The standards-based reform movement, for example, is premised on the idea of setting clear, high standards for what children are supposed to learn and then holding students—and often educators and schools—to those standards.
The logic seems clear: Unless we test students' knowledge, how will we know if they have met the standards? And the idea of accountability, which is also central to this theory of school reform, requires that the test results have direct and immediate consequences: A student who does not meet the standard should not be promoted, or awarded a high school diploma. This report is about the appropriate use of tests in making such high-stakes decisions about individual students. (p. 13)
High Stakes considers what constitutes appropriate use of tests in making tracking, promotion, and graduation decisions affecting individual students, and it emphasizes three criteria for judging the appropriateness of a particular test (p. 23):
“Measurement validity. Is the test appropriate for a particular purpose? Is there evidence that the constructs to be measured are relevant in making a decision? Does the test measure those constructs? Is it confounded with other constructs that are not relevant to the decision? Is the test reliable and accurate?
Attribution of cause. Does a student's performance on a test reflect knowledge and skills based on appropriate instruction, or is it attributable to poor instruction? Or is it attributable to factors such as language barriers or disabilities that are irrelevant to the construct being measured?