from these shortcomings (Klein and Hamilton, 1999). The NRC Committee on Title I Testing and Assessment (National Research Council, 1999c) examined a variety of issues involved in using assessments for accountability, including the thorny question of how much improvement schools can reasonably be expected to achieve in a given period and how such expectations should be set.
High-stakes testing can have negative as well as positive effects on classroom practice. Teachers may focus on the content covered by the test to the exclusion of other relevant material, or they may spend inordinate amounts of time administering worksheets and drilling students on basic facts in preparation for multiple-choice tests (Smith, 1991; Koretz, 1996; Linn and Herman, 1997). Teachers may also coach students on specific test items. Such coaching appears to explain why Kentucky, a leading state in using tests for school accountability and education reform, found that large score gains on its state assessment were not reflected in gains on NAEP or on college admission tests. Moreover, gains were far higher on items that had been administered the previous year (Hambleton et al., 1995; Koretz and Barron, 1998). To avoid these negative effects, policy makers who want to use test scores for high-stakes purposes must be aware of such potential misuses and must ensure that testing programs include features that minimize distortions in both classroom practice and test results.
Another danger in high-stakes testing is that tests may be misused. Tests are created with specific uses in mind, and experts agree that the validity, reliability, and fairness of a test can be assessed only in the context of how its scores are used. Policy makers, practitioners, and the press, however, are prone to use test scores to meet a variety of needs, many of which test developers may not have anticipated.
North Carolina, which developed tests specifically for its new school-based accountability program, provides an example. Faced with pressure from that program, several school districts are now trying to shift the pressure for performance down to the student level. Some districts, for example, now require students who do poorly on the state test to attend summer school and, if they continue to fail the test, to be held back. The controversial issue here is whether the state test, which was developed for school-wide accountability and uses matrix sampling (in which each student answers only a subset of the test items), is valid for the purpose of individual accountability.
The NRC Committee on Appropriate Test Use concluded that existing mechanisms for enforcing appropriate test use (mainly professional norms and legal action through administrative enforcement or litigation) are inadequate, and it suggested that new methods, practices, and safeguards be considered (National Research Council, 1999a). The committee did not recommend a particular strategy or combination of strategies, but it noted that promoting proper test use will require multiple strategies.