related to technical quality (e.g., reliability, fairness, and validity), were discussed in a previous workshop session (see Chapter 2). Tests with innovative characteristics, like any tests, send signals to educators, students, and parents about the learning that is most valued in the system, and in many cases innovative testing has led to changes in practice. Testing also has costs, including a burden in both time and resources, which are likely to differ across innovative assessments. It also provokes reactions from stakeholders, particularly politicians.
Performance and other kinds of alternative assessments were popular in the 1990s, when 24 states were using, developing, or exploring possibilities for using one of these approaches (Stecher and Hamilton, 2009). Today, those kinds of alternative assessments are much less prevalent. States have moved away from these approaches, primarily for political and budget reasons, but a look at several of the most prominent examples highlights some lessons, Brian Stecher explained. Individuals who had experience with several of the programs added their perspectives.
Vermont was a pioneer in innovative assessment, having implemented a portfolio-based program in writing and mathematics in 1991 (Stecher and Hamilton, 2009). The program was designed both to provide achievement data that would permit comparison of schools and districts and to encourage instructional improvements. Teachers and students in grades 4 and 8 collected work to represent specific accomplishments, and these portfolios were complemented by a paper-and-pencil test.
Early evaluations raised concerns about scoring reliability and the validity of the portfolio as an indicator of school quality (Koretz et al., 1996). After efforts to standardize scoring rubrics and selection criteria, the reliability improved, but evaluators concluded that the scores were not accurate enough to support judgments about school quality.
The research (Koretz et al., 1996) showed that teachers did alter their practice in response to the assessment: for example, they focused more on problem solving in mathematics. Many schools also began using portfolios in other subjects because they found them useful. However, some critics said that teachers did not uniformly demonstrate a clear understanding of the intended criteria for selecting student work, and others commented that teachers began overemphasizing the specific strategies that were included in the standardized rubrics. Costs were high: $13 per student just for scoring. The program was