Another issue is that the implications of computer-based approaches for validity and reliability have not been thoroughly evaluated.
In Hamilton’s view, the challenge of developing innovative assessments that are both high in quality and cost-effective has not yet been fully resolved. “Recent history suggests that the less you constrain prompts and responses, the more technically and logistically difficult it can be to obtain high-quality results,” she observed. Yet past and current work has explored ways to measure important constructs that were not previously accessible, and the sheer number and diversity of innovations has been “impressive and likely to shape future test development in significant ways,” she said. Work in science has been at the forefront, in part because problem solving and inquiry are important components of the domain but have been difficult to assess. She suggested that many pioneering efforts in science assessment are likely to find applications in other subjects over time.

This is an important development because the current policy debate has emphasized the role that data should play in decision making at all levels of the education system, from determining teacher and principal pay to informing day-to-day instructional decisions. Assessments that provide deeper and richer information will better meet the needs of students, educators, and policy makers. The possibility of broadening the scope of what can be assessed for accountability purposes is also likely to reinforce other kinds of reform efforts.
Hamilton also pointed out the tradeoffs inherent in any proposed use of large-scale assessment data. Test-score data are playing an increasingly prominent role in policy discussions because of their integral role in statistical analyses that can be used to support inferences about teachers’ performance and other accountability questions. Growth, or value-added, models (designed to isolate the effects of teachers from other possible influences on achievement) rely on annual testing in consecutive grades, a requirement that may impose significant constraints on the sorts of innovative assessments that can be used. Using a combination of traditional and innovative assessments may offer a suitable compromise, Hamilton said. She also called for improved integration between classroom and large-scale assessments. “No single assessment is likely to serve the needs of a large-scale program and classroom-level decision making equally well,” she argued. A coordinated system that includes a variety of assessment types to address the needs of different user groups might be the wisest solution.