There is wide public acceptance of the value of a system in which assessments measure student progress in meeting education standards and the test results are used to hold students, schools, educators, and jurisdictions to account for their performance. But, Lorrie Shepard pointed out in the summary session, two very different theories of action regarding the way such a system will actually bring about improvements have been put forward. And neither the differences between them nor the implications of adopting one or the other have been widely recognized.
THEORY AND GOALS
The incentives theory, as she called the first approach, is that given sufficient motivation, teachers and other school personnel will develop ways to improve instruction. This perspective was the basis for the 1994 reauthorization of the Elementary and Secondary Education Act, which required states to establish standards and assessments. The other approach, which Shepard called the coherent capacity-building theory, posited that an additional step—beyond establishing clear expectations and the motivation to meet them—was needed. Educators would also need the capacity, in the form of professional development and other supports, to improve their teaching in order for the accountability measures to have the desired effect (see, e.g., National Research Council, 1995). Shepard suggested that the incentives theory is dominant, and that capacity building has been neglected.
Similar imprecision is evident in the possible interpretations of some of the top reform goals of the present moment, Shepard suggested, including:
reforming assessments using conceptually rich tasks,
integrating 21st-century skills and academic content,
creating coherence between large-scale and classroom assessments, and
using data to improve classroom instruction.
For example, treating the first two bullets as distinct enterprises makes little sense, given that the research on the developmental nature of learning seems to suggest the importance of weaving content and higher-order thinking skills together (see Chapter 2).
Shepard said she believes that policy makers do not completely understand that effective teaching relies on a model for how learning proceeds, in which cognitive skills and the knowledge of when and how to use them develop together with content knowledge and understanding of how to generalize from it. She suggested that, without this theory of learning, policy makers are likely to accept current modes of assessment. They may believe, for example, that narrowing the curriculum is necessary because basic reading and mathematics skills are so important. They may not be aware that excessive drill on worksheets that resemble summative tests does not give students the opportunity to understand the context and purpose for what they are learning—which would enhance their skill development (see Elmore, 2003; Blanc et al., 2010; Bulkley et al., 2010; Olah et al., 2010). Similarly, although policy makers are in favor of data-driven decision making, Shepard said, she believes that many educators lack the substantive expertise to interpret the available data and use it to make meaningful changes in their practice.
During the workshop discussions, many presenters drew attention to the churning that affects education policy because of shifts in political goals and personnel at the state level. Given that reality, coherence will have to come at a lower level, Shepard argued. The United States does not have a common curriculum, she suggested, because it has no tradition of relying on subject-matter experts in many decisions about education. Psychometricians and policy makers have typically taken the lead in the development of assessments, for example; subject-matter experts have generally been involved in some way, but they are not usually asked to oversee the development of frameworks, item development, and the interpretation of results.
Now, however, the interests of subject-matter experts and cognitive researchers who have been developing models of student learning within particular disciplines have converged, and this convergence offers the possibility of a coherence that could withstand the inevitable fluctuations in political interests. Yet the practical application of this way of thinking about learning is not yet widely understood, Shepard observed. Thus, for Shepard, the opportunity of the present moment is to take the first steps in inventing and implementing the necessary innovations. It is not practical to expect that any one state or consortium could develop an ideal system for all grades and subject areas on the first try, so the focus should be on incremental improvements. She suggested that each consortium grant award should be focused on the development of a system of “next-generation, high-quality” classroom and summative assessments for one manageable area—say, mathematics for grades 4 through 8.
She noted that Lauren Resnick has proposed a way of implementing innovative approaches incrementally. Resnick has suggested that content-based “modules” that incorporate both a rich curriculum and associated assessments could be adopted one by one and incrementally incorporated into an existing full curriculum. In the near term, this would leave existing assessments unchanged, but, over time, the accumulating body of new modules would eventually lead to a completely transformed system, in which accountability information could be drawn from the assessment components of the innovative curriculum modules. This approach would allow educators to proceed gradually, as the research to support the development of such modules grows, and also to sidestep many of the political and practical challenges that have hampered past programs.
Shepard also emphasized the importance of considering curriculum along with new and improved assessment models. She cautioned that establishing higher standards means not only setting cut-points at a higher level than they are currently, but also incorporating material of a substantively different character into assessments. If this is done without corresponding changes to curriculum and instruction, the result will be predictable—students are unlikely to succeed on the new assessment. In the end, after all, the purpose of the improvements, she said, is to “change the character of what we teach and then make those opportunities available to all students and make sure that the assessment can track any changes over time.”
Shepard closed by reminding everyone that “to truly transform learning opportunities in classrooms in ways that research indicates are possible, it will be necessary to remove [existing impediments], especially low-level tests that misdirect effort; provide coherent curricula consistent with ambitious reforms; and take seriously the need for capacity-building at every level of the education system.”
In her summary remarks, Diana Pullin also focused on the opportunity presented not only by the Race to the Top funds, but also by what appears to be an important evolution in the thinking of many policy makers and educators about the purposes and potential of assessment. The federal funding, she observed, has presented an opportunity, but “we are on a fragile edge [between] being able to do something new and better and dramatically different, or something that is only a slight improvement or perhaps a step back.”
A number of challenges complicate the picture, she said. Limitations in teacher preparation and in-service development have left teachers not yet ready to interpret and use the kind of rich information hoped for from innovative assessments. The capacity of the testing industry to keep pace with a rapid shift in priorities for state testing is not clear. The workshop discussions did not offer any formulas for the necessary innovation, she noted, but innovation by definition cannot be accomplished by formula. Pullin said the real challenge may be to push past the boundaries that have confined people’s thinking. Those in the assessment community may lack knowledge of learning theory or of the education of students with disabilities and English language learners, and those in the discipline and curriculum communities may not have a thorough understanding of assessment. Yet these intellectual traditions and perspectives must be integrated if a new generation of assessments is to be successful.
Others shared the concern that there is risk in the current situation. Discussant Joe Willhoft noted that there is little doubt that assessments influence instruction and learning—and that existing ones can do so to good effect. For example, he said, he believes that a writing assessment used in Washington state yielded significant changes in instruction and in expectations, and, in turn, marked improvements in students’ writing skills. His concern is that many questions about how new, consortium-based assessment systems might work have yet to be answered.
Discussant Deborah Seligman addressed a similar theme. She noted that the education community appears to be ready for a change in thinking about assessments, but that states’ economies may not be robust enough to sustain the full-bore effort necessary for it to be a success. She noted that even though most educators and policy makers would agree that writing is one of the most important domains to assess, California cut this assessment first: it did so not for substantive reasons but because the assessment is expensive and easy to separate from other elements of the testing program. Politics, she observed, is either the factor that can make things happen or the largest obstacle to progress.
Nevertheless, Gene Wilhoit commented, there is a general political consensus to move rapidly in this new direction. The common core standards are laying the groundwork for this change, but the policy decisions that will follow—and need to be made quite rapidly—will have a profound impact on public education for the next generation. He said that those making these decisions should be urged to pay close attention to the guidance of experts and the examples of countries that are far ahead of the United States as they proceed. Many others agreed, and a representative final word of caution might be, in the words of Rebecca Zwick, “don’t put all your eggs in that basket. Have a plan B. Have something else you know you can score and report, but at the same time have a piece that you are using to explore innovative ideas.”
Shepard and other discussants were asked to reflect on their highest priorities for research that would support progress in developing and implementing innovative assessments. Many of the ideas overlapped, and they fell into a few categories: measurement, content, teaching and learning, and experimentation.
Measurement

Many participants emphasized that psychometric models developed generations ago need to be updated in light of recent research on learning and cognition. New ways of thinking about what should be measured and what sort of information would be useful to educators have been put forward (see Chapter 2), and it is clear that current psychometric models do not fit them well. The new models illustrate, for example, the importance of each of the stages that students go through in learning complex material. This idea implies that teachers (and students) need information about students’ developing understanding of concepts and facts and how they fit into a larger intellectual structure. Yet educational measurement has tended to focus on one-dimensional rankings according to students’ mastery of specific knowledge and skills at a given time. The goals of traditional psychometrics remain important, but perhaps need to be stretched. Means of establishing the validity of new kinds of assessments for new kinds of uses are needed.
Discussants pointed to the need for a strategy for making sure that the information an assessment provides is being used to good effect and a strategy for checking the links in a proposed learning trajectory, to be sure each stage in the progression is reasonable and well supported. The capacity to compare results across assessments is already being stretched, and the introduction of more innovative modes of assessment may present challenges that cannot be solved with current procedures. But the policy demand for comparative information suggests a need for new thinking about the precise questions that are important and the kinds of information that can provide satisfactory answers.
Other fields, one participant noted, have grappled with similar issues. In medicine, for example, simulations are used in credentialing assessments, despite the lack of procedures for equating precisely across assessments that use simulations. It would be worthwhile to explore the decisions that the medical profession made and their outcomes. It may be, for example, that the technical standards for modes of assessment could vary somewhat, according to the intended purpose to which the results will be put.
A final thought offered on measurement was that the measurement community should be conducting basic research that addresses not only immediate problems, but also the challenges and technological changes that are likely to emerge a decade from now. Some participants responded that the capacity of the testing profession is already being stretched and that there is little leisure for this kind of thinking—while others stressed the importance of looking ahead.
Content

The measurement community may need to catch up with advances in cognitive research, but the overall picture of what students should learn is perhaps even less complete, Mark Wilson and others noted. Deeper cognitive analysis of the content to be taught and assessed is still needed. Detailed learning trajectories have been put forward in a few areas of science and mathematics, but they are only a beginning. Understanding of the barriers to advancing along a trajectory, and of the efficacy of different approaches to teaching students to overcome those barriers, is still in its beginning stages. Outside of science and mathematics, even less progress has been made in tracing learning trajectories.
Without a much broader base of research on these questions, the progress in developing innovative assessments will be hampered. Policy makers are currently working from hypothesized trajectories of how learning in reading, English/language arts, and mathematics progresses from kindergarten through grade 12. These need to be elaborated, and the field needs a plan for gathering data about the validity of the common core standards that are based on them and for improving the descriptions of the trajectories.
Teaching and Learning

An important theme of the workshop was the intimate relationship between models of measurement and models of teaching and learning. If assessments are to play the valuable role in education that many envision, they must not only align with what is known about how students think and learn, but also provide meaningful information that educators can use. As many speakers emphasized, if educators are to play their part, their preparation and professional development must encompass this new thinking about assessment and the means to use it. Research is needed to support these changes. Teachers also have much to contribute to evolving thinking about teaching and assessment. Involving them in the research will be critical to ensuring that new kinds of assessment data can really improve instruction.
It is not only data that teachers need, though, some participants pointed out. They also need the capacity to reflect on and evaluate their own practice, their ability to adapt, and the value of the innovations they are asked to try. Working individually, in small groups, as whole departments, or even as whole schools, they can provide a check on such questions as the practical application of theoretical learning trajectories.
Experimentation

Rebecca Zwick and others noted that there is no one optimal assessment system waiting to be discovered. A range of international models offer promising possibilities and should be explored in greater detail. The development of state consortia offers the opportunity for the education community to explore a variety of different models and the theories that underlie them and to work out a variety of ways of addressing key system goals. The idea that educators and policy makers should experiment on students may have negative connotations, but many participants also spoke about the critical importance of taking innovation step by step and learning from each step. In no other field, one participant pointed out, would policy makers overlook the role of research and development in an undertaking as consequential as redesigning the assessment system. Ideally, the process would begin with a clear picture of the questions that need answers and the development of a strategy for researching those questions and testing hypotheses.
Several participants noted that state consortia, individual states, districts, schools, teachers, and students can all contribute to the design of new aspects of assessment systems and the important work of trying them out and collecting information about what works well and what does not. More typical, however, has been a model in which a whole new assessment system is created and presented to the public as ready to be implemented statewide. The big risk in such an approach is that implementation problems could doom an idea with valuable potential before it had a chance to prove itself, or that individual valuable features of the approach would be thrown out along with features that did not work.
Echoing the comments of Lorrie Shepard, several participants suggested that retaining some or all of the elements of existing assessment systems, while gradually incorporating new elements, would allow for both the development of political and public acceptance and the flexibility to benefit from experience. An incremental approach may also make it possible to address different aspects of a system in a way that would be too radical to attempt for the whole. Whether the innovations are new instructional units based on the common core standards with assessment embedded in them, revised curricula that better map the learning trajectories in new standards, new formats and designs for summative assessments, or some other change, it should be possible to gradually construct a coherent system that meets the needs for both accountability and instructional guidance.