RECOMMENDATION 13: Create a new system of science assessment and monitoring.
State science education leaders should create a long-term plan to develop and implement a new system of state science assessments designed to measure the performance expectations in the Next Generation Science Standards. The system should incorporate multiple elements, including on-demand tests, classroom-embedded assessments, and measures of opportunity to learn at the state or district level. When possible, state science education leaders and those responsible for state assessment should consider developing partnerships, perhaps with other states, to facilitate the work of developing new science assessments.
RECOMMENDATION 14: Help teachers develop appropriate formative assessment strategies.
School leaders need to ensure that professional development for science teachers covers issues of assessment and supports teachers in using formative assessment of student thinking to inform ongoing instruction.
The Next Generation Science Standards (NGSS) describe specific goals for science learning in the form of performance expectations: statements about what students should know and be able to do at each grade level. Each performance expectation incorporates a practice in the context of a core idea and may also require students to call on a particular crosscutting concept.[1]
______________
[1] For a detailed discussion and analysis of assessment, see National Research Council (2014a), on which this chapter is based.
The performance expectations place significant demands on science learning at every grade level. It will not be feasible to assess every performance expectation for a given grade level on a single assessment occasion. Students will need multiple assessment opportunities, using a variety of formats, to demonstrate that their competence meets the expectations for a given grade level.
Measuring the performance expectations in the NGSS and providing all stakeholders (students, teachers, administrators, policy makers, and the public) with the information each needs about student learning will require assessments that differ in key ways from current science assessments. Specifically, the tasks designed to assess the performance expectations in the NGSS will need to:
- include multiple components that reflect the connected use of different scientific practices in the context of interconnected disciplinary ideas and crosscutting concepts;
- address the progressive nature of learning by providing information about where students fall on a continuum between expected beginning and ending points in a given unit or grade; and
- include an interpretive system for evaluating a range of student products that is specific enough to help teachers understand the range of student responses and that, for formative assessment, provides tools to help teachers decide on next steps in instruction.
The National Research Council (2014a) report on assessment provides numerous examples of the kinds of assessment tasks that have these characteristics.
Building on the advice in previous National Research Council reports (2006a, 2012, 2014a), the committee recommends a systems approach to science assessment. The system should include a range of assessment strategies designed to answer different kinds of questions for different stakeholders (students, teachers, administrators, policy makers, and the public). Such a system needs to include three components:
- assessments designed to support classroom instruction,
- assessments designed to monitor science learning on a broader scale, and
- a series of indicators to track whether students are provided with adequate opportunity to learn science in the ways laid out in A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas (hereafter referred to as “the Framework”) and the NGSS.
The rest of this section discusses each of these components.
Assessment to Support Classroom Instruction
Classroom assessments are an integral part of instruction and learning and should include both formative and summative tasks. Formative tasks are specifically designed to guide instructional decision making and lesson planning; summative tasks are specifically designed to assign student grades.
Assessments Designed to Monitor Science Learning on a Broader Scale
Student learning needs to be monitored over time in order to evaluate the effectiveness of the science education system. Given the breadth and depth of material covered in the NGSS, new approaches will be needed to monitor students’ learning.
Assessments will need to include a variety of response formats, including performance-based questions that require students to construct or supply an answer, produce a product, or perform an activity. Although performance-based questions are especially suitable for assessing some aspects of student proficiency on the NGSS performance expectations, it will not be feasible to address the full breadth and depth of the NGSS performance expectations for a given grade level with a single external assessment composed solely or mostly of performance-based questions. Performance-based questions take a significant amount of time to complete, and many of them would be needed to cover the full set of performance expectations for a grade level. It will therefore be impossible to assess every student, on every standard, every year, using a one-time, on-demand test.
Thus, information from external on-demand assessments (those administered at a time mandated by the state) should be supplemented with information gathered from classroom-embedded assessments, which are administered at a time determined by the district, the school, or the teacher to fit the instructional sequence in the classroom. Classroom-embedded assessments may take various forms. They could be self-contained curricular units, provided by the state or district, that include instructional materials and assessments to be administered in classrooms. Alternatively, a state or district might develop item banks of tasks that could be used at the appropriate time in classrooms. Another approach would be for states or districts to require that students in certain grade levels assemble portfolios of work products that demonstrate their levels of proficiency.
Indicators to Track Students’ Opportunity to Learn
It is important to ensure that the dramatic changes in curriculum, instruction, and assessment prompted by the Framework and the NGSS do not exacerbate current inequities in science education. Instead, the changes are expected to begin reducing inequities while raising the level of science education for all students. Information should be collected routinely to monitor the quality of the classroom instruction that students receive, to determine whether all students have the opportunity to learn science in the way the Framework calls for, and to see whether schools have the resources they need to support science learning. This information might come from onsite program inspections, student and teacher surveys, monitoring of teachers’ professional development, and a system for periodically documenting samples of teachers’ lesson plans and associated student work (National Research Council, 2014a). Some observation of classroom instruction is also important to ensure that the science and engineering practices are being implemented.
IMPLEMENTING A NEW ASSESSMENT SYSTEM
The systems approach to science assessment that we recommend cannot be achieved by simply tinkering with an old system. A systematic but gradual process that reflects carefully considered priorities and timelines will be needed to make the transition to an assessment system that supports the vision of the Framework. Those priorities should begin with what is both necessary and possible in the short term while also establishing long-term goals for implementing a fully integrated and coherent system of curriculum, instruction, and assessment. State leaders and educators should expect the new system to be developed and implemented in stages over a number of years. Teachers will want to know the plans and timelines for changes in assessment at the state level; at the same time, they will need professional development that supports them in using more open-ended assessment tasks in the classroom.
Development of the new system should begin with designing assessments for the classroom, perhaps integrated into instructional units or curriculum materials, and then move to designing large-scale assessments. Focusing first on assessments that are close to the point of instruction will be the best way to identify successful ways to teach and assess knowledge of science practices and crosscutting concepts in specific disciplinary contexts. Effective strategies can then serve as the basis for developing assessments at other levels, including those used for accountability.
In designing and implementing assessment systems, states will need to focus on professional development. They will need to allocate adequate time and resources for professional development related to assessment strategies so that teachers can be properly prepared and guided and so that curriculum and assessment developers can adapt their work to the vision of the Framework.
State leaders who commission assessment development should ensure that the contracts address the changes called for by the Framework and the NGSS. The contracts should therefore allow substantial time for the initial work and revision needed to develop and implement such assessments, because multiple cycles of design-based research will be necessary. Existing item banks are likely to be inadequate for gauging students’ learning in alignment with the NGSS.
Existing and emerging technologies will be critical tools for creating a science assessment system that meets the goals of the NGSS, particularly those that permit the assessment of performance expectations that combine practices, core ideas, and crosscutting concepts. Technology will also be important for streamlining assessment administration and scoring.
States are likely to be able to capitalize on efforts already under way to implement the new Common Core State Standards in English language arts and mathematics, which have required educators to integrate technology for assessment along with new learning expectations and instruction. Nevertheless, the approach to science assessment recommended in the NRC report Developing Assessments for the Next Generation Science Standards (National Research Council, 2014a), which we endorse here, may require additional modifications to current systems. States will need to set their priorities carefully and adopt a thoughtful, reflective, and gradual process for making the transition to an assessment system and technology platform that can support the vision of the Framework. Effective use of technology for both instructional and assessment purposes will be critical. For example, if the system includes classroom-based performance tasks, technology will be needed to allow teachers to submit student work products, share assessment rubrics, and grade the work of other teachers’ students.
BOX 6-1
NEW ENGLAND COMMON ASSESSMENT PROGRAM
The New England Common Assessment Program (NECAP) is a series of reading, writing, mathematics, and science achievement tests, administered annually, that were developed in response to the federal No Child Left Behind Act. Students in New Hampshire, Rhode Island, and Vermont have been participating in NECAP since 2005, and Maine joined the program in 2009.
The state departments of education in New Hampshire, Rhode Island, and Vermont developed a common set of grade-level expectations and test specifications in mathematics, reading, and writing. The success of that effort led to the development of common assessment targets and test specifications for science. Student scores are reported at four levels of academic achievement: Proficient with Distinction, Proficient, Partially Proficient, and Substantially Below Proficient. Reading and mathematics are assessed in grades 3-8 and 11, writing in grades 5, 8, and 11, and science in grades 4, 8, and 11. The reading, mathematics, and writing tests are administered each year in October; the science tests are administered in May.
Given the complexity of the science assessment system envisioned, state science education leaders and those responsible for state assessment should consider partnering with other states on this work. Multiple small coalitions may develop a richer set of possibilities than one or two large coalitions would. An example of a relatively small coalition is the New England Common Assessment Program (see Box 6-1). The different coalitions could then share information with one another and with state leaders about their systems, blueprints for assessments, assessment tasks, and resources for assessment development. The goal should be to provide teachers and students with the best possible tools for assessing student learning in the classroom, as well as to provide required accountability information through externally mandated assessments.
Failing to Differentiate the Purposes of Assessment
Teachers collect assessment data for a variety of purposes. Classroom assessments are used to diagnose student needs near the beginning of a unit, to monitor progress along the way, and to find out how students are thinking about a topic so that teachers can determine how best to support students’ learning or so that students can evaluate their own progress. Assessments are also used to assign grades or to determine the effectiveness of a given unit for the class as a whole. A teacher needs to be clear from the start about how the data will be used so that the right data can be collected and analyzed in a timely fashion. For example, formative classroom assessments that simply mimic external assessment tasks are unlikely to be useful (Penuel and Shepard, in press).
Failing to Respond to Assessment Results
There is no point in collecting assessment data if they are not used. In fact, collecting unnecessary data can be detrimental because assessment takes time away from learning experiences. Assessment data that reach teachers or students too long after the assessment lose their effectiveness for supporting further learning. It is also important for schools and districts to address any inequities revealed through the measures of opportunity to learn.
Using Old Assessments While Mandating New Instructional Methods
It is unrealistic to expect teachers to immediately incorporate all the changes in instruction that are needed to support the NGSS. Instead, a staged approach, with ongoing professional development support, will be needed. At the same time, it will be ineffective to continue using old-style assessments to measure students and teachers while asking teachers to shift their teaching practice. Wherever possible, the transition needs to be supported by temporary relief from high-stakes accountability targets so that both teachers and students have time to “hit their stride” with the new demands of the NGSS (National Research Council, 2014a). The challenge for leaders is to find effective ways to monitor and support this progress while alleviating anxiety about penalties for poor performance on tests that are not aligned to the new standards, anxiety that can stifle attempts to make changes.