Science education is facing dramatic change. The new A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas (hereafter referred to as “the framework”) and the Next Generation Science Standards: For States, By States are designed to guide educators in significantly altering the way science is taught, from kindergarten through high school (K-12). The framework is aimed at making science education more closely resemble the way scientists actually work and think. It is also aimed at making instruction reflect research on learning that demonstrates the importance of building coherent understandings over time.
The framework structures science learning around three dimensions: the practices through which scientists and engineers do their work; the key crosscutting concepts that link the science disciplines; and the core ideas of the disciplines of life sciences, physical sciences, earth and space sciences, and engineering and technology. It argues that they should be interwoven in every aspect of science education, most critically, curriculum, instruction, and assessment. The framework emphasizes the importance of the connections among the disciplinary core ideas, such as using understandings about chemical interactions from physical science to explain biological phenomena.
We use the term “three-dimensional science learning” to refer to the integration of these dimensions. It describes not the process of learning, but the kind of thinking and understanding that science education should foster. The framework and the Next Generation Science Standards (NGSS) are also grounded in the ideas that science learning develops over time and assessments will need to mark students’ progress toward specific learning objectives.
This new vision of science learning presents considerable challenges—but also a unique and valuable opportunity for assessment. Existing science assessments have not been designed to capture three-dimensional science learning, and developing assessments that can do so requires new approaches. Rethinking science assessment in this way also offers an opportunity to address long-standing problems with current approaches. In this context, the following charge was given to the Committee on Developing Assessments of Science Proficiency in K-12:
The committee will make recommendations for strategies for developing assessments that validly measure student proficiency in science as laid out in the new K-12 science education framework. The committee will review recent and current, ongoing work in science assessment to determine which aspects of the necessary assessment system for the framework’s vision can be assessed with available techniques and what additional research and development is required to create an overall assessment system for science education in K-12. The committee will prepare a report that includes a conceptual framework for science assessment in K-12 and will make recommendations to state and national policy makers, research organizations, assessment developers, and study sponsors about the steps needed to develop valid, reliable, and fair assessments for the framework’s vision of science education. The committee’s report will discuss the feasibility and cost of its recommendations.
AN ASSESSMENT SYSTEM
The NGSS describe specific goals for science learning in the form of performance expectations (statements about what students should know and be able to do at each grade level) and thus what should be tested at each grade level. Each performance expectation incorporates all three dimensions, and the NGSS emphasize the importance of the connections among scientific concepts. The NGSS’s performance expectations place significant demands on science learning at every grade level. It will not be feasible to assess all of the performance expectations for a given grade level during a single assessment occasion. Students will need multiple and varied assessment opportunities to demonstrate their competence on the performance expectations for a given grade level (Conclusion 2-3).¹
¹ The conclusion and recommendation numbers refer to the report’s chapters and the order in which they appear.
In addition, the effective evaluation of three-dimensional science learning will require more than a one-to-one mapping between the performance expectations and assessment tasks. More than one assessment task may be needed to adequately assess students’ mastery of some performance expectations, and any given assessment task may assess aspects of more than one performance expectation. Moreover, to assess both understanding of core knowledge and facility with a practice, assessments may need to probe students’ use of a given practice in more than one disciplinary context. Assessment tasks that attempt to test practices in isolation from one another may not be meaningful as assessments of the three-dimensional science learning called for by the NGSS (Conclusion 2-4).
To adequately cover the three dimensions, assessment tasks will need to contain multiple components, such as a set of interrelated questions. It may be useful to focus on individual practices, core ideas, or crosscutting concepts in a specific component of an assessment task, but, together, the components need to support inferences about students’ three-dimensional science learning as described in a given performance expectation (Conclusion 2-1).
Measuring the learning described in the NGSS will require assessments that are significantly different from those in current use. Specifically, the tasks designed to assess the performance expectations in the NGSS will need to have the following characteristics (Conclusion 4-1):
- include multiple components that reflect the connected use of different scientific practices in the context of interconnected disciplinary ideas and crosscutting concepts;
- address the progressive nature of learning by providing information about where students fall on a continuum between expected beginning and ending points in a given unit or grade; and
- include an interpretive system for evaluating a range of student products that is specific enough to help teachers understand the range of student responses and that provides tools for helping teachers decide on next steps in instruction.
Designing specific assessment tasks and assembling them into tests will require a careful approach to assessment design. Some currently used approaches, such as evidence-centered design and construct modeling, reflect such care by drawing on the fundamentals of cognitive research and theory. With these approaches, the selection and development of assessment tasks, as well as the scoring rubrics and criteria for scoring, are guided by the construct to be assessed and the best ways of eliciting evidence about students’ proficiency with that construct. In designing assessments to measure proficiency on the NGSS performance expectations, the committee recommends the use of one of these approaches (Recommendation 3-1).
More broadly, a system of assessments will be needed to measure the NGSS performance expectations and provide students, teachers, administrators, policy makers, and the public with the information each needs about student learning (Conclusion 6-1). This conclusion builds on the advice in prior reports of the National Research Council. We envision a range of assessment strategies that are designed to answer different kinds of questions with appropriate degrees of specificity and provide results that complement one another. Such a system needs to include three components:
- assessments designed to support classroom instruction,
- assessments designed to monitor science learning on a broader scale, and
- a series of indicators to monitor whether students are provided with adequate opportunity to learn science in the ways laid out in the framework and the NGSS.
Classroom assessments are an integral part of instruction and learning and should include both formative and summative tasks: formative tasks are those that are specifically designed to be used to guide instructional decision making and lesson planning; summative tasks are those that are specifically designed to assign student grades.
The kind of instruction that will be effective in teaching science in the way the framework and the NGSS envision will require students to engage in scientific and engineering practices in the context of disciplinary core ideas—and to make connections across topics through the crosscutting ideas. To develop the skills and dispositions to use scientific and engineering practices needed to further their learning and to solve problems, students need to experience instruction in which they (1) use multiple practices in developing a particular core idea and (2) apply each practice in the context of multiple core ideas. Effective use of the practices often requires that they be used in concert with one another, such as in supporting explanation with an argument or using mathematics to analyze data (Conclusion 4-2).
Assessment activities will be critical supports for this instruction. Students will need guidance about what is expected of them and opportunities to reflect on their performance as they develop proficiencies. Teachers will need information about what students understand and can do so they can adapt their instruction. Instruction that is aligned with the framework and the NGSS will naturally provide many opportunities for teachers to observe and record evidence of students’ learning. The student activities that reflect such learning include developing and refining models; generating, discussing, and analyzing data; engaging in both spoken and written explanations and argumentation; and reflecting on their own understanding. Such opportunities are the basis for the development of assessments of three-dimensional science learning.
Assessment tasks that have been designed to be integral to classroom instruction—in which the kinds of activities that are part of high-quality instruction are deployed in particular ways to yield assessment information—are beginning to be developed. They demonstrate that it is possible to design tasks that elicit students’ thinking about disciplinary core ideas and crosscutting concepts by engaging them in scientific practices and that students can respond to them successfully (Conclusion 4-3). However, these types of assessments of three-dimensional science learning are challenging to design, implement, and properly interpret. Teachers will need extensive professional development to successfully incorporate this type of assessment into their practice (Conclusion 4-4).
State and district leaders who design professional development for teachers should ensure that it addresses the changes posed by the framework and the NGSS in both the design and use of assessment tasks as well as instructional strategies. Professional development has to support teachers in integrating practices, crosscutting concepts, and disciplinary core ideas in inclusive and engaging instruction and in using new modes of assessment that support such instructional activities (Recommendation 4-1).
Curriculum developers, assessment developers, and others who create instructional units and resource materials aligned to the new science framework and the NGSS will need to ensure that assessment activities included in such materials (such as mid- and end-of-chapter activities, suggested tasks for unit assessment, and online activities) require students to engage in practices that demonstrate their understanding of core ideas and crosscutting concepts. These materials should also attend to multiple dimensions of diversity (e.g., by connecting with students’ cultural and linguistic resources). In designing these materials, development teams need to include experts in science, science learning, assessment design, equity and diversity, and science teaching (Recommendation 4-2).
Assessments designed for monitoring purposes, also referred to as external assessments, are used to audit student learning over time. They are used to answer important questions about student learning such as: How much have the students in a certain school system learned over the course of a year? How does achievement in one school system compare with achievement in another? Is one instructional technique or curricular program more effective than another? What are the effects of a particular policy measure such as reduction in class size?
To measure the NGSS performance expectations, the tasks used in assessments designed for monitoring purposes need to have the same characteristics as those used for classroom assessments. But assessments used for monitoring pose additional challenges: they need to be designed so that they can be given to large numbers of students, to be sufficiently standardized to support the intended monitoring purpose, to cover an appropriate breadth of the NGSS, and to be feasible and cost-effective for states.
The multicomponent tasks needed to effectively evaluate the NGSS performance expectations will include a variety of response formats, including performance-based questions, that is, questions that require students to construct or supply an answer, produce a product, or perform an activity. Although performance-based questions are especially suitable for assessing some aspects of student proficiency on the NGSS performance expectations, it will not be feasible to cover the full breadth and depth of the NGSS performance expectations for a given grade level with a single external assessment composed solely or mostly of performance-based questions: performance-based questions take too much time to complete, and many of them would be needed in order to fully cover the set of performance expectations for a grade level. Consequently, the information from external on-demand assessments (i.e., assessments that are administered at a time mandated by the state) will need to be supplemented with information gathered from classroom-embedded assessments (i.e., assessments that are administered at a time determined by the district or school that fits the instructional sequence in the classroom) to fully cover the breadth and depth of the performance expectations. Both kinds of assessments will need to be designed so that they produce information that is appropriate and valid to support a specific monitoring purpose (Recommendation 6-1).
Classroom-embedded assessments may take various forms. They could be self-contained curricular units, which include instructional materials and assessments provided by the state or district to be administered in classrooms. Alternatively, a state or district might develop item banks of tasks that could be used at the appropriate time in classrooms. States or districts might require that students in certain grade levels assemble portfolios of work products that demonstrate their levels of proficiency. Using classroom-embedded assessments for monitoring purposes leaves a number of important decisions to the district or school; quality control procedures would need to be implemented so that these assessments meet appropriate technical standards (Conclusion 5-2).
External assessments would consist of sets of multicomponent tasks. To the extent possible, these tasks should include—as a significant and visible aspect of the assessment—multiple, performance-based questions. When appropriate, computer-based technology should be used to broaden and deepen the range of performances used on these assessments (Recommendation 6-2).
Assessments that include performance-based questions can pose technical and practical challenges for some monitoring purposes. For instance, it can be difficult both to attain appropriate levels of score reliability and to produce results that can be compared across groups or across time, comparisons that are important for monitoring. Developing, administering, and scoring the tasks can be time consuming and resource intensive. To help address these challenges, assessment developers should take advantage of emerging and validated innovations in assessment design, scoring, and reporting to create and implement assessments of three-dimensional science learning (Recommendation 5-2). In particular, state and local policy makers should design the external assessment component of their systems so that it incorporates matrix-sampling designs whenever appropriate (rather than requiring that every student take every item).
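As a rough illustration of what a matrix-sampling design involves, the sketch below splits a pool of tasks into blocks and gives each student only one block, so that the whole pool is covered across the group without any student taking every item. The function name, the task and student labels, the number of blocks, and the simple rotation used for assignment are all assumptions made for illustration; they are not drawn from the report, and operational designs would typically also balance blocks by content and difficulty.

```python
import random


def assign_blocks(task_ids, student_ids, n_blocks, seed=0):
    """Hypothetical helper: give each student one block of tasks."""
    if n_blocks < 1 or n_blocks > len(task_ids):
        raise ValueError("n_blocks must be between 1 and the number of tasks")
    rng = random.Random(seed)

    tasks = list(task_ids)
    rng.shuffle(tasks)
    # Deal tasks into blocks so that block sizes differ by at most one.
    blocks = [tasks[i::n_blocks] for i in range(n_blocks)]

    students = list(student_ids)
    rng.shuffle(students)
    # Rotate through the blocks so each block is taken by a similar
    # number of students.
    return {s: blocks[i % n_blocks] for i, s in enumerate(students)}


# Illustrative data only: 12 tasks, 30 students, 4 blocks.
tasks = [f"task_{i}" for i in range(1, 13)]
students = [f"student_{i}" for i in range(1, 31)]

assignment = assign_blocks(tasks, students, n_blocks=4)
covered = {t for block in assignment.values() for t in block}
print(len(covered), "tasks covered;", len(assignment["student_1"]), "tasks per student")
```

With these illustrative numbers, each student responds to 3 of the 12 tasks, yet all 12 tasks are administered somewhere in the group. Results from such a design support inferences about groups of students rather than comparable scores for individuals.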
INDICATORS OF OPPORTUNITY TO LEARN
Indicators of the opportunity to learn make it possible to evaluate the effectiveness of science instructional programs and the equity of students’ opportunity to learn science in the ways envisioned by the new framework. States should routinely collect information to monitor the quality of the classroom instruction in science, the extent to which students have the opportunity to learn science in the way called for in the framework, and the extent to which schools have the resources needed to support learning (such as teacher qualification and subject area pedagogical knowledge, and time, space, and materials devoted to science instruction) (Recommendation 6-6).
Measures of the quality and content of instruction should also cover inclusive instructional approaches that reach students of varying cultural and linguistic backgrounds. Because assessment results cannot be fully understood in the absence of information about the opportunities to learn what is tested, states should collect relevant indicators, including the material, human, and social resources available to support student learning, so they can contextualize and validate the inferences drawn from assessment results (Recommendation 7-6). This information should be collected through inspections of school science programs, surveys of students and teachers, monitoring of teacher professional development programs, and documentation of curriculum assignments and student work.
The assessment system that the committee recommends differs markedly from current practice and will thus take time to implement, just as it will take time to adopt the instructional programs needed for students to learn science in the way envisioned in the framework and the NGSS. States should develop and implement new assessment systems gradually and establish carefully considered priorities. Those priorities should begin with what is both necessary and possible in the short term while also establishing long-term goals leading to implementation of a fully integrated and coherent system of curriculum, instruction, and assessment (Recommendation 7-1).
The committee encourages a developmental path for assessment that is “bottom up” rather than “top down”: one that begins with the process of designing assessments for the classroom, perhaps integrated into instructional units, and moves toward assessments for monitoring. In designing and implementing their assessment systems, states will need to focus on professional development. States will need to include adequate time and resources for professional development so that teachers can be properly prepared and guided and so that curriculum and assessment developers can adapt their work to the vision of the framework and the NGSS (Recommendation 7-2).
State and district leaders who commission assessment development should ensure that the plans address the changes called for by the framework and the NGSS. They should build into their commissions adequate provision for the substantial amounts of time, effort, and refinement that are needed to develop and implement the use of such assessments: multiple cycles of design-based research will be necessary (Recommendation 7-3).
Existing and emerging technologies will be critical tools for creating a science assessment system that meets the goals of the framework and the NGSS, particularly those that permit the assessment of three-dimensional knowledge, as well as the streamlining of assessment administration and scoring (Recommendation 7-7). States will be able to capitalize on efforts already under way to implement the new Common Core State Standards in English language arts and mathematics, which have required educators to integrate learning expectations and instruction. Nevertheless, the approach to science assessment that the committee recommends will still require modifications to current systems. States will need to carefully lay out their priorities and adopt a thoughtful, reflective, and gradual process for making the transition to an assessment system that supports the vision of the framework and the NGSS.
A fundamental component of the framework’s vision for science education is that all students can attain its learning goals. The framework and the NGSS both stress that this can only happen if all students have the opportunity to learn in the new ways called for and if science educators are trained to work with multiple dimensions of diversity. A good assessment system can play a critical role in providing fair and accurate measures of the learning of all students and providing students with multiple ways of demonstrating their competency. Such an assessment system will include formats and presentation of tasks and scoring procedures that reflect multiple dimensions of diversity, including culture, language, ethnicity, gender, and disability. Individuals with expertise in diversity should be integral participants in developing state assessment systems (Recommendation 7-5).