A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas (National Research Council, 2012a, hereafter referred to as “the framework”) provided the foundation for new science education standards, which were published the following year (NGSS Lead States, 2013). The framework is grounded in a new vision for science education from kindergarten through high school (K-12): that all students—not just those who intend to pursue science beyond high school—will learn core scientific ideas in increasing depth over multiple years of schooling. It calls for an approach to education that closely mirrors the way that science is practiced and applied, and it focuses on the cumulative learning opportunities needed to ensure that (National Research Council, 2012a, p. 1):
[By] the end of 12th grade, all students have some appreciation of the beauty and wonder of science; possess sufficient knowledge of science and engineering to engage in public discussions on related issues; are careful consumers of scientific and technological information related to their everyday lives; are able to continue to learn about science outside school; and have the skills to enter careers of their choice, including (but not limited to) careers in science, engineering, and technology.
The framework cites well-known limitations in K-12 science education in the United States: that it “is not organized systematically across multiple years of school, emphasizes discrete facts with a focus on breadth over depth, and does not provide students with engaging opportunities to experience how science is actually done” (p. 1). To address these limitations, the framework details three dimensions for science education (the practices through which scientists and engineers do their work, the key crosscutting concepts for all disciplines, and the core ideas of the disciplines), and it argues that the dimensions need to be interwoven in every aspect of science education, including assessment.
Developing new assessments to measure the kinds of learning the framework describes presents a significant challenge and will require a major change to the status quo. The framework calls for assessments that capture students’ competencies in performing the practices of science and engineering by applying the knowledge and skills they have learned. The assessments that are now in wide use were not designed to meet this vision of science proficiency and cannot readily be retrofitted to do so. To address this disjuncture, the Committee on Developing Assessments of Science Proficiency in K-12 was asked to help guide the development of new science assessments.
The committee was charged to make recommendations to state and national policy makers, research organizations, assessment developers, and funders about ways to use best practices to develop effective, fair, reliable, and high-quality assessment systems that support valid conclusions about student learning. The committee was asked to review current assessment approaches and promising research and to develop both a conceptual framework for K-12 science assessment and an analysis of feasibility issues. The committee’s full charge is shown in Box 1-1.
Science education has been under a great deal of scrutiny for several decades. Policy makers have lamented that the United States is falling behind in science, technology, engineering, and mathematics (STEM) education, based on international comparisons and on complaints that U.S. students are not well prepared for the workforce of the 21st century (see, e.g., National Research Council, 2007). The significant underrepresentation of women and some demographic groups in postsecondary STEM education and in STEM careers has also captured attention (Bystydzienski and Bird, 2006; National Research Council, 2011a; Burke and Mattis, 2007). The framework discusses ways in which some student groups have been excluded from science and the need to better link science instruction to diverse students’ interests and experiences.1
1See Chapter 11 of the framework (National Research Council, 2012a) for discussion of these issues.
STATEMENT OF TASK
The committee will make recommendations for strategies for developing assessments that validly measure student proficiency in science as laid out in the new K-12 science education framework. The committee will review recent and current, ongoing work in science assessment to determine which aspects of the necessary assessment system for the framework’s vision can be assessed with available techniques and what additional research and development is required to create an overall assessment system for science education in K-12. The committee will prepare a report that includes a conceptual framework for science assessment in K-12 and will make recommendations to state and national policy makers, research organizations, assessment developers, and study sponsors about the steps needed to develop valid, reliable, and fair assessments for the framework’s vision of science education. The committee’s report will discuss the feasibility and cost of its recommendations.
Researchers, educators, and others have argued that a primary reason for the problems is the way science is taught in U.S. schools (see, e.g., National Research Council, 2006; National Task Force on Teacher Education in Physics, 2010; Davis et al., 2006; Association of Public and Land-grant Universities, 2011). They have pointed out specific challenges: for example, many teachers who are responsible for science have not been provided with the knowledge and skills required to teach in the discipline they are teaching or in science education,2 and many schools lack adequate instructional time and adequate space and equipment for investigation and experimentation (OECD, 2011; National Research Council, 2005). Another key issue has been the inequity in access to instructional time on science and associated resources and its influence on the performance of different demographic groups of students. Others have focused on a broader failing, arguing that K-12 science education is generally too disconnected from the way science and engineering are practiced and should be reformed. The framework reflects and incorporates these perspectives.
2This critique is generally targeted to both middle and secondary school teachers, who are usually science specialists, and elementary teachers who are responsible for teaching several subjects.
The framework’s approach is also grounded in a growing body of research on how young people learn science, which is relevant to both instruction and assessment. Researchers and practitioners have built an increasingly compelling picture of the cumulative development of conceptual understanding and the importance of instruction that guides students in a coherent way across the grades (National Research Council, 2001, 2006). A related line of research has focused on the importance of instruction that is accessible to students of different backgrounds and uses their varied experiences as a base on which to build. These newer models of how students learn science are increasingly dominant in the science education community, but feasible means of widely implementing changes in teacher practice that capitalize on these ideas have been emerging only gradually.
The new framework builds on influential documents about science education for K-12 students, including the National Science Education Standards (National Research Council, 1996), the Benchmarks for Science Literacy: A Tool for Curriculum Reform (American Association for the Advancement of Science, 1993, 2009), and the Science Framework for the 2009 National Assessment of Educational Progress (National Assessment Governing Board, 2009). At the same time, the landscape of academic standards has changed significantly in the last few years, as the majority of states have agreed to adopt common standards in language arts and mathematics.3
National and state assessment programs, as well as international ones, have been exploring new directions in assessment and will be useful examples for the developers of new science assessments. Two multistate consortia received grants under the federal Race to the Top Assessment Program to develop innovative assessment in language arts and mathematics that will align with the new Common Core State Standards. The Partnership for Assessment of Readiness for College and Careers (PARCC) and the Smarter Balanced Assessment Consortium (SBAC) are working to develop assessments that can be implemented during the 2014-2015 school year.4 We have followed their progress closely, but our recommendations for science assessment are completely separate from their work. Examples from international science assessments and the approach to developing assessments for the revised Advanced Placement Program in high schools in biology are other valuable models.
3The Common Core State Standards have been adopted by 45 states, the District of Columbia, four territories, and the U.S. Department of Defense Education Activity. For more information, see http://www.corestandards.org/ [August 2013].
4For details, see http://www2.ed.gov/programs/racetothetop/index.html [June 2013]. Information about PARCC, SBAC, and the Common Core State Standards can be found, respectively, at http://www.parcconline.org/about-parcc, http://www.smarterbalanced.org/, and http://www.corestandards.org/ [June 2013].
New standards, called the Next Generation Science Standards (NGSS), have been developed specifically in response to the approach laid out in the framework by a team of 26 states that are working with Achieve, Inc. The developers included representatives from science, engineering, science education, higher education, and business and industry (NGSS Lead States, 2013). Draft versions of the document were revised based on extensive feedback from stakeholders and two rounds of public comment. The NGSS team also worked to coordinate the new science standards with new Common Core State Standards in English language arts and mathematics so that intellectual links among the disciplines can be emphasized in instruction. Preliminary drafts were available in May 2012 and January 2013, and the final version of the NGSS was released in April 2013.
The new K-12 science framework provides an opportunity to rethink the role that assessment plays in science education. The most fundamental change the framework advocates—that understanding of core ideas and crosscutting concepts be completely integrated with the practices of science—requires changes in the expectations for science assessment and in the nature of the assessments used.
At present, the primary purpose of state-level assessment in the United States is to provide information that can be used for accountability purposes. Most states have responded to the requirements of the No Child Left Behind Act of 2001 (NCLB) by focusing their assessment resources on a narrow range of assessment goals. In science, NCLB requires formal, statewide assessment once in three clusters of grades (3-5, 6-9, and high school).5 That is, unless a state does more than NCLB requires, students’ understanding of science is formally evaluated only three times from kindergarten through grade 12, usually with state assessments that are centrally designed and administered. This approach to assessment does not align with the goals of the new framework: it does not reflect the importance of students’ gradual progress toward learning goals. Monitoring of student learning is important, but most current tests do not require students to demonstrate knowledge of the integration between scientific practices and conceptual understanding. The NGSS, for example, include an expectation that students understand how the way in which scientific phenomena are modeled may influence conceptual understanding, but few current science assessments evaluate this aspect of science. Thus, aligning new tests with the framework’s structure and goals will require the use of a range of assessment tools designed to meet a variety of needs for information about how well students are learning complex concepts and practices.
5NCLB requires testing in mathematics and language arts every year.
Among the states, the time, resources, and requirements for testing students in science vary widely: states each have devised their own combination of grades tested, subject areas covered, testing formats, and reporting strategies. Most states rely heavily on assessments that are affordable, efficient, and easily standardized: these are generally easy-to-score multiple-choice and short open-ended questions that assess recall of facts. Assessments used as benchmarks of progress, and even those embedded in curriculum, often use basic and efficient paper-and-pencil formats.
Although the various state science assessments often provide technically valid and reliable information for specific purposes, they cannot systematically assess the learning described in the framework and the three-dimensional performance standards described in the NGSS. New kinds of science assessments are needed to support the new vision and understanding of students’ science learning. Developing an assessment program that meets these new goals presents complex conceptual, technical, and practical challenges, including cost and efficiency, obtaining reliable results from new assessment types, and developing complex tasks that are equitable for students across a wide range of demographic characteristics.
The committee’s charge led us first to a detailed review of what is called for by the framework and the NGSS. We were not asked to take a position on these documents. The framework sets forth goals for science learning for all students that will require significant shifts in curriculum, instruction, and assessment. The NGSS represent a substantial and credible effort to map the complex, three-dimensional structure of the framework into a coherent set of performance expectations to guide the development of assessments (as well as curriculum and instruction). The committee recognizes that some mapping of this kind is an essential step in the alignment of assessments to the framework, and the NGSS are an excellent beginning. We frequently consulted both documents: the framework for the vision of student learning and the NGSS for specific characterization of the types of outcomes that will be expected of students.
We also examined prior National Research Council reports, such as Knowing What Students Know (National Research Council, 2001) and Systems for State Science Assessment (National Research Council, 2006), and other materials that are relevant to the systems approach to assessment called for in the new framework. And we explored research and practice in educational measurement that are relevant to our charge: the kinds of information that can be obtained using large-scale assessments; the potential benefits made possible by technological and other innovations; what can be learned from recent examples of new approaches, including those used outside the United States; and the results of attempts to implement performance assessments as part of education reform measures in the 1980s and 1990s. Last, we examined research and practice related to classroom-based assessments in science and the role of learning progressions in guiding approaches to science curricula, instruction, and assessment.
As noted above, this project was carried out in the context of developments that in many cases are rapidly altering the education landscape. The committee devoted attention to tracking the development of the NGSS and the implementation of the new Common Core State Standards.6 As this report went to press, 11 states and the District of Columbia had adopted the NGSS.7 The work of PARCC and SBAC, which are developing assessments to align with the Common Core State Standards and have explored some current technological possibilities, has also been important for the committee to track. However, we note that both consortia were constrained in their decisions about technology and task design, both by the challenge of assessing every student every year, as mandated by NCLB for mathematics and language arts, and by a timeline for full implementation that left little space for exploration of some of the more innovative options that we explored for science.
7As of April 2014, the states were California, Delaware, Illinois, Kansas, Kentucky, Maryland, Nevada, Oregon, Rhode Island, Vermont, Washington, and the District of Columbia.
This committee’s charge required a somewhat unusual approach. Most National Research Council committees rely primarily on syntheses of the research literature in areas related to their charge as the basis for their conclusions and recommendations. However, the approach to instruction and assessment envisioned in the framework and the NGSS is new: thus, there is little research on which to base our recommendations for best strategies for assessment. Furthermore, the development of the NGSS occurred while our work was underway, and so we did not have the benefit of the final version until our work was nearly finished.
In carrying out our charge, we did review the available research in relevant fields, including educational measurement, cognitive science, learning sciences, and science education, and our recommendations are grounded in that research. They are also the product of our collective judgment about the most promising ways to make use of tools and ideas that are already familiar, as well as our collective judgment about some tools and ideas that are new, at least for large-scale applications in the United States. Our charge required that we consider very recent research and practice, alongside more established bodies of work, and develop actionable recommendations on the basis of our findings and judgments. We believe our recommendations for science assessment can be implemented to support the changes called for in the framework.
Much of our research focused on gathering information on the types of science assessments that states currently use and the types of innovations that might be feasible in the near future. We considered this information in light of new assessment strategies that states will be using as part of their efforts to develop language arts and mathematics assessments for the Common Core through the Race to the Top consortia, particularly assessments that make use of constructed-response and performance-based tasks and technology-enhanced questions. To help us learn more about these efforts, representatives from the two consortia (PARCC and SBAC) made presentations at our first meeting, and several committee members participated in the June 2012 Invitational Research Symposium on Technology Enhanced Assessments, sponsored by the K-12 Center at the Educational Testing Service. That symposium focused on the types of innovations under consideration for use with the consortia-developed assessments, including the use of technology to assess hard-to-measure constructs and expand accessibility, the use of such innovative formats as simulations and games, and the development of embedded assessments.
We took a number of other steps to learn more about states’ science assessments. We reviewed data from a survey of states conducted by the Council of State Science Supervisors on the science assessments they used in 2012, the grades they tested, and the types of questions they used. Based on these survey data, we identified states that made use of any types of open-ended questions, performance tasks, or technology enhancements and followed up with the science specialists in those states: Massachusetts, Minnesota, New Hampshire, New York, Ohio, Oregon, Rhode Island, Vermont, and Utah.
During the course of our data gathering, a science assessment specialist in Massachusetts organized a webinar on states’ efforts to develop performance-based tasks. Through this webinar we learned of work under way in Connecticut, Ohio, and Vermont. Members of the committee also attended meetings of the State Collaborative on Assessment and Student Standards (SCASS) in science of the Council of Chief State School Officers (CCSSO) and a conference on building capacity for state science education sponsored by the CCSSO and the Council of State Science Supervisors.
We also held a public workshop, which we organized in conjunction with the SCASS. The workshop included presentations on a range of innovative assessments, including the College Board’s redesigned Advanced Placement Biology Program, the 2009 science assessment by the National Assessment of Educational Progress that made use of computer-interactive and hands-on tasks, WestEd’s SimScientist Program, and curriculum-embedded assessments from the middle school curriculum materials of IQWST (Investigating and Questioning our World through Science and Technology, Krajcik et al., 2013). SCASS members served as discussants at the workshop. The workshop was an opportunity to hear from researchers and practitioners about their perspectives on the challenges and possibilities for assessing science learning, as well as to hear about various state assessment programs. The workshop agenda appears in Appendix A.
Throughout the report the committee offers examples of assessment tasks that embody our approach and demonstrate what we think will be needed to measure science learning as described in the framework and the NGSS. Because the final version of the NGSS was not available until we had nearly completed work on this report, none of the examples was specifically aligned with the NGSS performance expectations. However, the examples reflect the ideas about teaching, learning, and assessment that influenced the framework and the NGSS, and they can serve as models of assessment tasks that measure both science content and practice.8 The examples have all been used in practice and appear in Chapters 2, 3, 4, and 5: see Table 1-1 for a summary of the example tasks included in the report and the disciplinary core ideas, practices, and crosscutting concepts that they are intended to measure.
8These examples were developed by committee members and other researchers prior to this study.
TABLE 1-1 Guide to Examples of Assessment Tasks in the Report
| Chapter and Example | Disciplinary Core Idea (a) | Practices | Crosscutting Concepts | Grade Level |
| 1 What Is Going on Inside Me? (Chapter 2) | PS1: Matter and its interactions; LS1: From molecules to organisms: Structures and processes | Constructing explanations; Engaging in argument from evidence | Energy and matter: flows, cycles, and conservation | Middle school |
| 2 Pinball Car (Chapter 3) | PS3: Energy | Planning and carrying out investigations | Energy and matter: flows, cycles, and conservation | Middle school |
|  | LS1.A: Structure and function: Organisms have macroscopic structures that allow for growth; LS1.B: Growth and development of organisms: Organisms have unique and diverse life cycles | Asking questions; Planning and carrying out investigations; Analyzing and interpreting data; Engaging in argument from evidence | Patterns | Grade 3 |
| 4 Behavior of Air (Chapter 4) | PS1: Matter and its interactions | Developing and using models; Engaging in argument from evidence | Energy and matter: flows, cycles, and conservation; Systems and system models | Middle school |
| 5 Movement of Water (Chapter 4) | ESS2: Earth’s systems | Developing and using models | Systems and system models | Middle school |
| 6 Biodiversity in the Schoolyard (Chapter 4) | LS4: Biological evolution: Unity and diversity | Planning and carrying out investigations (b); Analyzing and interpreting data | Patterns | Grade 5 |
| 7 Climate Change (Chapter 4) | LS2: Ecosystems: Interactions, energy, and dynamics; ESS3-5: Earth and human activity | Analyzing and interpreting data; Using a model to predict phenomena | Systems and system models | High school |
| 8 Ecosystems (Chapter 4) | LS2: Ecosystems: Interactions, energy, and dynamics | Planning and carrying out investigations and interpreting patterns | Systems and system models |  |
| 9 Photosynthesis and Plant Evolution (Chapter 5) | LS4: Biological evolution: Unity and diversity | Developing and using models; Analyzing and interpreting data; Using mathematics and computational thinking | Systems and system models | High school |
| 10 Sinking and Floating (Chapter 5) | PS2: Motion and stability | Obtaining, evaluating, and communicating information; Planning and carrying out investigations; Analyzing and interpreting data; Engaging in argument from evidence | Cause and effect; Stability and change | Grade 2 |
| 11 Plate Tectonics (Chapter 5) | ESS2: Earth’s systems | Developing and using models; Constructing explanations | Scale, proportion, and quantity |  |
(b) This example focuses on carrying out an investigation.
The report is structured around the steps that will be required to develop assessments to evaluate students’ proficiency with the NGSS performance expectations, and we use the examples to illustrate those steps. The report begins, in Chapter 2, with an examination of what the new science framework and the NGSS require of assessments. The NGSS and framework emphasize that science learning involves the active engagement of scientific and engineering practices in the context of disciplinary core ideas and crosscutting concepts—a type of learning that we refer to as “three-dimensional learning.” The first of our example assessment tasks appears in this chapter to demonstrate what three-dimensional learning involves and how it might be assessed.
Chapter 3 provides an overview of the fundamentals of assessment design. In the chapter, we discuss “principled” approaches to assessment design: they are principled in that they provide a methodical and systematic approach to designing assessment tasks that elicit performances that accurately reflect students’ proficiency. We use the example assessment task in the chapter to illustrate this type of approach to developing assessments.
Chapter 4 focuses on the design of classroom assessment tasks that can measure the performance expectations in the NGSS. The chapter addresses assessment tasks that are administered in the classroom for both formative and summative purposes. We elaborate on strategies for designing assessment tasks that can be used for either of these assessment purposes, and we include examples to illustrate the strategies.
Chapter 5 moves beyond the classroom setting and focuses on assessments designed to monitor science learning across the country, such as to document students’ science achievement across time; to compare student performance across schools, districts, or states; or to evaluate the effectiveness of certain curricula or instructional practices. The chapter addresses strategies for designing assessment tasks that can be administered on a large scale, such as to all students in a school, district, or state. It also addresses the technical measurement issues associated with designing assessments (i.e., assembling groups of tasks into tests, administering them, and scoring the responses) so that the resulting performance data provide reliable, valid, and fair information that can be used for a specific monitoring purpose.
Chapter 6 discusses approaches to developing a coherent system of curricula, instruction, and assessments that together support and evaluate students’ science learning.
Finally, in Chapter 7 we address feasibility issues and explore the challenges associated with implementing the assessment strategies that we recommend. Those challenges include the central one of accurately assessing the science learning of all students, particularly while substantial change is under way. The equity issues that are part of this challenge are addressed in Chapter 7 and elsewhere in the report.