4

CLASSROOM ASSESSMENT

Assessments can be classified in terms of the way they relate to instructional activities. The term classroom assessment (sometimes called internal assessment) is used to refer to assessments designed or selected by teachers and given as an integral part of classroom instruction. They are given during or closely following an instructional activity or unit. This category of assessments may include teacher-student interactions in the classroom, observations, student products that result directly from ongoing instructional activities (called “immediate assessments”), and quizzes closely tied to instructional activities (called “close assessments”). They may also include formal classroom exams that cover the material from one or more instructional units (called “proximal assessments”).1 This category may also include assessments created by curriculum developers and embedded in instructional materials for teacher use.

In contrast, external assessments are designed or selected by districts, states, countries, or international bodies and are typically used to audit or monitor learning. External assessments are usually more distant in time and context from instruction. They may be based on the content and skills defined in state or national standards, but they do not necessarily reflect the specific content that was covered in any particular classroom. They are typically given at a time that is determined by administrators, rather than by the classroom teacher. This category includes such assessments as the statewide science tests required by the No Child Left Behind Act or used for other accountability purposes (called “distal assessments”), as well as national and international assessments, such as the National Assessment of Educational Progress and the Programme for International Student Assessment (called “remote assessments”). Such external assessments and their monitoring function are the subject of the next chapter.


1This terminology is drawn from Ruiz-Primo et al. (2002) and Pellegrino (2013).




In this chapter, we illustrate the types of assessment tasks that can be used in the classroom to meet the goals of A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas (National Research Council, 2012a, hereafter referred to as “the framework”) and the Next Generation Science Standards: For States, By States (NGSS Lead States, 2013). We present example tasks that we judged to be rigorous, deep probes of student capabilities and consistent with the framework and the Next Generation Science Standards (NGSS). We discuss external assessments in Chapter 5 and the integration of classroom and external assessments into a coherent system in Chapter 6. The latter chapter argues that an effective assessment system should include a variety of types of internal and external assessments, with each designed to fulfill complementary functions in assessing achievement of the NGSS performance objectives.

Our starting point for looking in depth at classroom assessment is the analysis in Chapter 2 of what the new science framework and the NGSS imply for assessment. We combine these ideas with our analysis in Chapter 3 of current approaches to assessment design as we consider key aspects of classroom assessment that can be used as a component in assessment of the NGSS performance objectives.

ASSESSMENT PURPOSES: FORMATIVE OR SUMMATIVE

Classroom assessments can be designed primarily to guide instruction (formative purposes) or to support decisions made beyond the classroom (summative purposes). Assessments used for formative purposes occur during the course of a unit of instruction and may involve both formal tests and informal activities conducted as part of a lesson. They may be used to identify students’ strengths and weaknesses, assist educators in planning subsequent instruction, assist students in guiding their own learning by evaluating and revising their own work, and foster students’ sense of autonomy and responsibility for their own learning (Andrade and Cizek, 2010, p. 4).

Assessments used for summative purposes may be administered at the end of a unit of instruction. They are designed to provide evidence of achievement that can be used in decision making, such as assigning grades; making promotion or retention decisions; and classifying test takers according to defined performance categories, such as “basic,” “proficient,” and “advanced” (levels often used in score reporting) (Andrade and Cizek, 2010, p. 3).

The key difference between assessments used for formative purposes and those used for summative purposes is in how the information they provide is to be used: to guide and advance learning (usually while instruction is under way) or to obtain evidence of what students have learned for use beyond the classroom (usually at the conclusion of some defined period of instruction). Whether intended for formative or summative purposes, evidence gathered in the classroom should be closely linked to the curriculum being taught. This does not mean that the assessment must use the same formats or exactly the same material that was presented in instruction, but rather that the assessment task should directly address the concepts and practices to which the students have been exposed.

The results of classroom assessments are evaluated by the teacher or sometimes by groups of teachers in the school. Formative assessments may also be used for reflection among small groups of students or by the whole class together. Classroom assessments can play an integral role in students’ learning experiences while also providing evidence of progress in that learning. Classroom instruction is the focus of the framework and the NGSS, and it is classroom assessment—which by definition is integral to instruction—that will be the most straightforward to align with NGSS goals (once classroom instruction is itself aligned with the NGSS).

Currently, many schools and districts administer benchmark or interim assessments, which seem to straddle the line between formative and summative purposes (see Box 4-1). They are formative in the sense that they are used for a diagnostic function intended to guide instruction (i.e., to predict how well students are likely to do on the end-of-year tests). However, because of this purpose, the format they use resembles the end-of-year tests rather than other types of internal assessments commonly used to guide instruction (such as quizzes, classroom dialogues, observations, or other types of immediate assessment strategies that are closely connected to instruction). Although benchmark and interim assessments serve a purpose, we note that they are not the types of formative assessments that we discuss in relation to the examples presented in this chapter or that are advocated by others (see, e.g., Black and Wiliam, 2009; Heritage, 2010; Perie et al., 2007). Box 4-1 provides additional information about these types of assessments.

BOX 4-1
BENCHMARK AND INTERIM ASSESSMENTS

Currently, many schools and districts administer benchmark or interim assessments, which they treat as formative assessments. These assessments use tasks that are taken from large-scale tests given in a district or state or are very similar to tasks that have been used in those tests. They are designed to provide an estimate of students’ level of learning, and schools use them to serve a diagnostic function, such as to predict how well students will do on the end-of-year tests. Like the large-scale tests they closely resemble, benchmark tests rely heavily on multiple-choice items, each of which tests a single learning objective. The items are developed to provide only general information about whether students understand a particular idea, though sometimes the incorrect choices in a multiple-choice item are designed to probe for particular common misconceptions. Many such tasks would be needed to provide solid evidence that students have met the performance expectations for their grade level or grade band.

Teachers use these tests to assess student knowledge of a particular concept or a particular aspect of practice (e.g., control of variables), typically after teaching a unit that focuses on specific discrete learning objectives. The premise behind using items that mimic typical large-scale tests is that they help teachers measure students’ progress toward objectives for which they and their students will be held accountable and provide a basis for deciding which students need extra help and what the teacher needs to teach again.

CHARACTERISTICS OF NGSS-ALIGNED ASSESSMENTS

Chapter 2 discusses the implications of the NGSS for assessment, which led to our first two conclusions:

• Measuring the three-dimensional science learning called for in the framework and the Next Generation Science Standards requires assessment tasks that examine students’ performance of scientific and engineering practices in the context of crosscutting concepts and disciplinary core ideas. To adequately cover the three dimensions, assessment tasks will generally need to contain multiple components (e.g., a set of interrelated questions). It may be useful to focus on individual practices, core ideas, or crosscutting concepts in the various components of an assessment task, but, together, the components need to support inferences about students’ three-dimensional science learning as described in a given performance expectation (Conclusion 2-1).

• The Next Generation Science Standards require that assessment tasks be designed so that they can accurately locate students along a sequence of progressively more complex understandings of a core idea and successively more sophisticated applications of practices and crosscutting concepts (Conclusion 2-2).

As discussed in Chapter 2, to develop these proficiencies students will likely need repeated exposure to investigations and tasks aligned to the framework and the NGSS performance expectations, guidance about what is expected of them, and opportunities for reflection on their performance. The kind of instruction that will be effective in teaching science in the way the framework and the NGSS envision will require students to engage in science and engineering practices in the context of disciplinary core ideas—and to make connections across topics through the crosscutting concepts. Such instruction will include activities that provide many opportunities for teachers to observe and record evidence of student thinking, such as when students develop and refine models; generate, discuss, and analyze data; engage in both spoken and written explanations and argumentation; and reflect on their own understanding of the core idea and the subtopic at hand (possibly in a personal science journal).

The products of such instruction form a natural link to the characteristics of classroom assessment that aligns with the NGSS. We highlight four such characteristics:

1. the use of a variety of assessment activities that mirror the variety in NGSS-aligned instruction;
2. tasks that have multiple components so they can yield evidence of three-dimensional learning (and multiple performance expectations);
3. explicit attention to the connections among scientific concepts; and
4. the gathering of information about how far students have progressed along a defined sequence of learning.

Variation in Assessment Activities

Because NGSS-aligned instruction will naturally involve a range of activities, classroom assessment that is integral to instruction will need to involve a corresponding variation in the types of evidence it provides about student learning. Indeed, the distinction between instructional activities and assessment activities may be blurred, particularly when the assessment purpose is formative. A classroom assessment may be based on a classroom discussion or a group activity in which students explore and respond to each other’s ideas and learn as they go through this process.

Science and engineering practices lend themselves well to assessment activities that can provide this type of evidence. For instance, when students are developing and using models, they may be given the opportunity to explain their models and to discuss them with classmates, thus providing the teacher with an opportunity for formative assessment reflection (illustrated in Example 4, below). Student discourse can give the teacher a window into students’ thinking and help to guide lesson planning.

A classroom assessment may also involve a formal test or diagnostic quiz. Or it may be based on artifacts that are the products of classroom activities, rather than on tasks designed solely for assessment purposes. These artifacts may include student work produced in the classroom, homework assignments (such as lab reports), a portfolio of student work collected over the course of a unit or a school year (which may include both artifacts of instruction and results from formal unit and end-of-course tests), or activities conducted using computer technology. A classroom assessment may occur in the context of group work or discussions, as long as the teacher ensures that all the students who need to be observed are in fact active participants. Summative assessments may also take a variety of forms, but they are usually intended to assess each student’s independent accomplishments.

Tasks with Multiple Components

The NGSS performance expectations each blend a practice and, in some cases, also a crosscutting idea with an aspect of a particular core idea. In the past, assessment tasks have typically focused on measuring students’ understanding of aspects of core ideas or of science practices as discrete pieces of knowledge. Progression in learning was generally thought of as knowing more or providing more complete and correct responses. Similarly, practices were intentionally assessed in a way that minimized specific content knowledge demands—assessments were more likely to ask for definitions than for actual use of the practice. Assessment developers took this approach in part to be sure they were obtaining accurate measures of clearly definable constructs.2 However, although understanding the language and terminology of science is fundamental and factual knowledge is very important, tasks that demand only declarative knowledge about practices or isolated facts would be insufficient to measure performance expectations in the NGSS.

2“Construct” is generally used to refer to concepts or ideas that cannot be directly observed, such as “liberty.” In the context of educational measurement, the word is used more specifically to refer to a particular body of content (knowledge, understanding, or skills) that an assessment is to measure. It can be used to refer to a very specific aspect of tested content (e.g., the water cycle) or a much broader area (e.g., mathematics).

As we note in Chapter 3, the performance expectations provide a start in defining the claim or inference that is to be made about student proficiency. However, it is also important to determine the observations (the forms of evidence in student work) that are needed to support the claims, and then to develop tasks or situations that will elicit the needed evidence. The task development approaches described in Chapter 3 are commonly used for developing external tests, but they can also be useful in guiding the design of classroom assessments. Considering the intended inference, or claim, about student learning will help curriculum developers and classroom assessment designers ensure that the tasks elicit the needed evidence.

As we note in Chapter 2, assessment tasks aligned with the NGSS performance expectations will need to have multiple components—that is, be composed of more than one kind of activity or question. They will need to include opportunities for students to engage in practices as a means to demonstrate their capacity to apply them. For example, a task designed to elicit evidence that a student can develop and use models to support explanations about structure-function relationships in the context of a core idea will need to have several components. It may require that students articulate a claim about selected structure-function relationships, develop or describe a model that supports the claim, and provide a justification that links evidence to the claim (such as an explanation of an observed phenomenon described by the model). A multicomponent task may include some short-answer questions, possibly some carefully designed selected-response questions, and some extended-response elements that require students to demonstrate their understandings (such as tasks in which students design an investigation or explain a pattern of data). For the purpose of making an appraisal of student learning, no single piece of evidence is likely to be sufficient; rather, the pattern of evidence across multiple components can provide a sufficient indicator of student understanding.

Making Connections

The NGSS emphasize the importance of the connections among scientific concepts. Thus, the NGSS performance expectations for one disciplinary core idea may be connected to performance expectations for other core ideas, both within the same domain and in other domains, in multiple ways: one core idea may be a prerequisite for understanding another, or a task may be linked to more than one performance expectation and thus involve more than one practice in the context of a given core idea. NGSS-aligned tasks will need to be constructed so that they provide information about how well students make these connections. For example, a task that focused only on students’ knowledge of a particular model would be less revealing than one that probed students’ understanding of the kinds of questions and investigations that motivated the development of the model. Example 1, “What Is Going on Inside Me?” (in Chapter 2), shows how a single assessment task can be designed to yield evidence related to multiple performance expectations, such as applying physical science concepts in a life science context. Tasks that do not address these connections will not fully capture or adequately support three-dimensional science learning.

Learning as a Progression

The framework and the NGSS address the process of learning science. They make clear that students should be encouraged to take an investigative stance toward their own and others’ ideas, to be open about what they are struggling to understand, and to recognize that struggle as part of the way science is done, as well as part of their own learning process. Thus, revealing students’ emerging capabilities with science practices and their partially correct or incomplete understandings of core ideas is an important function of classroom assessment. The framework and the NGSS also postulate that students will develop disciplinary understandings by engaging in practices that help them to question and explain the functioning of natural and designed systems. Although learning is an ongoing process for both scientists and students, students are emerging practitioners of science, not scientists, and their ways of acting and reasoning differ from those of scientists in important ways.

The framework discusses the importance of seeing learning as a trajectory in which students gradually progress in the course of a unit or a year, and across the whole K-12 span, and of organizing instruction accordingly. The first example in this chapter, “Measuring Silkworms” (also discussed in Chapter 3), illustrates how this idea works in an assessment that is embedded in a larger instructional unit. As they begin the task, students are not competent data analysts. They are unaware of how displays can convey ideas or of professional conventions for display and the rationale for these conventions. In designing their own displays, students begin to develop an understanding of the value of these conventions. Their partial and incomplete understandings of data visualization have to be explicitly identified so teachers can help them develop a more general understanding. Teachers help students learn about how different mathematical practices, such as ordering and counting data, influence the shapes the data take in models. The students come to understand how the shapes of the data support inferences about population growth.

Thus, as discussed in Chapter 2, uncovering students’ incomplete forms of practice and understanding is critical: NGSS-aligned assessments will need to clearly define the forms of evidence associated with beginning, intermediate, and sophisticated levels of knowledge and practice expected for a particular instructional sequence. A key goal of classroom assessments is to help teachers and students understand what has been learned and what areas will require further attention. NGSS-aligned assessments will also need to identify likely misunderstandings, productive ideas of students that can be built upon, and interim goals for learning.

The NGSS performance expectations are general: they do not specify the kinds of intermediate understandings of disciplinary core ideas students may express during instruction, nor do they help teachers interpret students’ emerging capabilities with science practices or their partially correct or incomplete understandings. To teach toward the NGSS performance expectations, teachers will need a sense of the likely progression at a more micro level, to answer such questions as:

• For this unit, where are the students expected to start, and where should they arrive?
• What typical intermediate understandings emerge along this learning path?
• What common logical errors or alternative conceptions present barriers to the desired learning or provide resources for beginning instruction?
• What new aspects of a practice need to be developed in the context of this unit?

Classroom assessment probes will need to be designed to generate enough evidence about students’ understandings so that their locations on the intended pathway can be reliably determined, and it is clear what next steps (instructional activities) are needed for them to continue to progress. As we note in Chapter 2, only a limited amount of research is available to support detailed learning progressions: assessment developers and others who have been applying this approach have used a combination of research and practical experience to support depictions of learning trajectories.

SIX EXAMPLES

We have identified six example tasks and task sets that illustrate the elements needed to assess the development of three-dimensional science learning. As noted in Chapter 1, they all predate the publication of the NGSS. However, the constructs being measured by each of these examples are similar to those found in the NGSS performance expectations. Each example was designed to provide evidence of students’ capabilities in using one or more practices as they attempt to reach and present conclusions about one or more core ideas: that is, all of them assess three-dimensional learning. Table 1-1 shows the NGSS disciplinary core ideas, practices, and crosscutting ideas that are closest to the assessment targets for all of the examples in the report.3

3The particular combinations in the examples may not be the same as NGSS examples at that grade level, but each of these examples of classroom assessment involves integrated knowledge of the same general type as the NGSS performance expectations. However, because they predate the NGSS and its emphasis on crosscutting concepts, only a few of these examples include reference to a crosscutting concept, and none of them attempts to assess student understanding of, or disposition to invoke, such concepts.

We emphasize that there are many possible designs for activities or tasks that assess three-dimensional science learning—these six examples are only a sampling of the possible range. They demonstrate a variety of approaches, but they share some common attributes. All of them require students to use some aspects of one or more science and engineering practices in the course of demonstrating and defending their understanding of aspects of a disciplinary core idea. Each of them also includes multiple components, such as asking students to engage in an activity, to work independently on a modeling or other task, and to discuss their thinking or defend their argument.

These examples also show how one can use classroom work products and discussions as formative assessment opportunities. In addition, several of the examples include summative assessments. In each case, the evidence produced provides teachers with information about students’ thinking and their developing understanding that would be useful for guiding next steps in instruction. Moreover, the time students spend in doing and reflecting on these tasks should be seen as an integral part of instruction, rather than as a stand-alone assessment task. We note that the example assessment tasks also produce a variety of products and scorable evidence. For some we include illustrations of typical student work, and for others we include a construct map or scoring rubric used to guide the data interpretation process. Both are needed to develop an effective scoring system.

Each example has been used in classrooms to gather information about particular core ideas and practices. The examples are drawn from different grade levels and assess knowledge related to different disciplinary core ideas. Evidence from their use documents that, with appropriate prior instruction, students can successfully carry out these kinds of tasks. We describe and illustrate each of these examples below and close the chapter with general reflections about the examples, as well as our overall conclusions and recommendations about classroom assessment.

Example 3: Measuring Silkworms

The committee chose this example because it illustrates several of the characteristics we argue an assessment aligned with the NGSS must have: in particular, it allows the teacher to place students along a defined learning trajectory (see Figure 3-13 in Chapter 3), while assessing both a disciplinary core idea and a crosscutting concept.4 The assessment component is formative, in that it helps the teacher understand what students already understand about data display and adjust instruction accordingly. This example, in which 3rd-grade students investigated the growth of silkworm larvae, first assesses students’ conceptions of how data can be represented visually and then engages them in conversations about what different representations of the data they had collected reveal. It is closely tied to instruction—the assessment is embedded in a set of classroom activities.

4This example is also discussed in Chapter 3 in the context of using construct modeling for task design.

The silkworm scenario is designed so that students’ responses to the tasks can be interpreted in reference to a trajectory of increasingly sophisticated forms of reasoning. A construct map, displayed in Figure 3-13, shows developing conceptions of data display. Once the students collect their data (measure the silkworms) and produce their own ways of visually representing their findings, the teacher uses the data displays as the basis for a discussion that has several objectives.

[…]

FIGURE 4-13  Screenshot of a benchmark summative assessment of a student constructing a food web to model the flow of matter and energy in the ecosystem (without feedback and coaching); part of Example 8, “Ecosystems.”
SOURCE: SimScientists Calipers II project (2013). Reprinted with permission.

FIGURE 4-14  Screenshot of a benchmark summative assessment of a student using simulations to build balanced ecosystem population models (without feedback and coaching); part of Example 8, “Ecosystems.”
SOURCE: SimScientists Calipers II project (2013). Reprinted with permission.

These formative assessments also have an instructional purpose. They are designed to promote model-based reasoning about the common organization and behaviors of all ecosystems (see Figure 4-9) and to teach students how to transfer knowledge they gain about how one ecosystem functions to examples of new ecosystems (Buckley and Quellmalz, 2013).17

17The system was designed using the evidence-centered design approach discussed in Chapter 3. Research on the assessments supports the idea that this approach could be a part of a coherent, balanced state science assessment system: see discussion in Chapter 6.

LESSONS FROM THE EXAMPLES

The six examples discussed above, as well as the one in Chapter 2, demonstrate characteristics we believe are needed to assess the learning called for in the NGSS and a range of approaches to using assessments constructively in the classroom to support such learning. A key goal of classroom assessment is to elicit and make visible students’ ways of thinking and acting. The examples demonstrate that it is possible to design tasks and contexts in which teachers elicit student thinking about a disciplinary core idea or crosscutting concept by engaging them in a scientific practice. The examples involve activities designed to stimulate classroom conversations or to produce a range of artifacts (products) that provide information to teachers about students’ current ways of thinking and acting, or both. This information can be used to adjust instruction or to evaluate learning that occurred during a specified time. Some of the examples involve formal scoring, while others are used by teachers to adjust their instructional activities without necessarily assigning student scores.

Types of Assessment Activities

In “What Is Going on Inside Me?” (Example 1 in Chapter 2), students produce a written evidence-based argument for an explanation of how animals get energy from food and defend that explanation orally in front of the class. In “Measuring Silkworms” (Example 3, above, and also discussed in Chapter 3), students produce representations of data and discuss what they do and do not reveal about the data. In “Behavior of Air” (Example 4, above), models developed by groups of students are the stimulus for class discussion and argumentation that the teacher uses to diagnose and highlight discrepancies in students’ ideas. In “Movement of Water” (Example 5, above), multiple-choice questions that students answer using clickers are the stimulus for class discussion (assessment conversation). In each of these examples, students’ writing and classroom discourse provide evidence that can be used in decisions about whether additional activities for learning might be needed, and, if so, what kinds of activities might be most productive. In many of these examples, listening to and engaging with other students as they discuss and defend their responses is a part of the learning process, as students work toward a classroom consensus explanation or a model based on the evidence they have collected. The classroom discussion itself in these cases is the basis for the formative assessment process.

We note that when assessments are designed to be used formatively, the goal is sometimes not to assign scores to individual students but rather to decide what further instruction is needed for groups of students or the class as a whole. Thus, instead of scoring rubrics, criteria or rubrics that can help guide instructional decisions may be used. (When the goal includes assessment of both individuals and groups, both types of scoring rubrics would be needed.) Teachers need support to learn to be intentional and deliberative about such decisions. In the examples shown, designers of curriculum and instruction have developed probes that address likely learning challenges, and teachers are supported in recognizing these challenges and in using the probes to seek evidence of what their students have and have not learned along a continuum.

“Ecosystems” (Example 8, above) is a computer-based system in which students use simulations both to learn and to demonstrate what they have learned about food webs. It includes tasks that are explicitly designed for assessment. Other tasks may not be sharply distinguished from ongoing classroom activities. The data collection tasks in “Biodiversity in the Schoolyard” (Example 6, above) are part of students’ ongoing investigations, not separate from them, but they can provide evidence that can be used for formative purposes.

Similarly, in “Measuring Silkworms” (Example 3) students create displays as part of the learning process in order to answer questions about biological growth. Constructing these displays engages students in the practice of analyzing data, and their displays are also a source of evidence for teachers about students’ proficiencies in reasoning about data aggregations; thus they can be used formatively. These forms of reasoning also become a topic of instructional conversations, so that students are encouraged to consider additional aspects of data representation, including tradeoffs about what different kinds of displays do and do not show about the same data. As students improve their capacity to visualize data, the data discussion then leads them to notice characteristics of organisms or populations that are otherwise not apparent. This interplay between learning a practice (data representation as an aspect of data analysis) and learning about a core idea (variation in a population), as well as a crosscutting concept (recognizing and interpreting patterns), provides an example of the power of three-dimensional learning, as well as an example of an assessment strategy.

Interpreting Results

A structured framework for interpreting evidence of student thinking is needed to make use of the task artifacts (products), which might include data displays, written explanations, or oral arguments. As we discuss in Chapter 3, interpretation of results is a core element of assessment, and it should be a part of the assessment design. An interpretive framework can help teachers and students themselves recognize how far they have progressed and identify intermediate stages of understanding and problematic ideas. “Measuring Silkworms” shows one such framework, a learning progression for data display developed jointly by researchers and teachers. “Behavior of Air” is similarly grounded in a learning progressions approach. “Movement of Water” presents an alternative example, using what is called a facets-based approach18 to track the stages in a learning progression (discussed in Chapter 2)—that is, to identify ideas that are commonly held by students relative to a disciplinary core idea. Although these preconceptions are often labeled as misconceptions or problematic ideas, they are the base on which student learning must be built. Diagnosing students’ preconceptions can help teachers identify the types of instruction needed to move students toward a more scientific conception of the topic.

18In this approach, a facet is a piece of knowledge constructed by a learner in order to solve a problem or explain an event (diSessa and Minstrell, 1998). Facets that are related to one another can be organized into clusters, and the basis for grouping can either be an explanation or an interpretation of a physical situation or a disciplinary core idea (Minstrell and Kraus, 2005). Clusters comprise goal facets (which are often standards or disciplinary core ideas) and problematic facets (which are related to the disciplinary idea but which represent ways of reasoning about the idea that diverge from the goal facet). The facets perspective assumes that, in addition to problematic thinking, students also possess insights and understandings about the disciplinary core idea that can be deepened and revised through additional learning opportunities (Minstrell and van Zee, 2003).

What these examples have in common is that they allow teachers to group students into categories, which helps with the difficult task of making sense of many kinds of student thinking; they also provide tools for helping teachers decide what to do next. In “Movement of Water,” for example, students’ use of clickers to answer questions gives teachers initial feedback on the distribution of student ideas in the classroom. Depending on the prevalence of particular problematic ideas or forms of reasoning and their persistence in subsequent class discussion, teachers can choose to use a “contingent activity” that provides a different way of presenting a disciplinary core idea.

The interpretive framework for evaluating evidence has to be expressed with enough specificity to make it useful for helping teachers decide on next steps. The construct map for data display in “Measuring Silkworms” meets this requirement: a representation that articulated only the distinction between the lowest and highest levels of the construct map would be less useful. Learning progressions that articulate points of transition that take place across multiple years—rather than transitions that may occur in the course of a lesson or a unit—would be less useful for classroom decision making (although a single classroom may often include students who span such a range) (Alonzo and Gearhart, 2006).

Using Multiple Practices

The examples above involve tasks that cross different domains of science and cover multiple practices. “What Is Going on Inside Me?,” for example, requires students to demonstrate their understanding of how chemical processes support biological processes. It asks students not only to apply the crosscutting concept of energy and matter conservation, but also to support their arguments with explicit evidence about the chemical mechanism involved. In “Measuring Silkworms” and “Biodiversity in the Schoolyard,” students’ responses to the different tasks can provide evidence of their understanding of the crosscutting concept of patterns. It is important to note, however, that “patterns” in each case has a different and particular disciplinary interpretation. In “Measuring Silkworms,” students must recognize patterns in a display of data, in the form of the “shapes” the data can take, and begin to link ideas about growth and variation to these shapes. In contrast, in “Biodiversity in the Schoolyard,” students need to recognize patterns in the distribution and numbers of organisms in order to use the data in constructing arguments.

Three of the examples—“Measuring Silkworms,” “Biodiversity in the Schoolyard,” and “Climate Change”—provide some classroom-level snapshots of emerging proficiency with aspects of the practices of analyzing and interpreting data and using mathematics and computational thinking. We note, though, that each of these practices has multiple aspects, so multiple tasks would be needed to provide a complete picture of students’ capacity with each of them. Although assessment tasks can identify particular skills related to specific practices, evaluating students’ disposition to engage in these practices without prompting likely requires some form of direct observation or assessment of the products of more open-ended student projects.19

19The phrase “disposition to engage” is used in the context of science education to refer to students’ degree of engagement with and motivation to persevere with scientific thinking.

In instruction, students engage in practices in interconnected ways that support their ongoing investigations of phenomena. Thus, students are likely to find that to address their questions, they will need to decide which sorts of data (including observational data) are needed; that is, they will need to design an investigation, collect those data, interpret the results, and construct explanations that relate their evidence to both claims and reasoning. It makes little sense for students to construct data displays in the absence of a question. And it is not possible to assess the adequacy of their displays without knowing what question they are pursuing. In the past, teachers might have tried to isolate the skill of graphing data as something to teach separately from disciplinary content, but the new science framework and the NGSS call for teachers to structure tasks and interpret evidence in a broad context of learning that integrates or connects multiple content ideas and treats scientific practices as interrelated. Similarly, assessment tasks designed to examine students’ facility with a particular practice may require students to draw on other practices as they complete the task.

We stress in Chapter 2 that a key principle of the framework is that science education should connect to students’ interests and experiences. Students are likely to bring diverse interests and experiences to the classroom from their families and cultural communities. A potential focus of classroom assessment at the outset of instruction is to elicit students’ interests and experiences that may be relevant to the goals for instruction. However, identifying interests has not often been a focus of classroom assessment research in science, although it has been used to motivate and design assessments in specific curricula.20

20One example is Issues, Evidence, and You: see Science Education for Public Understanding Program (SEPUP) (1995) and Wilson and Sloane (2000).

One approach that could prove fruitful for classroom assessment is a strategy used in an elementary curriculum unit called Micros and Me (Tzou et al., 2007). The unit aims to engage students in the practice of argumentation to learn about key ideas in microbiology. In contrast to many curriculum units, however, this example provides students with the opportunity to pursue investigations related to issues that are relevant to them. The researchers adapted a qualitative methodology from psychology, photo-elicitation, which is used to identify these issues. Research participants take photos that become the basis for interviews that elicit aspects of participants’ everyday lives (Clark-Ibañez, 2004). In Micros and Me, at the beginning of the unit, students take photos of things or activities they do to prevent disease and stay healthy. They share these photos in class, as a way to bring personally relevant experiences into the classroom to launch the unit. Their documentation also helps launch a student-led investigation focused on students’ own questions, which are refined as students encounter key ideas in microbiology.

In describing the curriculum, Tzou and Bell (2010) do not call out the practice of self-documentation of students’ personally relevant experiences as a form of assessment. At the same time, they note that a key function of self-documentation is to “elicit and make visible students’ everyday expertise” relevant to the unit content (Tzou and Bell, 2010, p. 1136). Eliciting and making visible prior knowledge is an important aspect of assessment that is used to guide instruction. It holds promise as a way to identify diversity in the science classroom that can be used to help students productively engage in science practices (Clark-Ibañez, 2004; Tzou and Bell, 2010; Tzou et al., 2007).

Professional Development

The framework emphasizes that professional development will be an indispensable component of the changes to science education it calls for (see National Research Council, 2012a, Ch. 10). The needed changes in instruction are beyond our charge, but in the context of classroom assessment, we note that significant adaptation will be asked of teachers. They will need systematic opportunities to learn how to use classroom discourse as a means to elicit, develop, and assess student thinking. The Contingent Pedagogies Project (see Example 4, above) illustrates one way to organize such professional development. In that approach, professional development included opportunities for teachers to learn how to orchestrate classroom discussion of core disciplinary ideas. Teachers also learned how to make use of specific discussion strategies to support the practice of argumentation.

Eliciting student thinking through skillful use of discussion is not enough, however. Tasks or teacher questions also have to successfully elicit and display students’ problematic ways of reasoning about disciplinary core ideas and problematic aspects of their participation in practices. They must also elicit the interests and experiences students bring, so teachers can build on them throughout instruction. This is part of the process of integrating teaching and assessment. Thus, both teachers and assessment developers need to be aware of the typical student ideas about a topic and the various problematic alternative conceptions that students are likely to hold. (This is often called pedagogical content knowledge.) In addition, teachers need a system for interpreting students’ responses to tasks or questions. That system should be intelligible and usable in practice: it cannot be so elaborate that teachers find it difficult to use in order to understand student thinking during instruction. (The construct map and its associated scoring guide shown in Chapter 3 are an example of such a system.)

CONCLUSIONS AND RECOMMENDATIONS

The primary conclusion we draw from these examples is that it is possible to design tasks and contexts in which teachers elicit students’ thinking about disciplinary core ideas and crosscutting concepts by engaging them in scientific practices. Tasks designed with the characteristics we have discussed (three dimensions, interconnections among concepts and practices, a way to identify students’ place on a continuum) produce artifacts, discussions, and activities that provide teachers with information about students’ thinking and so can help them decide how to proceed, how to adjust subsequent instruction, or how to evaluate the learning that took place over a specified period of time.

Questions have been raised about whether students can achieve the ambitious performance expectations in the NGSS. The implementation of the NGSS is a complex subject that is beyond the scope of our charge; however, each of the examples shown has been implemented with diverse samples of students,21 and there have been students who succeeded on them (although there are also students who did not). The tasks in our examples assess learning that is part of a well-designed, coherent sequence of instruction on topics and in ways that are very similar to NGSS performance expectations. Each example offers multiple opportunities to engage in scientific practices and encourages students to draw connections among ideas, thus developing familiarity with crosscutting concepts.

21Samples included students from rural and inner-city schools and from diverse racial and ethnic backgrounds, as well as English-language learners.

CONCLUSION 4-1  Tasks designed to assess the performance expectations in the Next Generation Science Standards will need to have the following characteristics:

• multiple components that reflect the connected use of different scientific practices in the context of interconnected disciplinary ideas and crosscutting concepts;
• a design that reflects the progressive nature of learning by providing information about where students fall on a continuum between expected beginning and ending points in a given unit or grade; and
• an interpretive system for evaluating a range of student products that is specific enough to be useful for helping teachers understand the range of student responses and that provides tools to help them decide on next steps in instruction.

CONCLUSION 4-2  To develop the skills and dispositions to use scientific and engineering practices needed to further their learning and to solve problems, students need to experience instruction in which they (1) use multiple practices in developing a particular core idea and (2) apply each practice in the context of multiple core ideas. Effective use of the practices often requires that they be used in concert with one another, such as in supporting explanation with an argument or using mathematics to analyze data. Classroom assessments should include at least some tasks that reflect the connected use of multiple practices.

CONCLUSION 4-3  It is possible to design assessment tasks and scoring rubrics that assess three-dimensional science learning. Such assessments provide evidence that informs teachers and students of the strengths and weaknesses of a student’s current understanding, which can guide further instruction and student learning and can also be used to evaluate students’ learning.

We emphasize that implementing the conception of science learning envisioned in the framework and the NGSS will require teachers who are well trained in assessment strategies such as those discussed in this chapter. Professional development will be essential in meeting this goal.

CONCLUSION 4-4  Assessments of three-dimensional science learning are challenging to design, implement, and properly interpret. Teachers will need extensive professional development to successfully incorporate this type of assessment into their practice.

On the basis of the conclusions above, the committee offers recommendations about professional development and about curriculum and assessment development.

RECOMMENDATION 4-1  State and district leaders who design professional development for teachers should ensure that it addresses the changes called for by the framework and the Next Generation Science Standards in both the design and use of assessment tasks and instructional strategies. Professional development must support teachers in integrating practices, crosscutting concepts, and disciplinary core ideas in inclusive and engaging instruction and in using new modes of assessment that support such instructional activities.

Developing assessment tasks of this type will require the participation of several different kinds of experts. First, for the tasks to accurately reflect science ideas, scientists will need to be involved. Second, experts in science learning will also be needed to ensure that knowledge from research on learning is used as a guide to what is expected of students. Third, assessment experts will be needed to clarify relationships among tasks and the forms of knowledge and practice that the items are intended to elicit. Fourth, practitioners will be needed to ensure that the tasks and interpretive frameworks linked to them are usable in classrooms. And fifth, as we discuss further in Chapter 6, this multidisciplinary group of experts will need to include people who have knowledge of and experience with population subgroups, such as students with disabilities and students with varied cultural backgrounds, to ensure that the tasks are not biased for or against any subgroups of students for reasons irrelevant to what is being measured.

We note also that curricula, textbooks, and other resources, such as digital content, in which assessments may be embedded will need to reflect the characteristics we have discussed—and their development will present similar challenges. For teachers to incorporate tasks of this type into their practice, and to design additional tasks for their classrooms, they will need to have worked with many good examples in their curriculum materials and professional development opportunities.

RECOMMENDATION 4-2  Curriculum developers, assessment developers, and others who create resource materials aligned to the science framework and the Next Generation Science Standards should ensure that assessment activities included in such materials (such as mid- and end-of-chapter activities, suggested tasks for unit assessment, and online activities) require students to engage in practices that demonstrate their understanding of core ideas and crosscutting concepts. These materials should also reflect multiple dimensions of diversity (e.g., by connecting with students’ cultural and linguistic identities). In designing these materials, development teams need to include experts in science, science learning, assessment design, equity and diversity, and science teaching.