
Student Thinking and Related Assessment: Creating a Facet-Based Learning Environment

Jim Minstrell

From the research literature we know that students come to our classes with preconceptions. Over the past 30 years there has been considerable research on students' conceptions. In a classic popularized article, McCloskey et al. (1980) identified several misconceptions in mechanics that they described as being consistent with the impetus theory, which predominated before Newton's synthesis. More recently, summaries of students' conceptual difficulties across the sciences have been published (Driver et al., 1994; Gabel et al., 1994; Project 2061, 1993). There is even at least one summary of international research on students' conceptions (Duit et al., 1991). How can these research results be incorporated into mainline assessment, curriculum, and instruction?

In topics new to their experience and thinking, learners construct understanding during class activities. The list of students' conceptions and reasoning has grown to be quite extensive and continues to grow. Consider the following student ideas. Are these ideas wrong?

· To find the average speed, divide the final position by the final time.
· Heavier things fall faster. Extremely light things don't even fall.
· A forward force is necessary to keep an object moving in the forward direction at a constant speed.
· Objects don't weigh anything in space.
· Balanced forces can't apply to both an at-rest object and an object moving at a constant velocity.
· In an interaction the bigger/heavier object exerts the greater force.
· The more pulleys the greater the mechanical advantage, or the less force one will need to exert.
· More batteries will make the bulb brighter.

Most of these statements seem valid on the surface. Several are true, depending on the context in which they are used. How can we honor the "sense making" learners have done and yet help them move toward a more scientific understanding?

What can research reveal about students' thinking, and what are the implications for instruction and assessment? This chapter illustrates some aspects of students' thinking, suggests a "facets of thinking" approach to organizing students' thinking, and shows that the facets approach can be useful to teachers in diagnosing student difficulties and designing or choosing instruction to address those difficulties. If it can be useful to teachers to effect better learning, it makes sense to incorporate the perspective into classroom assessment and even large-scale assessment in order to inform decisions at the program and policy levels. The purpose of the chapter is to demonstrate that research on learning and teaching can be used effectively to inform curriculum, instruction, and assessment at both the policy and especially the classroom levels.

THINKING ABOUT STUDENTS' THINKING

Background

Consider the following question:

A huge, strong magnet and a tiny, weak magnet are brought near each other. Which of the following statements makes the most sense to you?

A. The huge magnet exerts no force on the small one, which exerts no force on the large one.
B. The huge magnet exerts more force on the small magnet than the small one exerts on the large one.
C. The huge magnet exerts the same force on the small magnet as the small magnet exerts on the large one.
D. The huge magnet exerts less force on the small magnet than the small magnet does on the large one.
E. The huge magnet exerts no force on the small magnet, which does exert force on the large one.

Briefly explain how you decided.

Readers can most likely predict which is the most popular answer. In our classes, prior to instruction, nearly 85 percent of the students pick B and justify the choice by citing the fact that the one magnet is larger and stronger and therefore capable of exerting the larger force. It is also interesting that about 15 percent choose C. In this case their rationale comes from authority: "I remember that for every action there is an equal reaction." Asked to cite experience consistent with this idea, students report remembering "reading it in a book" or "hearing it from a former teacher." This does not represent an adequate understanding.

Consider a second question:

Sam is taller, stronger, and heavier than Shirley. They are both standing on level ground and lean on each other back to back without falling. Which seems to make the most sense with respect to the forces they exert on each other?

A. Sam exerts a greater force on Shirley.
B. Sam and Shirley exert equal forces on each other.
C. Shirley exerts a greater force on Sam.
D. Neither exerts a force on the other.

Briefly explain.

With the Sam and Shirley problem the reader may have more difficulty predicting the outcomes. In our classes about 50 percent of the students suggest that Sam will exert the larger force "because he is bigger and/or stronger." About 20 percent of the students suggest Shirley will exert the greater force, citing such evidence as, "she has the angle on Sam" or "he is just leaning [passive], but she will have to be pushing [active] to keep them from falling over." Nearly 30 percent suggest they will exert equal forces. While some students cite knowledge learned from authority, many cite as evidence the fact that "nobody is winning" and "they are not falling over" [no effect].

From these and similar questions it appeared that students were attending to surface features of problem situations rather than understanding and applying principles. From a formal physics perspective, it is clear the students are not being consistent. After all, these are both "third law" [Newton] questions, and the students are not answering them the same. On the other hand, looking at the questions from the students' viewpoints, the questions are very different. The salient features in the two situations are different. In constructing their solutions the learners were considering such features as size, strength, "winning" or resulting movement effects, and level of activity or passivity of the interacting objects.

A tenet from cognitive psychology is that learners are naturally mentally active (Bruer, 1993). As humans, we try to make sense of the natural world and human-made artifacts in it. We organize it initially by surface features and then react on the basis of recognition of patterns. We see what we perceive to be a similar situation and make a similar prediction or action. If something does not work out as expected, we attempt to reorganize our understanding. It is around these impasses, where ideas do not work, that change in our thinking results. Making the leap to abstract scientific principles, like Newton's Third Law, to organize the world phenomena does not come naturally or quickly. It takes opportunities for development and time to develop our thinking to that level of principled performance.

To better understand my students' thinking so that I could create better instruction to address their cognition, I tried to think about the physical world like a student does. I assumed my students were trying to make sense of their world. I read their solutions and listened to their ideas with an eye and ear tuned to search for features that seemed to make sense to them in limited contexts. From the field of research on students' conceptions and reasoning, I began identifying and organizing student thinking associated with various problematic situations. I identified the individual sorts of thinking (which I call facets) and clustered them around certain situations or ideas. I call these facet clusters (Minstrell, 1992). The term facets was used to avoid the "baggage" that goes with such terms as misconceptions or alternative conceptions. In fact, much of the thinking of students is useful and can be built upon, but it does not appear to be theoretically based, such as what would be part of an impetus theory or Newtonian theory. It seems rather to be based on salient features and a construction of explanations from "pieces" of understanding (diSessa, 1993).

Facets of Thinking

Facets are used to describe students' thinking as it is seen or heard in the classroom. Facets of students' thinking are individual pieces or constructions of a few pieces of knowledge and/or strategies of reasoning. While facets assumes a "knowledge in pieces" perspective like that of diSessa (1993), the pieces are generally not as small as the phenomenological primitives (p-prims) assumed by diSessa. Facets have been derived from research on students' thinking and from classroom observations by teachers. They are convenient units of thought for characterizing and analyzing students' thinking in the interest of making decisions to effect specific reform of curriculum, instruction, and assessment. Since facets are only slight generalizations from what students actually say or do in the classroom, they can be identified by teachers and used by them to discuss the phenomena of students' ideas. Some are content specific; for example, "horizontal movement makes a falling object fall more slowly." Others are strategic, like "average velocity is half the sum of the initial and final velocities" (in any situation). Still others are generic "more implies more" facets, such as "the more batteries, the brighter the bulb." Typically they are (or seem to be) valid, depending on the context of usage.

Facet clusters are sets of related facets, grouped around a physical situation (e.g., forces on interacting objects) or around some conceptual idea (e.g., meaning of average velocity). Within the cluster, facets are sequenced in an approximate order of development and for recording purposes are coded numerically (see Figures 3-1 and 3-2). Those ending with 0 or 1 in the units digit tend to be appropriate, acceptable understandings for introductory physics. The facets ending in 9, 8, or so tend to be the more problematic facets in that, if this is not dealt with during instruction, the student will likely have a great deal of trouble with this cluster and with ideas in related clusters. For example, if students do not differentiate average speed from a change in position (facet 229-3), they will have great difficulty understanding many other ideas about motion. For some facets there are several "subspecies." For example, 229 has three ways that it represents what students do when they do not separate average rate (speed/velocity) from amount of distance or displacement. Those facets with middle digits frequently arise from formal instruction, but the student may have overgeneralized or undergeneralized the application of an appropriate principle. The numerical code is intended as a descriptive aid. Thus, rather than simply a score, the codes suggest implications for what specifically needs to be addressed, where specific deficiencies exist. For additional information on facets and clusters see the following two Web sites: http://weber.u.washington.edu/~huntlab/diagnoser/facetcode.html and www.talariainc.com.

FIGURE 3-1 Cluster 470: forces on interacting objects.

*470 All interactions involve equal magnitude and oppositely directed action and reaction forces that are on the separate interacting bodies
474 Effects (such as damage or resulting motion) dictate relative magnitudes of forces during interaction. At rest, therefore interaction forces balance. "Moves," therefore interacting forces unbalanced.
475 Equal force pairs are identified as action and reaction but are on the same object
476 Stronger exerts more force
477 One with more motion exerts more force
478 More active/energetic exerts more force
479 Bigger/heavier exerts more force
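To make the coding convention concrete, here is a minimal sketch in Python of how a classroom tool might sort facet codes by their units digit. The cluster entries are copied from Figure 3-1; everything else (names, thresholds, output format) is an illustrative assumption, not a specification of the actual facet database.

    # Illustrative only: classify facet codes by the digit convention described above.
    CLUSTER_470 = {
        470: "Equal and opposite interaction forces on the separate bodies",
        474: "Effects (damage, resulting motion) dictate relative force magnitudes",
        475: "Action/reaction pair placed on the same object",
        476: "Stronger exerts more force",
        477: "One with more motion exerts more force",
        478: "More active/energetic exerts more force",
        479: "Bigger/heavier exerts more force",
    }

    def facet_status(code: int) -> str:
        """Goal, problematic, or intermediate, judged from the units digit."""
        units = code % 10
        if units in (0, 1):
            return "goal"          # acceptable understanding for introductory physics
        if units in (8, 9):
            return "problematic"   # likely to cause trouble here and in related clusters
        return "intermediate"      # often an over- or undergeneralization; 7 is
                                   # sometimes treated as problematic as well

    for code, description in sorted(CLUSTER_470.items()):
        print(f"{code} [{facet_status(code)}]: {description}")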

FIGURE 3-2 Cluster 220: meaning of average speed or average velocity.

*220 avg. speed = (total distance covered)/(total amount of time)
*221 avg. velocity = Δx/Δt (together with a direction)
225 Rate expression is overgeneralized
  225-1 avg. v = (vf + vi)/2 unless compensation between low and high values occurs, e.g., acceleration is constant
  225-2 avg. v = xf/tf
226 Rate expression misstated
  226-1 avg. v = Δt/Δx, i.e., change in time divided by change in position
  226-2 avg. v = Δv/2
  226-3 avg. v = vf/2
  226-4 avg. v = (vf + vi)/Δt
228 Average rate not differentiated from another rate
  228-1 avg. v means constant velocity
  228-2 Velocity = speed (student doesn't differentiate between velocity and speed)
  228-3 avg. v = vf, i.e., average v is the same as the final v
  228-31 greatest avg. vel = greatest vf during any part of trip
  228-4 avg. v = avg. a
  228-5 avg. v = Δv, or Δv divided by a quantity other than Δt
229 Average rate (speed/velocity) not differentiated from amount of distance or displacement
  229-2 avg. v = pf, i.e., the final position
  229-21 avg. v = avg. p
  229-3 avg. v = Δp
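A quick numerical check, assuming a simple two-leg trip (the numbers are illustrative, not from the chapter), shows why facet 225-1 is an overgeneralization of the goal facet 220 whenever acceleration is not constant:

    # Worked example: goal facet 220 versus overgeneralized facet 225-1.
    # Two constant-speed legs, so the trip is not at constant acceleration.
    leg1_distance, leg1_speed = 60.0, 30.0   # km, km/h -> takes 2.0 h
    leg2_distance, leg2_speed = 60.0, 60.0   # km, km/h -> takes 1.0 h

    total_distance = leg1_distance + leg2_distance
    total_time = leg1_distance / leg1_speed + leg2_distance / leg2_speed

    facet_220 = total_distance / total_time       # 120 km / 3 h = 40 km/h
    facet_225_1 = (leg1_speed + leg2_speed) / 2   # (30 + 60)/2 = 45 km/h

    print(facet_220, facet_225_1)   # 40.0 45.0 -- the shortcut overestimates here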

INSTRUCTIONAL DESIGN BASED ON STUDENTS' THINKING

Using Facets to Create a Facet-Based Learning Environment

This section demonstrates how having information from facet assessment can inform instructional decisions. Whether an assessment is done in the classroom or on a larger scale, such as state or national assessments, the results and implications must eventually be fed back to teachers to affect programs and instruction. Thus, the facet assessment examples presented here are at the classroom interface between teacher, student, and curriculum. Likewise, assessment implications can also affect curriculum development or adaptation to better address targeted learning difficulties with respect to particular learning goals (e.g., standards).

I will describe how the research on facets is used to create a facet assessment-based learning environment. The purpose of the environment will be to build from assessments of students' initial and developing ideas toward a more principled understanding. Facets are used to diagnose students' ideas and to direct the choice or design of instructional activities (Minstrell, 1989; Minstrell and Stimpson, 1996). The main body of this paper discusses the value of teachers having, and being able to use, facets and facet clusters. A particular facet cluster is used to demonstrate the creation of such a facet assessment-based learning environment.

Goals in our introductory physics course include understanding the nature of gravity and its effects and understanding the effects of ambient fluid (e.g., air or water) mediums on objects in them, whether the objects are at rest or moving through the fluid. For many introductory physics students, an initial difficulty involves a confusion between which effects are effects of gravity and which are effects of the surrounding medium. When one attempts to weigh something, does it weigh what it does because the air pushes down on it? Or is the scale reading that would give the true weight of the object distorted somehow because of the air? Or is there absolutely no effect by air? Because these have been issues for beginning students, the students are usually highly motivated to engage in thoughtful discussion of the issues.

Assessment for Eliciting Students' Ideas Prior to Instruction in Order to Build an Awareness of the Initial Understanding

At the beginning of several units or subunits, a preinstruction quiz is administered. One purpose is to provide the teacher with knowledge of the related issues in the class in general and to provide specific knowledge of which students exhibit what sorts of ideas. A second reason is to help students become more aware of the content and issues involved in the upcoming unit.

To get students involved in separating effects of gravity from effects of the ambient medium, we use the following question associated with Figure 3-3. "First, suppose we weigh some object on a large spring scale, not unlike the ones we have at the local market. The object apparently weighs ten pounds, according to the scale. Now we put the same apparatus, scale, object and all, under a very large glass dome, seal the system around the edges, and pump out all the air. That is, we use a vacuum pump to allow all the air to escape out from under the glass dome. What will the scale reading be now? Answer as precisely as you can at this point in time. [pause] And, in the space provided, briefly explain how you decided." Thus, students' ideas are elicited. (I encourage the reader to answer this question now as best, and as precisely, as possible.)

FIGURE 3-3 Preinstruction question from the "Nature and Effects of Gravity" diagnostic quiz (Problem 1): a spring scale reads 10.0 lbs in open air; the same apparatus is shown under a glass dome with the air removed, and students fill in the predicted scale reading and briefly explain how they decided.

Students write their answers and rationale. From their words a facet diagnosis can be made relatively easily. The facets associated with this cluster, "Separating medium effects from gravitational effects," can be seen in Figure 3-4. Students who give an answer of zero pounds for the scale reading in a vacuum usually are thinking that air only presses down and that "without air there would be no weight, like in space" (facet 319). Other students suggest a number "a little less than 10" because "air is very light, so it doesn't press down very hard, but it does press down some"; thus, taking the air away will only decrease the scale reading slightly (facet 318). Other students suggest there will be no change at all. "Air has absolutely no effect on scale reading." This answer could result either from a belief that mediums do not exert any forces or pressures on objects in them (facet 314) or that fluid pressures on the top and bottom of an object are equal (facet 315). A few students suggest that while there are pressures from above and below there is a net upward pressure by the fluid. "There is a slight buoyant force" (facet 310, an acceptable workable idea at this point). Finally, a few students answer that there will be a large increase in the scale reading "because of the [buoyant] support by the air" (facet 317).

FIGURE 3-4 Separating medium effects from gravitational effects.

*310 Pushes from above and below by a surrounding fluid medium lend a slight support (net upward push due to differences in depth pressure gradient)
  *310-1 The difference between the upward and downward pushes by the surrounding air results in a slight upward support or buoyancy.
  *310-2 Pushes above and below an object in a liquid medium yield a buoyant upward force due to the larger pressure from below.
*311 A mathematical formula approach (e.g., rho × g × h1 - rho × g × h2 = net buoyant pressure)
314 Surrounding fluids don't exert any forces or pushes on objects
315 Surrounding fluids exert equal pushes all around an object
  315-1 Air pressure has no up or down influence (neutral)
  315-2 Liquid presses equally from all sides regardless of depth
316 Whichever surface has greater amount of fluid above or below the object has the greater push by the fluid on the surface.
317 Fluid mediums exert an upward push only
  317-1 Air pressure is a big up influence (only direction)
  317-2 Liquid presses up only
  317-3 Fluids exert bigger up forces on lighter objects
318 Surrounding fluid mediums exert a net downward push
  318-1 Air pressure is a down influence (only direction)
  318-2 Liquid presses (net press) down
319 Weight of an object is directly proportional to medium pressure on it
  319-1 Weight is proportional to air pressure.
  319-2 Weight is proportional to liquid pressure.

The numbering scheme for the facets allows for more than simply marking the answers "right" or "wrong." The codes ending with a high digit (9, 8, and sometimes 7) represent common facets used by our students at the beginning of instruction. Codes ending in 0 or 1 are used to represent goals of instruction. The latter abstractions represent the sort of reasoning or understanding that would be productive at this level of learning and instruction. Middle number codes represent some learning. When data are coded, the teacher/researcher can visually scan the class results to identify dominant targets for the focus of instruction.
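A minimal sketch of how such coding might look in practice, assuming the cutoffs suggested by the answers described above (the function and its thresholds are illustrative, not the project's actual rubric, and in practice the written rationale, not just the number, drives the final code):

    def candidate_facet(predicted_lbs: float, weight_in_air_lbs: float = 10.0) -> str:
        """Rough first-pass facet guess from the predicted vacuum scale reading alone."""
        if predicted_lbs <= 1.0:
            return "319"        # no air, no weight: weight tied to air pressure
        if predicted_lbs < weight_in_air_lbs:
            return "318"        # air presses (net) down, so the reading drops a little
        if predicted_lbs == weight_in_air_lbs:
            return "314/315"    # no medium forces, or equal pushes all around
        if predicted_lbs <= weight_in_air_lbs + 1.0:
            return "310"        # slight buoyant support by the air (goal facet)
        return "317"            # air mainly buoys things up, so the reading jumps

    for reading in (0.0, 9.5, 10.0, 10.2, 15.0):
        print(reading, candidate_facet(reading))   # 319, 318, 314/315, 310, 317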

Benchmark Instruction to Initiate Change in Understanding and Reasoning

By committing their answers and rationale to paper, students express greater interest in coming to some resolution, in finding out what is "right." Students are now motivated to participate in activities that can lead to resolution. In the classroom this benchmark lesson usually begins with a discussion of students' ideas. We call this stage "benchmark instruction" since the lesson tends to be a reference point for subsequent lessons (diSessa and Minstrell, 1998). It unpacks the issues in the unit and provides clues to potential resolution of those issues. In this stage, students are encouraged to share their answers and associated rationales. Teachers attempt to maintain neutrality in leading the discussion, both to allow issues to be brought forth by students to maintain a focus on their thinking and to honor the potential validity of students' facets of knowledge and reasoning (van Zee and Minstrell, 1997).

Note that many of the ideas and their corresponding facets have validity.

Facet 319: Some students have suggested a valid correlation between no air in space and no apparent weight in space. What they have not realized is that in an earth-orbiting shuttle one would likely get a zero spring scale reading, whether in the breathable air inside the shuttle or the airless environment outside.

Facet 318: It is true that air is light; that is, its density is low relative to most objects we put in it. Air does push downward, but it also pushes in other directions.

Facet 317: Air does help buoy things up, but the buoyant force involves a resolution of the upward and downward forces by the fluid, and that effect is relatively small on most objects in air (not so for a helium balloon).

Facet 315: For many situations the difference between the up and down forces by air is so small that even the physicist chooses to ignore it.

Thus, there is validity to most of the facets of understanding and reasoning used by students as they attempt to understand and reason about this problem situation.

By now many threads of students' present understanding of the situation are unraveled and lie on the table for consideration. The next phase of the discussion moves toward allowing fellow students to identify strengths and limitations of the various suggested individual threads. "Is this idea ever true? When and in what contexts? Is this idea valid in this context? Why or why not?" After seeing the various threads unraveled, students are motivated to know "what is the truth." The teacher asks: "How can we find out what happens?" Students readily suggest: "Try it. Do the experiment and see what happens." The experiment is run, air is evacuated, and the result is "no detectable difference" in the scale reading in the vacuum versus in air.
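The "no detectable difference" result is what a rough estimate would lead one to expect. Assuming, purely for illustration, a roughly 1-liter object (the volume and the unit conversion below are assumptions, not values from the chapter), the buoyant push of the air is orders of magnitude smaller than what a market spring scale can resolve:

    # Back-of-the-envelope estimate of the air's buoyant force on the weighed object.
    RHO_AIR = 1.2        # kg/m^3, air density near sea level
    G = 9.8              # m/s^2
    VOLUME = 1.0e-3      # m^3, i.e., about 1 liter (assumed size of the object)

    buoyant_force_n = RHO_AIR * G * VOLUME     # ~0.012 N
    weight_n = 10 * 4.45                       # 10 lb is roughly 44.5 N

    print(f"{buoyant_force_n:.3f} N, about {100 * buoyant_force_n / weight_n:.2f}% of the weight")
    # ~0.012 N, roughly 0.03% of the weight -- far below the scale's resolution.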

Facet-Informed Elaboration Instruction to Explore Contexts of Application of Other Threads Related to New Understanding and Reasoning

The initial activity was to address facet 319, considered the problematic understanding. But many of the students also thought that air only pushed down or only pushed up. Additional discussion and laboratory investigations allow students to test the contexts of validity for other threads of understanding and reasoning. Other activities involving ordinary daily experiences are brought out for investigation: an inverted glass of water with a plastic card over the opening (the water does not come out), a vertical straw dipped in water and a finger placed over the upper end (the water does not come out of the lower end until the finger is removed from the top), an inverted cylinder lowered into a larger cylinder of water (it "floats," and as the inverted cylinder is pushed down, one can see the water rise relative to the inside of the inverted cylinder), and a 2-liter, water-filled, plastic soda bottle with three holes at different levels down the side (uncapped, water from the lowest hole comes out fastest; capped, air goes in the top hole and water comes out the bottom hole). These activities address students' hypotheses consistent with facets 318 and 317.

While each experiment is a new specific context, the teacher encourages the students to come to general conclusions about the effects of the surrounding fluid. "What can each experiment tell us that might relate to all of the other situations, including the original benchmark problem?" In addition to encouraging additional investigation of issues, the teacher can help students note the similarity between what happens to an object submerged in a container of water and what happens to an object submerged in the "ocean of air" around the earth.

A final experiment for this subunit affords students the opportunity to try their new understanding and reasoning in another more specific context. A solid metal slug is "weighed" successively in air, partially submerged in water (scale reading is slightly less), totally submerged just below the surface of the water (scale reading is even less), and totally submerged deep in a container of water (scale reading is the same as at any other position, as long as it is totally submerged). From the scale reading in air, students are asked to predict (qualitatively compare) each of the other results, do the experiment, record their results, and, finally, interpret those results. This activity specifically addresses the students' hypotheses associated with facets 316, 315, and 314. This task asks the students to relate these results and the results of the previous experiments to the original benchmark experience.

By seeing that air and water have similar fluid properties, students are prepared to build an analogy between results. Weighing in water is to weighing out of water (in air) as weighing in the ocean of air is to weighing out of the ocean of air (in a vacuum). Thus, students are now better prepared to answer the original question about weighing in a surround of air, and they have developed a more principled view of the situation. Since students' cognition is associated with the specific features of each situation, a paramount task for instruction is to help students recognize the common features that cross the various situations. Part of coming to understand physics is coming to see the world differently, but the general principled view can be constructed inductively from experiences and from the ideas that apply across a variety of specific situations. The facets are our representation of the students' ideas. They originate with and are used by the students, although they may be elicited from the students by a skilled instructor or within the design of assessment items. Thus, the generalized understandings and explanations are constructed by students from their own earlier ideas. In this way I am attempting to bridge from students' ideas to the formal ideas of physics.

Assessment Embedded Within Instruction

The facet assessment can also be embedded within instruction and served technologically. This could be Web served, giving policy and program people information, but the system I will describe here is one that our teachers are using to make instructional decisions.

Sometime after the benchmark instruction, after the class begins to come to tentative resolution on some issues, it is useful to give students the opportunity to individually check their understanding and reasoning. Although we sometimes administer these questions on paper in large-group format, we prefer to allow the students to quiz themselves when they are ready. To address this need for ongoing assessment, we have developed a computerized tool to assist the teacher in individualizing the assessment and keeping records on student progress. When students think they are ready, they are encouraged to work through computer-presented problems, appropriate to the unit being studied, using a program called DIAGNOSER (Levidow et al., 1991; Hunt and Minstrell, 1994).

The DIAGNOSER is organized into units that parallel units of instruction in our physics course. Within our example unit, there is a cluster of questions that focus on the effects of a surrounding medium on scale readings when attempting to weigh an object. Within each cluster the DIAGNOSER contains several question sets. Each set may address specific situations dealt with in the recent instruction to emphasize to students that we want them to understand and be able to explain these situations. Sets also may depict a new problem context related to this cluster. We want to continually encourage students to extend the contexts of their understanding and reasoning.

Each question set consists of four screens. The first screen contains a phenomenological question, typically asking the student "what would happen if . . . ?" The appropriate observations or predictions are presented in a multiple-choice format, with each alternative representing an answer derivable from understanding or reasoning associated with a facet in this cluster. From the student's choice, the system makes a preliminary facet diagnosis. For example, in Figure 3-5 the choices are facet coded as 315.1 (for A), 318.1 (B), 319.1 (C), 310.1 (D), 318.1 (E), and 317.1 (F), respectively. The second screen asks the student: "What reasoning best justifies the answer you chose?" Again the format is multiple choice, with each choice briefly paraphrasing a facet as applied to this problem context. For example, in Figure 3-6 the choices are facet coded as 319.1 (A), 318.1 (B), 315.1 (C), 317.1 (D), and 310.1 (E). From the student's choice of answer to the reasoning question, the system makes a second diagnosis.

FIGURE 3-5 Phenomenological question.

FIGURE 3-6 Reasoning question.

Each screen also has an alternative "write a note to the instructor" button beneath the portion with the multiple choices. Clicking on this option will allow students to leave a note about their interpretation of the question, about their difficulties with the content, or if they have an answer other than the choices offered. These notes can be scrutinized by the teacher/researcher to assist individual students, to improve DIAGNOSER questions, and to modify activities to improve instruction.

Students also are allowed to move back and forth between the question and the reasoning screens. This is done to encourage students to think about why they have answered the question the way they have, to encourage them to seek more general reasons for answering questions in specific contexts.

The "reasons" screen is followed by a diagnosis feedback screen. What this screen says depends on precisely what the student did on the question screen and the reasoning screen. The feedback basically says whether the student's answer and reasoning choices appear to be consistent. Consistency is important in all empirical and rationally based systems, especially in science. Then the card tells the student whether there seems to be a problem with his or her answer and/or reasoning. For example, if a student chose answers that were consistent but problematic, a screen like that in Figure 3-7 would appear.

FIGURE 3-7 Feedback for consistent problematic answer.

The fourth screen in each sequence is the prescription screen. If a student's answers are diagnosed as being associated with productive understanding and reasoning, the student is mildly commended and is encouraged to try more questions to be more sure. The rationale here is that it should be recognized that while the student's ideas seem OK in this one context, overcongratulating the student may allow him or her to get by with a problematic idea that just did not happen to show up in this problem situation. If the student's answers were diagnosed as potentially troublesome, they are issued a prescription associated with the problematic facet. Typically, the student is encouraged to think about how his or her ideas would apply to some common everyday experience or to do an experiment he or she may not yet have done. In either case the experience was chosen because the results will likely challenge the problematic facet apparently invoked by the student. For example, if the student had chosen answer E for the phenomenological question and B for the reasoning question, the system would serve the screen with a prescription consistent with facet 318.1, as in Figure 3-8.

FIGURE 3-8 Prescriptive lessons for facet 318.1.
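The logic of the four screens can be summarized in a small sketch. The facet codes attached to each choice below are the ones reported for Figures 3-5 and 3-6; the function names, feedback wording, and consistency rule are assumptions made for illustration, not DIAGNOSER's actual implementation:

    # Sketch of the question/reasoning/feedback/prescription flow described above.
    QUESTION_FACETS = {"A": "315.1", "B": "318.1", "C": "319.1",
                       "D": "310.1", "E": "318.1", "F": "317.1"}
    REASONING_FACETS = {"A": "319.1", "B": "318.1", "C": "315.1",
                        "D": "317.1", "E": "310.1"}

    def is_goal(facet: str) -> bool:
        # Facet numbers ending in 0 or 1 are goal understandings (e.g., 310.1).
        return facet.split(".")[0][-1] in ("0", "1")

    def diagnose(question_choice: str, reasoning_choice: str) -> dict:
        q_facet = QUESTION_FACETS[question_choice]
        r_facet = REASONING_FACETS[reasoning_choice]
        consistent = q_facet == r_facet
        if consistent and is_goal(q_facet):
            feedback = "Consistent and productive; try more questions to be more sure."
            prescription = None
        elif consistent:
            feedback = "Answer and reasoning are consistent but problematic."
            prescription = f"Prescriptive lesson keyed to facet {q_facet}"
        else:
            feedback = "Answer and reasoning do not appear to be consistent."
            prescription = f"Revisit the question; the reasoning choice suggests facet {r_facet}"
        return {"question_facet": q_facet, "reasoning_facet": r_facet,
                "consistent": consistent, "feedback": feedback, "prescription": prescription}

    # The example from the text: choice E with reasoning B is consistent and
    # problematic, so a prescription keyed to facet 318.1 would be served.
    print(diagnose("E", "B"))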

The DIAGNOSER is run in parallel with other instructional activities going on in the classroom. Some students are working on DIAGNOSER, while others are working in groups on problem solving or additional laboratory investigations. In the case of our example subunit, the class may even be moving ahead into the next subunit. Students work on the program individually or in small groups of two (ideally). When they are finished with a session, they are presented with a summary of their performance but are not graded on it. It is a tool to help them assess their own thinking and a tool to help the teacher assess additional instructional needs for the class as a whole or for students individually. It is also a device to assist the students and teacher in keeping a focus on understanding and reasoning. For additional information on DIAGNOSER, see the following two Web sites: http://weber.u.washington.edu/~huntlab/diagnoser/facetcode.html and www.talariainc.com. As of the writing of this paper, DIAGNOSER can be downloaded from the first site. Newer versions of DIAGNOSER-type assessment systems will be available on both sites as the assessments are ready.

Application of Ideas and Further Facet Assessment of Knowledge

A unit of instruction may consist of several benchmark experiences and many more elaboration experiences together with the associated DIAGNOSER sessions. Sometime after a unit is completed, students' understanding and reasoning are tested to assess the extent to which instruction has yielded more productive understanding and reasoning. When designing questions for paper-and-pencil assessment, we attempt to create at least some questions that will test for extension of application of understanding and reasoning beyond the specific contexts dealt with in class. Has learning been a genuine reweave into a new fabric of understanding that generalizes across specific contexts, or has instruction resulted in brittle, situation-bound knowledge? In general, we find that students' answers and reasoning become progressively more consistent and they progress toward the goal facets, as will be seen under "Results."

In designing tests the cluster of facets becomes the focus for a particular test question. In the example cluster here, test questions probe students' thinking about situations in which the local air pressure is substantially changed. Have students moved from believing that air pressure is the cause of gravitational force? Other questions focus on interpreting the effects of a surrounding medium, as they help us infer the forces on an object in that fluid medium. Do students now believe that the fluid pushes in all directions? Do they believe greater pushes by the fluid are applied at greater depths? Can they integrate all of these ideas together to correctly predict, qualitatively, what effects the fluid medium will have on an object in the fluid? In future units (dynamics, for example) do students integrate this qualitative understanding of relative pushes to identify and diagram relative magnitudes of forces acting on submerged objects? DIAGNOSER-type questions, like those in Figure 3-9, can be used on end-of-unit or end-of-term tests as well as being used as assessment embedded in instruction.

Whether the question is in multiple-choice or open-response format, we attempt to develop a list of expected answers and associated rationale based on the individual facets in that cluster. After inventing a situation context relevant to the cluster, we read each facet in the cluster, predict the answer, and characterize the sort of rationale students would use if they were operating under this facet. Assuming we have designed clear question situations and our lists and characterizations of facets are sufficiently descriptive of students' understanding and reasoning, we can trace the development of their thinking by recording the trail of facets from preinstruction, through DIAGNOSER, to postunit quizzes and final tests in the course.
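Such a trail can be pictured as a simple longitudinal record per student, of the kind summarized later in Tables 3-2 and 3-3. The sketch below is illustrative only; the occasion labels, field names, and sample entries are invented for the example rather than taken from the project's data:

    from collections import defaultdict

    # One diagnosed facet per student per assessment occasion (illustrative data).
    trail = defaultdict(dict)            # student_id -> {occasion: facet_code}

    def record(student_id: int, occasion: str, facet: str) -> None:
        trail[student_id][occasion] = facet

    record(101, "preinstruction", "319")
    record(101, "diagnoser", "315")
    record(101, "sem1_final", "310")

    def summarize(occasion: str) -> dict:
        """Count how many students showed each facet on a given occasion."""
        counts = defaultdict(int)
        for facets in trail.values():
            if occasion in facets:
                counts[facets[occasion]] += 1
        return dict(counts)

    print(trail[101])                # one student's trail across occasions
    print(summarize("sem1_final"))   # a class-level view, as in Tables 3-2 and 3-3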

FIGURE 3-9 Examples of other relevant DIAGNOSER questions.

Results

For the sample of results described in this section, we continue to focus on separating gravitational effects from effects of the surrounding fluid. The answers for each question associated with diagnosis or assessment were coded using the facets from the cluster for "Separating medium effects from gravitational effects" (see Figure 3-4, 310 cluster of facets).

The preinstruction assessment called for free-response answers (Figure 3-3). On it 3 percent of our students wrote answers coded at the most productive level of understanding (see Table 3-1). On the embedded assessment (DIAGNOSER), after students completed the elaboration experiences, for a similar multiple-choice question and reasoning combination, 81 percent of the answers to the phenomenological question and 59 percent of the answers to the reasoning were coded 310. Apparently revisiting the "object in fluid" context in subsequent instruction helped maintain the most productive level of understanding and reasoning about buoyancy at nearly the 60 percent level. By the end of the first semester, the class had integrated force-related ideas (statics and dynamics) into the context of fluid effects on objects submerged in the fluid medium. On a question in this area 60 percent of the students chose, and then briefly defended in writing, an answer coded 310. On the end-of-year final, 55, 56, and 63 percent of students chose the answer coded 310 on three related questions.

TABLE 3-1 Student Preinstruction Predictions for Scale Reading

Scale Reading(a) | Percent | Facet Code
s ≥ 20 lbs.      | 2       | 317
20 > s > 11      | 11      | 317
11 ≥ s > 10      | 3       | 310
s = 10           | 35      | 314/315
10 > s ≥ 9       | 12      | 318
9 > s > 1        | 17      | 318
1 ≥ s ≥ 0        | 20      | 319

Note: Table is ordered by predicted scale reading answer followed by the inferred facet associated with that answer.
(a) Represents the predicted scale reading.

At the other end of the understanding and reasoning spectrum of facets is a substantial development away from believing that "downward pressure causes gravitational effects" (facet 319) and "fluid mediums push mainly in the downward direction" (facet 318). On the free-response preinstruction assessment, these two facets accounted for 49 percent of the data (see Table 3-1). In the DIAGNOSER those facets accounted for about 5 to 20 percent of the data. Similar results were achieved on both the first- and second-semester finals. Much of this movement away from the problematic "pressure down" facets did not make it all the way to the most productive facet. Much student thinking moved to intermediate facets that involve thinking that there are no pushes by the surrounding fluid of air (facet 314) or that the pushes up and down by the surrounding air are equal (facet 315). Most of the students were not stuck on these intermediate facets in the water context. This makes sense since they have direct evidence that water pressure at different depths causes a difference in the scale reading. In the air case the preponderance of the evidence is that if there is any difference because of depth it does not matter (e.g., force diagrams on a metal slug hung in the classroom do not usually include forces by the surrounding air).

Even low-achieving students made significant gains (see Tables 3-2 and 3-3). The semester test questions used were similar to the DIAGNOSER questions shown earlier.

TABLE 3-2 Development of Understanding and Reasoning: Forces by Surrounding Air on Objects

Facet Code | Preair                | Prewater                | Sem1          | Sem2
310        |                       | 16, 55                  | 64, 72, 74    | 5, 16, 19, 21, 25, 64, 72
315        | 66, 74                | 16, 31, 53, 66, 69, 74  | 8, 27, 31, 53 | 55, 66, 69
317        | 21                    |                         |               |
318        | 7, 19, 25, 27, 64, 69 | 5, 7, 8, 19, 21, 27, 55 | 7             |
319        | 5, 8, 31, 53, 72      | 25                      |               |

Note: Numbers at right are identification numbers for 16 low-achieving students.

TABLE 3-3 Development of Understanding and Reasoning: Forces by Surrounding Water on Objects (four days after preair)

Facet Code | Preair | Prewater                 | Sem1                                            | Sem2
310        |        | 7, 8, 16, 25, 55, 64, 74 | 5, 7, 8, 16, 19, 21, 27, 53, 55, 64, 66, 72, 74 | 5, 7, 16, 19, 21, 25, 27, 31, 53, 55, 64, 66, 72, 74
315        |        | 5, 19, 27, 66            | 25, 31, 69                                      | 69
317        |        |                          |                                                 |
318        |        |                          |                                                 |
319        |        | 21, 31, 53, 69, 72       |                                                 |

Note: Numbers at right are identification numbers for 16 low-achieving students.

Also from Tables 3-2 and 3-3 it can be seen that individual students do not always answer in consistent fashion. Across items and across time individual students exhibit various facets of thinking. Which pieces of their knowledge and understanding are brought to a particular problem depend on the features of the problem. Early in the instruction it is the salient physical or verbal features of the problem. At this time there is considerable inconsistency between their answers to problems that might be seen as similar when the questions are organized by formal topic (recall the questions about Newton's Third Law). Later in the instruction, as students become more expertlike, their answers are based on threads of experience and understanding that are more principle based (Chi et al., 1981). Their answers become more consistent and converge on the target understandings.

Apparently about half of our students came to physics instruction believing that air and perhaps even water pressure effects are mainly in the downward direction. By the end of the year, through early specific instruction and later revisiting, this belief was greatly reduced, and over half of the students were able to demonstrate good productive understanding of buoyant effects. Given that this is a difficult topic conceptually, even for many physics teachers, these results are encouraging.

Similar facet-based instruction is now being used by many physics teachers and some curriculum developers (Hunt and Minstrell, 1994). Facet-based instruction has also been effective in the learning of introductory statistics and in training health care providers in the management of pain.

IMPLICATIONS FOR LARGE-SCALE ASSESSMENT

The examples given above are primarily from the classroom. That is the source of most of our specific experience with facet-based assessment. But the classroom is also where the results of large-scale assessments must make sense and be useful if the large-scale assessments are to help effect reform and result in better learning. We are beginning to explore the application of facet-based assessment to large-scale assessment. Large-scale tests like the National Assessment of Educational Progress or the state assessments could include facet-indexed foils that could inform policy, program, and practice. While the preceding material is based on many years of research and practice, below are some speculations as we begin our exploration.

Implications for Policy, Program, and Practice

The National Science Education Standards advocate reform in assessment as well as curriculum and instruction (National Research Council, 1996). The test items and ranking purposes of the typical normative-based assessment system will not be sufficient. Universities and employers may still need to rank applicants against each other, and that has been accomplished reasonably well by normative testing, such as the SAT (Scholastic Assessment Test). But in a standards-based system, large-scale assessment needs to compare the performance of the unit (state, district, school, or individual) with the standard. There is a choice to be made for the criteria for making the comparison. One could set the large-scale standard to be a certain score that is deemed sufficient for certification. But such action would sidestep the intent of the standards effort. We would not know what the troubles are at a level of specificity that can help decide what to do about them. This would not be much different from what we presently have with respect to assessment.

Suppose instead that the learning target standards are integrated with the problematic understandings in facet clusters. Multiple-choice foils, or the rubrics for coding open-response items, could be tuned to the facets. Such a large-scale assessment system would be able to check on accountability for policy and program revision, but it would also allow sufficiently rich feedback to inform the system about what troubles exist. From identification of specific troubles, teachers and others creating or adapting a curriculum could design or choose lessons to address the problematic issues.

What might a test based on facets be like? To characterize thinking in any one cluster for a group of students would likely require incorporating two DIAGNOSER-type items, like those shown earlier, into each form of the test. If the two items incorporate the reasoning as well as the phenomenological question, that is like having four subitems per cluster. From our experience, responding to these items takes about 1 minute per subitem for a total of about 4 minutes per cluster. At that rate we could test for 15 clusters per 60-minute test. For our physics program there are about 40 clusters, but several are not unique to physics. If a large-scale test is to cover the learning in science over a three-year period, I estimate that would represent about 100 clusters. (Note: that is not 100 topics. For example, the topics of force and motion would be represented by about eight clusters.) For large-scale assessment in which not every student needs to take the same test, sampling procedures could be used to cover all clusters. Analysis from such an assessment could provide information about specifically where students were having trouble in each cluster. This is the sort of feedback that can inform curriculum and instruction decisions as well as teachers about what needs to be focused on in the classroom. It seems that something like this procedure could be used for NAEP and some state tests.
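The blueprint arithmetic above can be laid out explicitly. The sketch below restates the chapter's estimates; the matrix-sampling calculation at the end is an illustrative assumption about how the roughly 100 clusters might be spread across test forms, not a design the chapter specifies:

    import math

    # Estimates stated in the text.
    items_per_cluster = 2        # two DIAGNOSER-type items per cluster on a form
    subitems_per_item = 2        # phenomenological question + reasoning question
    minutes_per_subitem = 1.0

    minutes_per_cluster = items_per_cluster * subitems_per_item * minutes_per_subitem
    clusters_per_form = int(60 // minutes_per_cluster)   # 15 clusters per 60-minute form

    total_clusters = 100         # rough estimate for three years of science learning

    # Assumed matrix-sampling illustration: each student takes one form, and the
    # forms together cover every cluster.
    forms_needed = math.ceil(total_clusters / clusters_per_form)

    print(minutes_per_cluster, clusters_per_form, forms_needed)   # 4.0 15 7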

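The testing-time arithmetic above can also be laid out explicitly. The figures (two DIAGNOSER-type items per cluster, a phenomenological and a reasoning subitem for each, about one minute per subitem, a 60-minute test, and roughly 100 clusters for three years of science) are the chapter's; treating coverage as a matrix-sampling problem and computing the number of forms it implies is my extrapolation.

```python
import math

# Figures taken from the chapter's estimate.
ITEMS_PER_CLUSTER = 2        # two DIAGNOSER-type items per cluster
SUBITEMS_PER_ITEM = 2        # phenomenological question + reasoning question
MINUTES_PER_SUBITEM = 1
TEST_MINUTES = 60
TOTAL_CLUSTERS = 100         # rough estimate for three years of science

minutes_per_cluster = ITEMS_PER_CLUSTER * SUBITEMS_PER_ITEM * MINUTES_PER_SUBITEM
clusters_per_form = TEST_MINUTES // minutes_per_cluster
forms_needed = math.ceil(TOTAL_CLUSTERS / clusters_per_form)

print(f"{minutes_per_cluster} minutes per cluster")            # 4
print(f"{clusters_per_form} clusters per 60-minute form")      # 15
print(f"{forms_needed} sampled forms to cover all clusters")   # 7
```

Under such sampling, no individual student would see all 100 clusters; about seven distinct forms, spread across the sampled population, would be enough to report on every cluster.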
A facet-based system can also be used to tune the expected learning targets themselves. The setting of our present national and state standards is based to a considerable extent on what we "want our students to know and be able to do." To a much lesser extent, standards efforts have incorporated implications from research on what students "do know and are able to do," especially when we set goals for "all" students. We could consider these goals the top-level facets, but much more research is needed to determine the problematic constructions students build on their way to the goal (Minstrell, in preparation). This sets an item on the research agenda. In the past, research on learning was set largely in clinical or classroom situations designed to teach particular topics, not particularly tuned to learning the standards. To the extent that we collectively believe the standards that have been set are the goals we want to achieve, we need to direct research on learning in the disciplines toward identifying the problematic issues and understandings on the way to those goals. Then, in our teaching experiments, the problematic ideas become the focus of our design of curriculum and instruction as we attempt to guide students toward the standards.

Facet-based assessment can also provide information from which to decide on expected levels of understanding. If we had characterizations of the understanding and reasoning of students nationally, we might be better able to identify reasonable targets for learning. For example, using the results reported earlier, is it reasonable to assume that all high school students can achieve the 310 level of understanding for the air contexts as well as the water contexts? For air contexts we might be willing to set the standard bar at 317 (air has some buoyant upward force somehow) or 315 (air pushes from above and below are equal). Yet requiring a 310 standard for all with respect to understanding water contexts seems reasonable, since we see (from Tables 3-2 and 3-3) that the water context is more achievable, even by lower achievers.

Thus, a facet-like system can provide information for making cost-benefit decisions. For example, knowing that low-achieving students were diagnosed at 310 on the water cluster but at 315 on the air cluster would suggest that better activities are needed for demonstrating the similarity of the fluid characteristics of air and water. Can we afford the extra instructional time needed to move from one level of understanding to the higher level? Should we invest that time?

For making practical classroom "next day" decisions, one or more facet-based questions can be used during a class period to inform the teacher about tomorrow's needs. More questions per cluster will be needed in the long run for periodic monitoring of learning by the teacher. Except on unit exams, the results of the monitoring can be low-stakes assessment, with grades assigned only on the basis of honest effort. Meanwhile, the results provide data from which teachers can decide what might happen next.
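As a sketch of the comparisons described above, the fragment below tallies diagnosed facet codes for one class by context and checks them against context-specific bars (310 for water, 315 for air, as suggested in the text). The class data are fabricated, and the convention that codes nearer 310 indicate understanding nearer the target is an assumption based on the chapter's ordering of 310, 315, and 317.

```python
from collections import Counter

# Hypothetical end-of-unit diagnoses for one class, recorded as the facet
# code each student exhibited in each context. Codes nearer 310 are treated
# as nearer the learning target (an assumed convention).
diagnoses = {
    "water": [310, 310, 315, 310, 317, 310, 315, 310],
    "air":   [315, 317, 315, 310, 317, 315, 317, 315],
}

# Context-specific standard bars suggested in the text:
# 310 for water contexts, 315 for air contexts.
bars = {"water": 310, "air": 315}

for context, codes in diagnoses.items():
    counts = Counter(sorted(codes))
    met = sum(1 for code in codes if code <= bars[context])
    print(f"{context}: facet counts {dict(counts)}; "
          f"{met} of {len(codes)} students meet the {bars[context]} bar")
```

A gap like this one, with most students reaching 310 for water but stalling at 315 or 317 for air, is the kind of cost-benefit signal discussed above: it points toward activities that connect the fluid behavior of air to that of water.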

Developing students' understanding in a cluster takes instructional time; deep learning cannot be hurried. Judging from our experience in classes, it took four to five hours of class time for students to develop their understanding in one cluster, like the clusters already demonstrated. Other clusters, such as the three for developing ideas of length, area, and volume, can be taken together as part of coming to understand spatial extent, about five hours at the high school level. Still other clusters involve the processes of scientific thinking and can be assessed across some of the more subject-matter-oriented clusters. For example, the cluster for the meaning of explanation in science (Figure 3-10) can be applied across items that ask for explanations of specific events (e.g., explaining falling bodies or interpreting the offspring resulting from parent plants). As can be seen from Tables 3-2 and 3-3, not all of the understanding comes during the four days of instruction in that cluster; some comes through revisiting the ideas and issues in subsequent subunits around related clusters. Thus, districts, departments, and individual teachers need to decide which clusters are more important or more difficult for their students and choose or design instruction to develop the more important ideas.

Need for Ongoing Research on Learning and Teaching

Although we have a good start on developing facets as they apply to high school physics, substantially more research needs to be done to characterize students' thinking across the sciences and across grade levels. Consistent with this vision, we initiated an investigation into students' facets of thinking in probability and statistics at the introductory university level (Schaffner et al., 1997).

FIGURE 3-10 Cluster: explanations or interpretations of phenomena.

*050 Explanations or interpretations involve conceptual modeling of multiple related science or math concepts, using experimental evidence and rational argument to address questions of "how do you know . . . ?" or "why do you believe the results, observation, or prediction?"
*051 Explanation involves a mathematical modeling approach, incorporating principles subsumed under that model.
053 Explanation involves identifying possible mechanisms involving a single concept causing the result.
055 Explanation involves identifying and stating a relevant concept.
057 Explanation constitutes a description of procedures that led to the result.
059 Explanations or interpretations are given by repeating the observation or result to be explained.
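In an assessment system, a facet cluster like the one in Figure 3-10 is naturally stored as a small code-to-description table. The sketch below uses the codes and abbreviated descriptions from the figure; the field names, and the reading of the asterisked codes as target-level facets, are my assumptions about how such a table might be organized rather than the schema of any existing system.

```python
# Figure 3-10's cluster as a simple lookup table. Codes and abbreviated
# descriptions come from the figure; the 'goal' flag interprets the
# asterisked codes as target-level facets, which is an assumption.
EXPLANATION_CLUSTER = {
    "name": "explanations or interpretations of phenomena",
    "facets": {
        "050": {"goal": True,
                "text": "conceptual modeling of multiple related concepts, "
                        "using evidence and rational argument"},
        "051": {"goal": True,
                "text": "mathematical modeling approach incorporating the "
                        "principles subsumed under the model"},
        "053": {"goal": False,
                "text": "identifies a possible single-concept mechanism "
                        "causing the result"},
        "055": {"goal": False,
                "text": "identifies and states a relevant concept"},
        "057": {"goal": False,
                "text": "describes the procedures that led to the result"},
        "059": {"goal": False,
                "text": "repeats the observation or result to be explained"},
    },
}

def describe(code: str) -> str:
    facet = EXPLANATION_CLUSTER["facets"][code]
    level = "target" if facet["goal"] else "problematic/intermediate"
    return f"{code} ({level}): {facet['text']}"

print(describe("059"))
```

Keeping target-level and problematic facets in one table is what would let the same lookup serve item writers (foils keyed to codes), scorers (rubrics keyed to codes), and reporting.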

In collaboration with the University of Washington, the State Commission on Student Learning, selected school districts, and Talaria, Inc., Earl Hunt and I are directing the building of an assessment system to serve teachers as they focus on students' learning. This project involves identifying facets and developing a facet-based system for classroom assessment in the physical sciences and in the mathematics relevant to the quantitative sciences for grades 6 through 10 in Washington state. To follow this development, see the Web sites at http://weber.u.washington.edu/huntlab/diagnoser/facetcode.html and www.talariainc.com.

Building a base of facets and facet clusters involves setting particular learning goals and doing the research to describe students' thinking at intermediate positions on the way to those goals. The top-level facets need to be described at a level of specificity that includes all of the "pieces" of knowledge and processing necessary to define the goal operationally. For our example 310 facet, the description, fully written out, is about a third of a page long. Defining goals at this level requires deep knowledge of the content domain. For a large-scale facet assessment, the goals of learning will need to be carefully and specifically articulated.

Identifying the other facets requires research. What do learners say and do when confronted with situations relevant to the learning goal? Some of the research on students' conceptions exists in the literature, but much more needs to be done in the context of the classroom. When we were building our present version of the facets, we identified situations or tasks we thought students should be able to explain if they had the goal understanding. Ideally, the tasks also involved many of the key issues related to the cluster. We collected 50 or so student responses to each task. As we read the responses, we sorted them according to similarities in answers and reasoning and then attempted to characterize the similarities among the several responses in one pile. Each characterization was a first try at identifying a facet. Next, using another task relevant to the same learning goal, we repeated the process for the responses to that second task. If the characterization of one of the piles for this set seemed similar to the characterization of a pile from the first set, we began to think we had validity and reliability for identifying that particular facet. But since particular tasks elicit particular ideas, not finding a similar pile in the second task analysis did not mean that the facet was not valid; the showing of a particular facet typically depends on context as well as content. Validating the facets associated with a large-scale assessment would necessitate substantially more research on students' understanding of critical ideas in multiple contexts.

Once several facets in a particular cluster are identified, they can be used to predict typical responses on other tasks related to the cluster. It takes creativity to come up with novel problematic situations, but then the facets can be used to suggest responses to open-ended questions or to create foils for multiple-choice questions.
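The sorting-and-cross-checking procedure just described can be caricatured in a few lines of code. Real facet coding rests on human judgment (or on the more sophisticated text analysis mentioned below); the keyword matcher, the candidate facet labels, and the student responses here are all invented, and the sketch is meant only to show the bookkeeping of checking whether a candidate facet reappears across tasks.

```python
# Toy facet coding of open responses to two tasks from the same cluster.
# A real analysis would code by hand (or with better text processing);
# the keyword rules and all responses below are invented for illustration.

CANDIDATE_FACETS = {
    "057-like": ["i measured", "we weighed", "the procedure"],  # describes procedure
    "059-like": ["because it sank", "because it floated"],      # restates the result
}

task_responses = {
    "task_1": ["It floated because it floated on the water.",
               "We weighed it, then the procedure told us the answer."],
    "task_2": ["Because it sank, that's why.",
               "I measured the sides and added them up."],
}

def code_response(text, rules):
    """Return the candidate facets whose cue phrases appear in the response."""
    text = text.lower()
    return [facet for facet, cues in rules.items()
            if any(cue in text for cue in cues)]

# Record which candidate facets show up on which tasks.
seen_on = {facet: set() for facet in CANDIDATE_FACETS}
for task, responses in task_responses.items():
    for response in responses:
        for facet in code_response(response, CANDIDATE_FACETS):
            seen_on[facet].add(task)

for facet, tasks in seen_on.items():
    status = "seen on both tasks" if len(tasks) == 2 else f"seen only on {sorted(tasks)}"
    print(f"{facet}: {status}")
```

The cross-check mirrors the argument in the text: a characterization that reappears on a second task gains credibility as a facet, while one that appears on only a single task may simply reflect that task's particular context.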

A facets perspective also offers an opportunity to apply statistical analyses to determine prerequisite knowledge for the development of understanding of more complex ideas. Participation in large-scale assessment offers the opportunity to determine which developments depend on the development of which other ideas. Research on learning and teaching can benefit from the understanding of students' facets of thinking that results from large-scale assessment. Statistical analyses of large-scale test data could show which facets in one cluster are related to which facets in other clusters. Thus, research on learning can identify facets that are in an ecological relationship (one of mutual existence and support) with other facets. Such research could inform curriculum designers about which ideas to address as a set.

Computerized tools can assist teachers or large-scale assessment systems in diagnosing facets and handling electronically posted data from students. University of Washington colleagues Adam Carlson and Steve Tanimoto are building a computerized system for facet coding of electronically submitted open responses to questions and problems. Another colleague, Aurora Graf, has designed a DIAGNOSER-type module to address facets of thinking about ratio reasoning for middle-level students.

SUMMARY

Through a better understanding of students' thinking, we can characterize the sorts of problematic understandings that students exhibit on their way to learning goals, and we can create facet clusters and individual facets. Facet assessment can help teachers identify needs for particular learning activities. Curriculum developers, or teachers adapting curriculum, can better know and understand the targets for the lessons they engineer. Facet assessment can be used to monitor students' progress in the classroom. Large-scale facet-based assessments can identify particular curricular needs or suggest the need to revise standards or learning goals to make them more appropriate developmentally or with respect to time and other available resources. Finally, large-scale facet-based assessment will require support to specify learning goals clearly and research to identify more than just the "right" answers.

Through facets and tasks related to targeted facet clusters, the thinking of large groups of students can be characterized and reported. From facet descriptions of groups of learners, policy and program decisions can be informed. Feedback and recommendations specific to the facets can be presented to teachers in the classroom, so that they are better informed about what specifically to do to effect better learning.

ACKNOWLEDGMENTS

Several colleagues over the years have influenced this work or assisted in its progress. Arnold Arons, John Clement, Andrea diSessa, Virginia Stimpson, Dorothy Simpson, Emily van Zee, and Earl Hunt have contributed to the generation or revision of the ideas; they deserve much credit. Tens of other teachers and thousands of students have tested the ideas. I also want to thank the administrations of the school districts, especially the Mercer Island School District, for their willingness to allow their teachers to think about facet assessment and the effects it can have on teaching and learning in the classroom.

The research and development described in this paper were supported by grants to the Mercer Island School District and the University of Washington from the James S. McDonnell Foundation Program for Cognitive Studies for Educational Practice and from the National Science Foundation Program for Research in Teaching and Learning. Preparation of this paper was supported in part by a grant from the National Science Foundation to Talaria, Inc., a small research and development company that creates facet-based assessment and learning environments. The ideas expressed here are those of the author and do not necessarily reflect the beliefs of the sponsoring foundations.

REFERENCES

Bruer, J.
1993 Schools for Thought: A Science of Learning in the Classroom. Cambridge, Mass.: MIT Press.

Chi, M., P. Feltovich, and R. Glaser
1981 Categorization and representation of physics problems by experts and novices. Cognitive Science 5:121-152.

diSessa, A.
1993 Toward an epistemology of physics. Cognition and Instruction 10(2-3):105-226.

diSessa, A., and J. Minstrell
1998 Cultivating conceptual change with benchmark lessons. In Thinking Practices in Learning and Teaching Science and Mathematics, J.G. Greeno and S. Goldman, eds. Mahwah, N.J.: Lawrence Erlbaum Associates.

Driver, R., A. Squires, P. Rushworth, and V. Wood-Robinson
1994 Making Sense of Secondary Science: Research into Children's Ideas. New York: Routledge.

Duit, R., F. Goldberg, and H. Niedderer, eds.
1991 Research in Physics Learning: Theoretical Issues and Empirical Studies: Proceedings of an International Workshop held in Kiel, Germany. Kiel, Germany: Institute for Science Education.

Gabel, D., ed.
1994 Handbook of Research on Science Teaching and Learning. New York: Macmillan.

Hunt, E., and J. Minstrell
1994 A cognitive approach to the teaching of physics. In Classroom Lessons, K. McGilly, ed. Cambridge, Mass.: MIT Press.

Levidow, B., E. Hunt, and C. McKee
1991 The Diagnoser: A HyperCard tool for building theoretically based tutorials. Behavior Research Methods, Instruments, and Computers 23(2):249-252.

McCloskey, M., A. Caramazza, and B. Green
1980 Curvilinear motion in the absence of external forces: Naive beliefs about the motion of objects. Science 210:1139-1141.

Minstrell, J.
1989 Teaching science for understanding. In Toward the Thinking Curriculum: Current Cognitive Research, L. Resnick and L. Klopfer, eds. 1989 Yearbook of the Association for Supervision and Curriculum Development. Alexandria, Va.
1992 Facets of students' knowledge and relevant instruction. Pp. 110-128 in Research in Physics Learning: Theoretical Issues and Empirical Studies: Proceedings of an International Workshop held in Kiel, Germany, R. Duit, F. Goldberg, and H. Niedderer, eds. Kiel, Germany: Institute for Science Education.

Minstrell, J., and V. Stimpson
1996 A classroom environment for learning: Guiding students' reconstruction of understanding and reasoning. In Innovations in Learning: New Environments for Education, L. Schauble and R. Glaser, eds. Mahwah, N.J.: Lawrence Erlbaum Associates.

National Research Council
1996 National Science Education Standards. Washington, D.C.: National Academy Press.

Project 2061
1993 Benchmarks for Science Literacy. New York: Oxford University Press.

Schaffner, A., D. Madigan, A. Graf, E. Hunt, J. Minstrell, and M. Nason
1997 Benchmark lessons and the World Wide Web: Tools for teaching statistics. In Proceedings of the Second International Conference on the Learning Sciences, D.C. Edelson and E.A. Domeshek, eds. Evanston, Ill.: Northwestern University.

van Zee, E., and J. Minstrell
1997 Reflective discourse: Developing shared understanding in a physics classroom. International Journal of Science Education 19(2):209-228.
