

Suggested Citation: "5. Assessment to Improve Learning." National Research Council. 2003. Assessment in Support of Instruction and Learning: Bridging the Gap Between Large-Scale and Classroom Assessment: Workshop Report. Washington, DC: The National Academies Press. doi: 10.17226/10802.



Assessment to Improve Learning

The U.S. programs presented at the workshop were selected based on an informal review of efforts in states and districts to put into practice the goal of aligning their assessment systems with standards and curricula. The committee acknowledges that many more states and districts are working to bridge the gap between classroom and large-scale assessments, but exploring more than a few at the present workshop was beyond its charge. The selected programs are, however, exemplary in that they are making progress towards goals of the kind identified by the committee. Not all of the programs have articulated their goals in the same terms the committee had identified, but all share a commitment to using assessments to improve learning, and were seen as evidently meeting at least one of the criteria (see summary in Chapter 2, and Boxes 2-1 and 2-2).

NEBRASKA: SCHOOL-BASED TEACHER-LED ASSESSMENT RECORDING SYSTEM

Nebraska is an interesting state to consider first because it had no statewide assessment program at all until 2000, and thus had the benefit of many years to observe the efforts of other states before initiating its own program. As Patricia Roschewski, director of assessment for the Nebraska Department of Education, explained, the state had first developed academic standards in 1998, and had decided that it needed an assessment program for two reasons. First, the state wanted to collect information about student performance that could be used to improve instruction. Second, it wanted accountability data that could be shared with the public. Nebraska was clear in wanting the primary stakeholders in the system to be students and teachers, rather than policy makers, and this decision led it to give teachers a key role in the assessment program.

The Nebraska program's title, the School-Based Teacher-Led Assessment Recording System (STARS), is a very brief summary of the goals the state had developed, and, as Patricia Roschewski explained, the focus on teachers led it to devote a considerable proportion of the available resources to professional development and support. Many Nebraska districts had developed their own assessment systems, mostly criterion-referenced and classroom-based, but the state perceived that teachers and administrators generally had had very little training in assessment issues. STARS is essentially a way of building on existing local assessments to meet the new statewide goals.

Under STARS, goals based on the state standards are set for each district and school, and clearly articulated so that students, parents, and everyone else concerned understand the expectations for learning. For accreditation purposes, each district gives a norm-referenced test, such as the Terra Nova or the SAT 9, which typically covers some 35-40 percent of the standards. The remaining 60-65 percent is measured using classroom-based assessments developed by teachers; local teachers and administrators can blend these with activities and assessments dictated by district curricula in whatever ways they choose.

The state monitors these assessments using a national advisory panel made up of assessment experts and Nebraska educators. This panel reviews and rates assessment portfolios prepared by the districts over a period of several months each summer. Districts whose methods are not successful receive further support and training; exemplary methods are shared around the state.

The system, Roschewski explained, works in part because the local curricula and state standards are closely aligned and clearly understood, and in part because intensive training has built the "assessment literacy" of the educators who are responsible for the bulk of the assessment. Nebraska teachers who once had little reason to think about issues such as validity and reliability are now responsible for ensuring that they assess their students in ways that stand up to professional scrutiny. The state had not thought in terms of bridging a particular gap, said Roschewski, but rather had sought to focus on balancing and integrating a new element (the desire for more feedback that teachers and students could use to improve learning and for information that could be used for accountability) into a system without disrupting the balance it had already achieved.

DELAWARE: COMPREHENSIVE SCIENCE ASSESSMENT

Delaware provides another example of a program that involved a significant amount of teacher training to increase assessment literacy. Its comprehensive science assessment grew out of the state's commitment to improve science learning. Rachel Wood, education associate at the state's Science Resource Center, described a process that began in 1992 with the development of state standards in science. A needs assessment revealed that science was indeed getting short shrift; in the elementary grades it was often taught for as little as forty-five minutes a week. Curriculum materials were developed, and the state focused on identifying explicit learning goals for the topics outlined in the standards. For example, a requirement that fourth graders study electricity was broken down into precise descriptions of the key concepts related to electricity that were to be mastered. Attention was paid at the same time to both cognitive and practical factors that would affect articulation so that the prerequisites for meeting the curricular objectives were accomplished grade by grade.

Once the state was pleased with its curricular units, it took a look at the accompanying end-of-unit assessments, and was not satisfied. In particular, it found that the scoring rubrics were generic and provided little useful diagnostic information. The state wanted to obtain summative information that could be used for accountability purposes from assessments that were closely linked to the curriculum, but also wanted the assessments to give teachers clear feedback they could use to improve their instruction. Delaware wanted specific data about how students were faring with particular elements of the curriculum, and it wanted assessments that would be part of a continuous loop of feedback and improvement, thus fostering a community in which teachers and students shared a sense of the purpose of and expectations for science learning. The state made the decision that teachers should be heavily involved in the assessment process, and that a significant investment in professional development was needed.

One of the innovations Delaware instituted was in direct response to the need for diagnostic assessment data.
Using a system of double-digit scoring rubrics, modeled after a strategy used in the performance component of the Third International Mathematics and Science Study, educators could collect not only data showing how well students did with particular items, but also data on the kinds of misconceptions that kept them from complete understanding. In this system, the first digit works in the same way many rubrics do, indicating that a response is completely or partially correct. The second digit indicates the nature of misconceptions expressed in the answer (and raters are trained to recognize and code these) so that teachers can see what is missing in their students' understanding. Moreover, widespread misconceptions can often be traced to areas of the curriculum that are not adequately addressed, or to ambiguities in texts or materials.

The state recognized that few teachers had sufficient background in assessment issues to meet the emerging needs. With some outside resources, the state provided intensive professional development for a cadre of teachers, who could then branch out and work with other teachers. Not only did teachers undergo training to improve their understanding of assessment issues as well as their capacities for making use of both formative and summative assessments, they were also increasingly linked together in the kind of community of learners referred to earlier by Dylan Wiliam. Using shared materials available online, including assessments, rubrics, and student work, as well as professional development activities, teachers were encouraged to share ideas about specific goals for student learning and ways to help their students meet them.

VERMONT: THE VERMONT ASSESSMENT SYSTEM AND THE PARTNERSHIP FOR THE ASSESSMENT OF STANDARDS-BASED SCIENCE

Vermont was the subject of national attention in the spring of 2002, when it announced that it was considering forgoing public education funds so that it would not have to comply with all of the assessment requirements of the No Child Left Behind Act. The state subsequently decided to accept the funds and is now trying to work out a way to satisfy the new federal requirements using locally designed assessments as well as statewide, large-scale assessments, as it has been doing for a number of years.

Vermont's existing assessment program was designed to rely in part on a formal set of assessment tools developed or selected by districts, or, in some cases, by schools, to meet their specific needs. As described by Bud Myers and David White, assessment coordinators in the Vermont Department of Education, the state's goals for its local assessments are very clear. Assessments are to be linked to state and local content standards, provide information that is valued at the local level, support teaching and learning, meet tough standards of reliability and validity, and be part of a continuum of assessment strategies that serve a range of purposes at the national, state, district, school, and program level, including both evaluation and feedback to students.

Vermont has developed an infrastructure both to support teachers and administrators in carrying out assessments and to ensure that the local assessments meet quality standards. Technical advisory panels oversee the quality of local assessments.
Materials are provided to guide the development of local assessments, and exemplary assessment tools, item banks, and other resources are posted on a website accessible throughout the state. Review panels continuously evaluate assessment tools, and summer institutes help teachers keep up to date on assessment strategies. The state has built professional development for both teachers and administrators into the system, and has developed master's degree programs for teachers with an incomplete command of the mathematics and science knowledge needed to teach the content outlined in the standards.

A part of Vermont's assessment system is the Partnership for the Assessment of Standards-Based Science (PASS) program, which is a commercially available standards-based science assessment developed by WestEd. Kathy Comfort, principal investigator and director of PASS, described how the program fits Vermont's goals and dovetails with the larger question of integrating large-scale and classroom assessments. PASS was originally developed as a large-scale assessment that states and districts could use to measure their students' performance and growth in science against national standards and learning goals. PASS also meets the science assessment requirements of the No Child Left Behind Act. The PASS assessment is aligned with the content recommendations of the National Science Education Standards (NRC, 1996) and the American Association for the Advancement of Science's Benchmarks for Science Literacy (1993). It incorporates multiple measures (enhanced multiple-choice questions, hands-on performance tasks, constructed-response investigations, and open-ended questions) to get at different kinds of knowledge and skills. WestEd staff worked closely with Vermont officials to customize the assessment to Vermont's standards and learning goals.

In response to feedback from PASS users, WestEd is developing ways that the program could also be used to help inform instruction and guide professional development. WestEd is using PASS to conduct research on the relationship among different assessment components, instructional practices, and student achievement, and on teachers' understanding of large-scale assessment results and the uses they make of the results in their classroom practice. Vermont teachers develop school and classroom science assessments using the methodology and learning goals of the PASS assessment. Teachers are also involved in developing items and in scoring, which provides an opportunity for large numbers of them to focus on specific performance expectations, and to share information and ideas.

While Vermont is proud of what it has done to make local assessments an integral part of its system, Bud Myers discussed some of the issues that are still of concern. Questions have arisen about how to keep the local assessments secure, and also about ways to make sure all the stakeholders find them credible. Perhaps foremost, however, is the question of resources. A significant degree of professional development, in both content and assessment issues, has been required to achieve current levels of competence. Myers raised concerns about both the funding and time that will be required to keep the program moving forward. He also cited the requirements of the No Child Left Behind Act, noting that they are not readily compatible with a system that relies as heavily as Vermont does on local assessments. Adding assessments to meet the requirement would substantially increase the assessment costs the state will have to bear.

WYOMING: BODY OF EVIDENCE SYSTEM

Wyoming's newly approved system grew out of the desire to make sure that graduating students had mastered the content specified in the state standards. Scott Marion, former director of assessment for the Wyoming Department of Education, described how, in lieu of an end-of-school exit exam, the state decided to develop the Body of Evidence System (BOE). Under the system, students will, over time, establish that they have mastered the material required for graduation: performance standards in nine content areas. They will be able to meet these requirements as early as eighth grade, and typically will complete most by the end of tenth grade. Multiple sources of evidence will be acceptable.

An important goal for the BOE system was to improve teaching, learning, and classroom assessment; at the same time, Wyoming hoped to avoid some of the negative consequences other states had encountered using single high-stakes exams to make sure students had mastered graduation requirements. The state has asked local districts to design the measures by which students would demonstrate their mastery, based on a set of five assessment design principles arrived at through a deliberative process. Each district's program will be evaluated in terms of:

• alignment with the state's content and performance standards;
• consistent and reliable application;
• fairness, in that it is not biased against any subgroups and uses accommodations and alternate assessments appropriately; and in that it provides students with multiple opportunities, using different formats, to demonstrate their knowledge and skills;
• standard-setting, as revealed in the strength of its rationale for its method of choosing cut scores,[1] and how closely they are linked to performance standards; and
• comparability, through evidence that requirements are applied in comparable ways across classrooms, programs, schools, and the district. (Wyoming decided not to evaluate comparability from district to district, since each would be meeting minimum requirements.)

Recognizing that in most cases local educators lack the expertise to design the innovative measures Wyoming wanted to see in use, the state has begun providing considerable professional development and technical support for this endeavor. Moreover, it decided to use peer review to evaluate local systems, in part because of the many opportunities this would provide for professional development; reviewers are drawn from every one of Wyoming's districts and some serve as team leaders throughout the state. The reviewers work with national experts, Marion explained, and the review process has already helped those involved grapple with the real meaning of alignment, coherence, and other assessment design principles. In addition, to address the sometimes poor quality of locally developed assessments, the state formed the Body of Evidence Consortium, a partnership of almost all of the districts, Wyoming's Department of Education, and national assessment experts, which disseminates assessment knowledge and skills through workshops and other activities.

[1] A cut score is a score point below which performance is deemed unacceptable for a particular purpose.

Marion discussed what he perceives as the most difficult challenges the state has faced in implementing the BOE system. As noted, the state was initially disappointed with the quality of many local assessments, and efforts to address that problem have led in many cases to a deeper conversation about theories of learning and modes of teaching. While this is an ongoing challenge, Marion was pleased to find veteran teachers seeking guidance on how to modify their teaching in light of what they had learned through the BOE process. On a more practical note, the state has found that aggregating the various kinds of evidence to make fair decisions about students across districts has been a challenge, as has setting standards.

Reflecting on how Wyoming's system looks in light of the criteria presented by the committee, Marion concluded that the BOE system has focused on finding a variety of workable summative assessments. Consequently, it places relatively little emphasis on classroom assessment; the state hopes that the BOE system will foster classroom discourse and the kinds of ongoing feedback that teachers and students need, but it has not made that a requirement. He suggested that while a system can try to address all of the criteria the committee identified, and perhaps come close on many of them, there is a fundamental choice that needs to be made in the end between the unique characteristics and demands of large-scale assessment and those of classroom assessment. Marion expressed concern that there is a contradiction between the goal of assessing the few, carefully chosen, big ideas and the goal of assessing in a way that provides frequent and unobtrusive feedback.
As he affirmed, "You can't assess big ideas very frequently unless you are assessing parts of the big ideas, and then are they still big ideas?"

MAINE: COMPREHENSIVE ASSESSMENT SYSTEM

Like many states, Maine developed a new assessment system after new standards were put into place. Jill Rosenblum and Pam Rolfe, assessment coordinators at the Maine Mathematics and Science Alliance and the Maine Department of Education, respectively, described the state's efforts. Maine had three principal goals for its assessment program, as outlined in 1997 legislation, but it highlighted as the first producing "high quality information about student performance that will inform teaching and learning." The other two goals are monitoring schools and administrative units and holding them accountable for their success at making sure students meet the state standards, and certifying that students have met the content standards.

Maine was determined to meet those goals with a system that delegated a considerable amount of the assessment work to schools and districts. The state administers a large-scale assessment in six subjects at grades four, eight, and eleven, and participates in the National Assessment of Educational Progress. While the state expects that it will need to further modify its system to meet the requirements of the No Child Left Behind Act, it currently relies on local educators to devise their own strategies for all the remaining assessments required to meet Maine's three goals. Table 5-1, provided by Rolfe and Rosenblum, summarizes the basic structure of the system.

To unify its system, Maine developed a very specific "alignment protocol," which spells out in detail the relationship between the assessments at all levels and the state standards. All assessments are to be linked to learning targets described in the standards documents, and they are conducted at the classroom, school, district, and state levels, as well as at all grades. It is left to the discretion of local educators to determine when they think their students have mastered a particular body of material and are ready to be assessed on it. Students are assessed using a wide variety of methods, and are given multiple opportunities to demonstrate their knowledge, understanding, and developing skills. The assessments are in many cases common instruments but are tailored to fit local curriculum and instruction, and provide immediate feedback to teachers and students.

The state is now completing the pilot testing of its assessment plan, which uses a combination of anchor tasks, common tasks, and assessments developed and selected at the local level. Thus, in Rosenblum's view, Maine avoided the need to make the basic choice between large-scale and classroom objectives that Scott Marion identified in Wyoming. Maine, she argued, has taken a middle path: the school and district assessments have shared features but are firmly grounded in the curriculum.

TABLE 5-1 Characteristics of Maine's Assessment System

Assessment | Primary Purpose | Selected or Developed by | Scored by
Classroom assessment | Informing teaching and learning | Individual teacher | Individual teacher
School or district assessment | Informing and monitoring | Groups of teachers and administrators | Groups of teachers (and others)
State assessment | Monitoring and evaluating programs to ensure accountability | Groups of administrators, and/or policy makers | Scorers outside the district
Assessment system | Informing teaching, monitoring and evaluating, certification | District assessment leadership | Both internal and external

SOURCE: Maine Department of Education (2003).

Professional development has been a key to making the system work, according to Rosenblum and Rolfe. For teachers to succeed with this new kind of responsibility, Rosenblum explained, they need to make assessment concepts such as validity and reliability a part of their day-to-day thinking. They need to internalize the links between the content in the standards, the local curriculum, their own instructional models, and the purposes and nature of the assessments they are carrying out.

Maine bolstered teachers' capacity to do this through a series of regional seminars that tackled assessment issues and presented the details of the way the system was to operate. At summer institutes for assessment development, educators had many opportunities to build their base of knowledge, share ideas, and participate in scoring sessions that helped them focus on performance expectations. Maine considers the work it has done in professional development to be one of the key successes of the program, and cites not only improved assessment literacy, but also improved instruction and a broad-based sense of shared responsibility for the program's success.

WASHINGTON: ADAPTING A TRADITIONAL ASSESSMENT

Greg Hall, assistant superintendent of assessment and research in the Office of the Superintendent of Public Instruction, Washington State Department of Education, explained that the principal purpose of Washington's assessment system is to provide the state, districts, schools, parents, and other stakeholders with evidence of how well students are meeting state standards. The state made the decision to use an assessment program (it is using a criterion-referenced test developed jointly with a commercial testing company) to lead an effort to reform and improve its system.
Articulated as an effort to make Washington competitive internationally, the reform goal was not initially popular in a state that had previously been characterized by strong local control of education. Many initially saw the assessment program that was to drive the reform as secretive and out of touch with classroom needs.

The state identified professional development as the potential bridge that could link teachers and classrooms into the potential benefits of the new assessment system, and has found a number of ways to involve teachers in the process. First, they are participants in all stages of test development. The test contractor was asked to conduct all item-writing workshops in the state and to involve only Washington teachers. Teachers also pilot the assessments and are involved in review of the pilot data; they have also conducted the scoring, which has provided ongoing opportunities for them to focus on performance benchmarks. Through regional learning and assessment centers, national assessment experts provide training in assessment issues and methods of interpreting data. Teacher assessment leadership teams help disseminate the knowledge they gain at the centers, and provide support to other teachers in their home districts and schools.

Washington also strives to help its teachers make use of the data they can obtain from the large-scale assessment. Reports that are provided to every school and district include data linked to each learning target and strand in the state standards, as well as item analyses by school, district, and state. A companion document contains the language of the learning target, so that educators can track patterns in performance on different elements of the standards. The supporting document also provides guidance on how to analyze the data and how to use the released items that are included.

Hall told the workshop that Washington expects that now that teachers are developing competence with large-scale assessment issues, and becoming more comfortable with the data that they can provide, the state will be able to further develop teachers' assessment literacy and, in turn, improve their classroom assessment skills.

BERKELEY EVALUATION AND ASSESSMENT RESEARCH SYSTEM

The Berkeley Evaluation and Assessment Research (BEAR) Center has developed a science assessment system, BEAR, that is based on close links between assessment and curriculum. Indeed, explained Mark Wilson of the University of California at Berkeley, one of the system's contributing researchers, the idea guiding BEAR is that a large-scale assessment that is not coherent with classroom assessment cannot effectively improve instruction because any gains students make on it will be superficial. At the same time, he added, if classroom assessments are not linked to large-scale assessments, teachers will be faced with the need to teach two curricula, another recipe for failure.

Developed in tandem with a middle school science curriculum, the Issues, Evidence, and You (IEY) program, BEAR is based on a developmental perspective on students' science learning. It is structured around what Wilson calls "progress variables," definitions of the steps students take as they develop higher levels of competence and deeper understanding of the material they are studying. The teacher uses the progress variables to guide instruction and to provide direct feedback to students. The assessment component consists of opportunities to observe student performance, through tasks that are embedded in the instructional program and linked to particular progress variables, and through "link tests," which assess similar skills in different contexts. Thus link tests provide a kind of check on the information gained through the embedded assessments; teachers evaluate both using common, generic scoring guides and examples of student work.

These different sorts of items are then scaled so that student progress on the multiple progress variables that define the curriculum can be monitored. These results are used to establish that the assessments achieve high standards of reliability and validity (for example, that the classroom-based IEY assessments have reliabilities similar to those achieved on standardized tests). The results can be displayed in a variety of ways that can help teachers with planning and instructional activities, for example, by showing an individual's progress over a year, the state of a class at a particular time, or detailed results on each item for a particular student.

Scoring sessions, in which teachers collaborate to calibrate their expectations, have been a crucial part of the program. The teachers not only learn from one another about performance standards and ways of working with students, they also use the opportunity to have deeper conversations about the educational implications of the assessments and other issues related to teaching. At the same time, Wilson explained, these sessions have been the principal way teachers have made the system their own and internalized its goals and overall approach. Teachers have also conducted similar moderating sessions in their classrooms to help students understand the performance expectations and enter into the goals of the program.

In describing the genesis of the BEAR program (it was developed primarily by graduate students in measurement working with curriculum developers), Wilson noted the ways in which that process encapsulated the gaps the present workshop attempted to address. He observed that the curriculum developers functioned in a sense as artists do, working to assemble a set of experiences that would provoke thinking and have effects on the participants. They had little instinct for the prime concern of the measurement specialists, who focused on finding valid and reliable evidence of particular outcomes. Yet these two groups were able to find common ground using concrete notions of what students would be doing in the form of the progress variables.
Using that common framework, they were able to combine their disparate goals into a coherent system.

NORTHERN CALIFORNIA MATHEMATICS ASSESSMENT COLLABORATIVE

The Mathematics Assessment Collaborative (MAC), an initiative of the California-based Noyce Foundation, is made up of thirty school districts in the San Francisco Bay area that share the goal of using high-quality mathematics performance assessments to improve both instruction and student learning.[2] Participating districts assess 65,000 students every year in grades three through ten. Linda Fisher, who directs MAC, and David Foster, mathematics program director of the Noyce Foundation, described the way the collaborative's assessment program works and provided a detailed look at the kinds of feedback teachers get about their students from the assessments.

[2] The MAC is one of several related projects designed to support mathematics instruction that have been sponsored by the Noyce Foundation. It is considered a component of the Silicon Valley Mathematics Initiative, which addresses all aspects of mathematics instruction and learning.

The assessments used by the collaborative are produced by a commercial test publishing company (CTB/McGraw-Hill), together with the Mathematics Assessment Resource Service (MARS), which is a joint endeavor of a number of universities to write performance exams, scoring guides, and score reports that are aligned with the national standards produced by the National Council of Teachers of Mathematics. The collaborative has been administering a performance-based assessment system since 1998; it provides both formative and summative data.

Foster began by setting the collaborative's use of MARS in the context of California's assessment program. He noted that the performance of California students on the SAT 9, a commercially available, norm-referenced test, had increased steadily from 1998 to 2002, but that there were significant discrepancies between student performance on that test and on the MARS. A comparison of the results showed that although both assessments were based on the same standards, students who performed well on the SAT 9 did not necessarily perform well on MARS, the performance-based assessments. The findings for seventh graders, for example, showed that half of the students who performed well on the norm-referenced test did not meet national standards for seventh graders according to the MARS results. These results, Foster explained, demonstrate the critical importance of using multiple measures to assess student performance; without them, educators and administrators can be seriously misled about their students' learning.

The MARS assessment program was designed not only to provide multiple measures of achievement, but also to provide tools teachers can use to target their instruction.
The focus on teachers meant both that significant opportunities for professional development were incorporated into the program, and also that the assessment results were produced in a way that was meaningful for teachers in the classroom as well as for more summative purposes. Fisher presented a number of assessment tasks, and some of the data produced from them, to illustrate the "Tools for Teachers" that the MARS program includes. Box 5-1 is a sample of the results teachers get for each task; it shows results for point four on a ten-point scale.

The goal in providing this kind of detail is to encourage teachers to be "reflective about their practice," Fisher explained. What the organizers of the collaborative have found is that as teachers work with such feedback, and consider ways to use it with their students, they become curious about research that might help them understand the misconceptions their students showed and suggest techniques to help them in addressing these problems.

Sessions with teachers to go over the assessment data also yielded broader insights about the kinds of professional development that might best help teachers improve instruction. Fisher explained, for example, that in sessions focused on the textbooks students were using, teachers quickly identified links between the way many of the textbooks oriented the information they presented and some of the student misconceptions they had discovered through the assessments. They brainstormed ways to use the textbooks differently so they could anticipate and forestall the misconceptions.

Teachers involved in the collaborative have a variety of other sources of support and development. Summer workshops as well as training sessions during the school year, supporting materials (the "Tools for Teachers," which include targeted questions for them to use in evaluating their test results and lesson plans), opportunities to participate in scoring the assessments, opportunities for one-on-one coaching and classroom observations, and schoolwide debriefing sessions are all part of the program. Both Fisher and Foster stressed that the various ways in which teachers are involved and encouraged to learn and change are key elements of the program.

FACET-BASED ASSESSMENT

Jim Minstrell, a former high school physics teacher in Washington state, described a system he has created for teaching physics according to a model of students' developing understanding. The facet-based system is based on the cognitive principle that students come to physics with ideas and preconceptions that teachers need to identify and build on. To describe the basic units of thought, Minstrell chose the word "facets" (meaning pieces of knowledge, reasoning, or beliefs that students have) because he wanted to include both correct ideas and the incorrect, naive, or incomplete ideas that students typically have along the way to complete understanding. He chose not to use the word "misconceptions" for the incorrect or incomplete ideas because these ideas often reflect important steps along the way to full understanding that teachers can use to advantage.

Facet clusters, then, are sets of facets related to a particular topic that include both the learning target and a complete and accurate understanding of a complex principle or other topic, as well as students' evolving notions, arranged in the approximate order that developing understanding usually follows. The facets and clusters have been identified through research, teacher observations, and analysis of student work. Using this means of organizing the content, Minstrell and his colleagues developed a set of tools with which teachers can structure instruction and assessment.

The system provides teachers with tasks, activities, preassessments, and scoring procedures that help them discover which facets their students are using, and then guide students toward complete understanding. All of the activities and assessment tools are linked to some part of the facet cluster for a particular topic and are also coded so that they can be easily analyzed. The codes work with multiple-choice as well as short-answer questions: distracters (incorrect choices) and other student-generated responses are linked to the naive or incomplete facets identified for the topic. Thus, when a teacher sees that a group of students misunderstand, for example, the effect of ambient air on weight, he or she is prepared: the facet-based system will likely supply a "prescriptive activity" the teacher can use to address this shared misunderstanding in the classroom.

To make the system accessible to more than just a handful of teachers, Minstrell and his colleagues developed a website for Washington teachers and their students. Teachers can find elements such as preinstruction activities for eliciting naive understandings, "checkout" questions to monitor students' development, tools for interpreting and using assessment results, and other resources and support. Students can also log on to do activities and get feedback about their progress.
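The facet coding described above, in which each answer choice is linked to a facet and shared naive facets point to a prescriptive activity, can be pictured with a small sketch. This is only an illustration under stated assumptions: the facet codes, descriptions, activities, and function names below are hypothetical placeholders and are not drawn from Minstrell's actual facet clusters or website.

```python
from collections import Counter

# Hypothetical facet cluster for one topic. Codes and descriptions are
# invented placeholders, not Minstrell's published facets.
FACET_CLUSTER = {
    "goal": "target understanding of the topic (placeholder)",
    "naive_a": "first naive or incomplete idea (placeholder)",
    "naive_b": "second naive or incomplete idea (placeholder)",
}

# Each answer choice of a multiple-choice item is coded with one facet;
# the distracters map to naive facets (assumed coding).
ITEM_CHOICE_TO_FACET = {"A": "naive_a", "B": "goal", "C": "naive_b"}

# Hypothetical prescriptive activities, keyed by the naive facet they address.
PRESCRIPTIVE_ACTIVITY = {
    "naive_a": "placeholder activity addressing naive_a",
    "naive_b": "placeholder activity addressing naive_b",
}


def tally_facets(responses, choice_to_facet):
    """Count how many students chose a response linked to each facet."""
    return Counter(choice_to_facet[choice] for choice in responses)


def suggest_activity(responses, choice_to_facet, activities):
    """Return (facet, activity) for the most common naive facet, or None."""
    counts = tally_facets(responses, choice_to_facet)
    naive = {facet: n for facet, n in counts.items() if facet in activities}
    if not naive:
        return None  # no student response was linked to a naive facet
    facet = max(naive, key=naive.get)
    return facet, activities[facet]


# Example class: six students' choices on the single coded item.
class_responses = ["A", "A", "B", "C", "A", "B"]
print(tally_facets(class_responses, ITEM_CHOICE_TO_FACET))
print(suggest_activity(class_responses, ITEM_CHOICE_TO_FACET, PRESCRIPTIVE_ACTIVITY))
```

In this sketch the teacher-facing summary is simply a count per facet; a real system would presumably aggregate across many coded items and short-answer responses.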
Teachers who have used the system have shown measurable improvements in results for individual units, but Minstrell has found it difficult to involve teachers as extensively as he had hoped. Web access in schools has presented a practical obstacle: many schools have outdated systems that are slow or cannot navigate the site, and in many schools students have only limited web access. A perhaps larger problem has been that many teachers who were intrigued by facet-based assessment were not sure they could manage to incorporate it and still cover all the material their students would need to meet state requirements. While the facet clusters are linked to Washington performance benchmarks for physics, Minstrell recognizes that teachers will need more support if they are to make full use of the program. He and his colleagues are currently conducting research to better understand what kinds of professional development and teacher and district support will be needed to make the program more readily accessible.

MODEL-BASED ASSESSMENT

In her presentation on the Los Angeles Unified School District's application of a National Center for Research on Evaluation, Standards, and Student Testing (CRESST) program, Eva Baker discussed some ideas she believes are critical to the goal of using assessments to support learning. For Baker, professor in the School of Education, University of California, Los Angeles, the goal of assessment is to produce both usable and useful knowledge, and she explained what she meant by the distinction. Usable knowledge is in a form that can be understood and applied, is timed appropriately, and may cause rethinking of the problem. Useful knowledge yields a new solution, based on rethinking of the problem. It is adapted to the situation, it is sufficient to provide a solution, and it can yield an improved outcome.

Some schools are much more successful than others at using assessment knowledge for several reasons. They focus on the learning of both students and adults. They make constant use of appropriate information, drawn from both formal and informal assessments, and they focus on feedback and change. Learning and change are publicized and the entire learning community takes pride in its achievements.

The CRESST program, called Model-Based Assessment (MBA), is rooted in this understanding of the ways in which assessments can benefit a learning community. MBA takes research-based understanding of thinking skills and applies it to different content areas. MBA's key elements of learning are

• content understanding,
• problem solving,
• metacognition (consciousness about one's thought processes),
• communication, and
• teamwork and collaboration.

With MBA, these basic principles were intended to guide both the design of assessments and instruction. Models were developed that could be used as templates and transferred to many subject areas, and were designed so that new teachers can easily be trained to use and score them; they are also reusable and thus relatively inexpensive and easy to adapt. The models, or templates, include tasks, formats, prompts, scoring guides, directions, and samples.
The scoring and performance expectations are based on a research-based model of the way experts in particular domains think and work in their area of expertise. Experts make use of principles or themes in organizing their existing knowledge as well as new information. They draw on prior knowledge, identify explicit relationships among ideas or pieces of information, and avoid misconceptions.³ Baker illustrated the application of this understanding of expertise with several sample templates, showing how the prompts were derived from an understanding of expertise in particular domains, such as using primary documents to organize an essay.

³ The expert model is discussed more fully in Knowing What Students Know (NRC, 2001c).

Despite the challenges it presented, the opportunity to try out MBA in Los Angeles was welcome, as the assessment's creators were very eager to find out how well the program could operate on a large scale. Initially the plan was to use MBA in four subjects at three grade levels and in two languages. The program is currently being administered in grades two through nine. CRESST staff have trained a large cadre of teachers to score the assessments and to train other teachers. Despite pressures to provide more concrete accountability and to address mandated curriculum packages, Baker has hopes that the program will continue. CRESST has been conducting validation studies and pursuing a number of research efforts to help it refine the program.

Baker cited several key elements of CRESST's success in running MBA on such a large scale. Because of the vital importance of cost and time factors, CRESST worked from the start of the program to maintain a low cost per student, and thus benefited from the crucial support of both the school board and the teachers' union. Finally, because MBA was designed to be easily transferable, responsibility for the program could be shifted relatively easily to the school district staff, which had many important benefits. Los Angeles educators were much better able to implement the knowledge gained from the assessments because they felt responsible for the program. Moreover, teachers learned and benefited from their participation, and the MBA was more easily meshed with other educational mandates by those within the system than it could have been by CRESST staff.
