

6 Evaluating Mathematics Assessment
Pages 117-146

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 117...
... Issues of how to evaluate educational assessments have often been discussed under the heading of "validity theory." Validity has been characterized as "an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment."1 In other words, an assessment is not valid in and of itself; its validity depends on how it is interpreted and used. Validity is a judgment based on evidence from the assessment and on some rationale for making decisions using that evidence.
From page 118...
... What mathematical processes are involved in responding? Applying the content principle to a mathematics assessment means judging how well it reflects the mathematics that is most important for students to learn.
From page 119...
... That assumption is false. Because mathematics relies on precise reasoning, errors easily creep into the words, figures, and symbols in which assessment tasks are expressed.
From page 120...
... at the University of Pittsburgh and the Center for Research on Evaluation, Standards, and Student Testing (CRESST) at the University of California at Los Angeles are beginning to explore techniques for identifying the cognitive requirements of performance tasks and other kinds of open-ended assessments in hands-on science and in history.9
Mixing Paint
To paint a bathroom, a painter needs 2 gallons of light blue paint mixed in a proportion of 4 parts white to 3 parts blue.
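The Mixing Paint sidebar is cut off in this excerpt, so the exact question it poses is not shown. Assume it asks how much white and how much blue paint the painter needs. Under that assumption, the proportional reasoning the task calls for can be sketched in Python. The 2 gallons are split into 4 + 3 = 7 equal parts, and exact fractions avoid rounding. The helper name `split_by_ratio` is hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

def split_by_ratio(total, white_parts, blue_parts):
    """Hypothetical helper: split a total volume into white and blue portions by ratio."""
    part = Fraction(total) / (white_parts + blue_parts)  # size of one equal part
    return white_parts * part, blue_parts * part

# 2 gallons mixed 4 parts white to 3 parts blue, as stated in the sidebar
white, blue = split_by_ratio(2, 4, 3)
print(white, blue)  # 8/7 6/7
```

Each part is 2/7 gallon. The painter therefore needs 8/7 (about 1.14) gallons of white and 6/7 (about 0.86) gallon of blue.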
From page 121...
... Because possibilities for responses to alternative assessment tasks may be broader than those of traditional items, developers must work harder to specify the type of response they want to evoke from the task. For example, the QUASAR project has developed a scheme for classifying tasks that involves four dimensions: (1) cognitive processes (such as understanding and representing problems, discerning mathematical relationships, organizing information, justifying procedures, etc.); (2) ...
From page 122...
... A general scoring rubric, similar to the one used in the California Assessment Program, was developed to reflect the scheme used for classifying tasks.
From page 123...
... The process of developing the specific rubric is also iterative, with students' responses and the reactions of reviewers guiding its refinement. Each year, before the QCAI is administered for program assessment, teachers are sent sample tasks, sample scored responses, and criteria for assigning scores that they use in discussing the assessment with their students.
From page 124...
... are needed to judge the alignment of new assessments. An evaluator interested in the intended curriculum might examine whether and with what frequency students actually use the specific content and skills from the curriculum framework list in responding to the five problems. This examination would no doubt require a reanalysis of the students' responses because the needed information would not appear in the scoring.
From page 125...
... Mathematics assessments should be judged as to how well they reflect the learning principle, with particular attention to the two goals that the principle seeks to promote (improved learning and better instruction) and to its resulting goal of a high-quality educational system.
IMPROVED LEARNING
Assessments might enhance student learning in a variety of ways.
From page 126...
... Studies of the effects of standardized tests have made this point quite clearly. For example, a survey of eighth-grade teachers' perceptions of the impact of their state- or district-mandated testing program revealed an increased use of direct instruction and a decreased emphasis on project work and on the use of calculator or computer activities.13 Some studies have suggested that the effects of mandated testing programs on instruction have been rather limited when the stakes are low,14 but these effects appear to increase as stakes are raised.15 Teachers may see the effects on their instruction as positive even when those effects are directed away from the reform vision of mathematics instruction. Assessments fashioned in keeping with the learning principle should result in changes more in line with that vision.
From page 127...
... EFFECTS ON THE EDUCATIONAL SYSTEM
Recent proposals for assessment reform and for some type of national examination system contend that new forms of assessment ...
From page 128...
... Evidence needs to be collected on the intended and the unintended effects of an assessment on how teachers and students use their time and conceive of their goals.23 Systemic validity refers to the curricular and instructional changes induced in the educational system by an assessment. Evaluating systemic effects thoroughly is a massive undertaking, and there are few extant examples in assessment studies or practice.
From page 129...
... FAIRNESS AND COMPARABILITY
Traditional concerns with fair assessment are amplified or take on different importance in the context of new forms of mathematics assessment. For example, when an assessment includes a few complex tasks, often set in contexts not equally familiar to all students, any systematic disparity in the way tasks are devised or ...
From page 130...
... As researchers using a variety of assessment formats are learning, tasks that require explanation, justification, and written reflection may leave unclear the extent to which student performance reflects knowledge of mathematics rather than language and communication skills.
From page 131...
... If the goal is to help support students' opportunity to learn important mathematics, perceived difficulty must be taken into account along with other aspects of accessibility. The role assessments play in giving students the sense that mathematics is something they can successfully learn is largely ...
From page 132...
... research suggests that acceptable levels of consistency across raters may be achievable in mathematics as well.30 A second aspect of generalizability reflects whether the alternative assessment measures the particular set of skills and abilities of interest within a domain. This aspect may represent a special challenge in mathematics, particularly as assessments strive to meet broader goals.31 As researchers using a variety of assessment formats are discovering, tasks that require explanation, justification, and written reflection leave unclear the extent to which student performance reflects knowledge of mathematics rather than language and communication skills.32 A third aspect of generalizability rests on the consistency of scores over different tasks, which can be thought of as task comparability.
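Consistency across raters is often summarized by how often two raters give the same score to the same response. The Python sketch below computes two simple indices of this kind: exact agreement, and agreement within one rubric point. It is offered as an assumption about one common way to quantify rater consistency. The chapter does not say which index its cited research used, and chance-corrected statistics such as Cohen's kappa are a frequent alternative. The function name and the 0-4 scores are hypothetical.

```python
def agreement_rates(rater_a, rater_b):
    """Hypothetical helper: (exact, within-one-point) agreement proportions for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(rater_a)
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / n              # identical scores
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / n  # scores differ by at most 1
    return exact, adjacent

# Hypothetical scores from two raters on a 0-4 rubric for eight responses
rater_1 = [4, 3, 2, 2, 1, 0, 3, 4]
rater_2 = [4, 3, 3, 2, 1, 1, 3, 2]
exact, adjacent = agreement_rates(rater_1, rater_2)
print(exact, adjacent)  # 0.625 0.875
```

Five of the eight pairs match exactly and seven of the eight fall within one point. Whether such levels count as "acceptable" is a judgment the assessment's purpose must settle, not the arithmetic.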
From page 133...
... Whereas traditional item-writing procedures and test theory focus attention on the measurement properties of an assessment, the content, learning, and equity principles recognize the educational value of assessment tasks. However, if inferences and decisions are to be made about school systems or individual students, educational values cannot be the only ones present in the analysis.
From page 134...
... even be posed. The best guideline is more of a meta-guideline: First determine what information is needed, and then gauge the effectiveness and efficiency of an assessment in providing such information.
From page 135...
... Mathematics assessment tasks can be made more valid if they broadly reflect the range of mathematical activities people carry out in the real world. This includes features not traditionally seen in assessments: collaborative work.
From page 136...
... In traditional educational testing, the guidelines for evaluation of assessment tasks concerned almost exclusively how consistently and how well they ordered individual students along a scale. This view shaped the evolution of testing to favor multiple-choice tasks because they were the most economical, within a traditional cost/benefits framework.
From page 137...
... : $33 per student
· Estimated total cost for a national test modeled on systemwide multiple-choice tests: $160 million annually
· Estimated total cost for a national test modeled on systemwide performance-based tests: $330 million annually
Although the earlier estimate of $325 per student annually was undoubtedly inflated because it did not take into account some of the savings that might be realized in a national examination if it were not based on the AP model, the GAO estimate of $33 seems very low.37 The GAO survey oversampled seven states that were using performance-based formats in state-mandated testing.
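The cost figures quoted above can be compared directly. The short Python calculation below uses only numbers that appear in the text. It shows that the GAO put the performance-based model at roughly twice the cost of the multiple-choice model. It also shows that the two per-student estimates the chapter contrasts differ by about a factor of ten. The variable names are illustrative only.

```python
# GAO totals quoted in the text, in millions of dollars per year
multiple_choice_total = 160
performance_total = 330

ratio = performance_total / multiple_choice_total  # relative cost of the two models
extra = performance_total - multiple_choice_total  # added cost, $M per year

# Per-student estimates contrasted in the text (dollars per student)
ap_model_estimate = 325
gao_estimate = 33
per_student_spread = ap_model_estimate / gao_estimate

print(f"performance-based / multiple-choice: {ratio:.2f}x (+${extra}M per year)")
print(f"earlier estimate / GAO estimate: {per_student_spread:.1f}x")
```

This prints a ratio of 2.06 (an extra $170 million per year) and a per-student spread of about 9.8, which is why the chapter treats the two per-student estimates as bracketing a wide range.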
From page 138...
... By one estimate,40 the Standard Assessment Tasks recently introduced in Great Britain, scheduled to take 3 weeks, were judged by local administrators to require closer to 6 weeks. In Frederick County, Maryland, classes in some grades lost a whole week of instruction completing performance assessments in mathematics and language arts.41 These estimates of direct costs may understate the benefits of performance assessments because innovative assessments contribute to instruction and teacher development.
From page 139...
... Scheduled for release in spring 1995, this volume will lay out standards for assessments that serve a range of purposes, from classroom instruction to policy, program evaluation, planning, and student placement. The three components of standards (curriculum, pedagogy, and assessment) provide a basis for renewing teacher education, rethinking school organization, enhancing implementation of reform, and promoting dialogue about systemic change among the many stakeholders in mathematics education.
From page 140...
... Many organizations are emerging on local, state, and national levels to broaden the recruitment of new members. Networks and alliances such as State Coalitions for Mathematics and Science Education, the Alliance to Improve Mathematics for Minorities, the State Systemic Initiatives, and the Math Connection are defining their mission to promote reform in mathematics education, including assessment that meets the content, learning, and equity principles.
From page 141...
... All educational actions must support this goal, and assessment is no exception. Although there are many unanswered questions that will require continuing research, the best way for assessment to support the goal is to adhere to the content, learning, and equity principles.
From page 142...
... Silver and Suzanne Lane, "Assessment in the Context of Mathematics Instruction Reform: The Design of Assessment in the QUASAR Project," in Mogens Niss, ed., Cases of Assessment in Mathematics Education: An ICMI Study (Dordrecht, The Netherlands: Kluwer Academic Publishers, 1993).
From page 143...
... Silver, Patricia Ann Kenney, and Leslie Salmon-Cox, The Content and Curricular Validity of the 1990 NAEP Mathematics Items: A Retrospective Analysis (Pittsburgh, PA: Learning Research and Development Center, University of Pittsburgh, 1991), 25; "Design Innovations in Measuring Mathematics Achievement"; Dennie Wolf, session on "What Can Alternative Assessment Really Do for Us?
From page 144...
... 31 Design Innovations in Measuring Mathematics Achievement.
32 The Content and Curricular Validity of the 1990 NAEP Mathematics Items: A Retrospective Analysis; Dennie Wolf (remarks made at the National Center for Research on Evaluation, Standards, and Student Testing, Los Angeles, CA, 10-12 September 1992).
33 "Quality Control in the Development and Use of Performance Assessments"; What's Happening with Educational Assessment?
From page 145...
... Furthermore, NAEP uses a sample of students to make inferences about the population of students at a grade level, so the total cost is less than that of administering a less expensive test to the entire population.
36 Student Testing: Current Extent and Expenditures with Cost Estimates for a National Examination.
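The sampling argument about NAEP becomes clearer with numbers. A richer test given to a sample can cost less in total than a cheaper test given to everyone. The Python sketch below illustrates this. All figures are entirely hypothetical: they come neither from the text nor from NAEP. The helper name `total_cost` is likewise hypothetical.

```python
def total_cost(students_tested, cost_per_student):
    """Hypothetical helper: total direct cost of testing a group of students."""
    return students_tested * cost_per_student

# Hypothetical figures for illustration only (not from the text or NAEP)
population = 3_000_000  # students at one grade level
sample = 30_000         # students actually tested in a sampling design

naep_style = total_cost(sample, 100)      # costlier test, sample only
census_cheap = total_cost(population, 5)  # cheaper test, every student
print(naep_style, census_cheap)  # 3000000 15000000
```

Under these assumptions, the sampled design costs one-fifth as much in total even though each tested student costs twenty times more. The trade-off is that a sample supports inferences about the population, not scores for every individual student.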


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.