Chapter 5: Alignment and Standards-based Assessments
Textbooks are a critical component of the curriculum. Supervisors and others involved in choosing textbooks need to ensure that the textbooks they adopt are aligned to the curricular guidelines or frameworks of their system. Today, however, those who choose textbooks also need to consider the type and quality of assessments that are either created or adopted by their districts or states. More and more districts and states are attempting to implement assessment systems that are aligned to standards (e.g., Chicago, Maryland, Texas). Unfortunately, many who are involved in this process have no clear models of what an aligned system might look like.
When it comes to evaluating whether a particular assessment is aligned to a given set of standards, it is of limited use to rely solely on the test vendor's evaluation. It is far too tempting for the vendor to think creatively and envisage the enormous number of ways in which the test might meet the needs of the client. Just as it is possible to link a given textbook to various state standards in a cursory way (or to link a whole range of standards to a single textbook) and call it standards-based, it is also possible to map any given test or assessment instrument to a wide range of standards in the same fashion. A far sounder way to proceed is for a test-selection committee to apply a set of alignment criteria and to make its own professional judgment about the degree of alignment between the standards and an assessment. Districts or states should want standards and assessment to be aligned in a way that encourages students to learn the mathematics specified in the standards.
Standards-based assessment should improve instruction and learning, rather than sustain current inadequacies or present new ones. Indeed, if a standards-based assessment program were simply to reinforce the status quo, it would do little or nothing to enhance the learning of mathematics (NCTM, 1989, 1991, 1995; NRC, 1993b).
This chapter draws extensively on Norman Webb's Criteria for Alignment of Expectations and Assessments in Mathematics and Science Education (Webb, 1997) to illustrate the most salient issues in aligning standards and assessments. Webb provides a coherent way of thinking about and evaluating the alignment of standards and assessments. He provides useful criteria that can be used by district or state assessment adoption committees. This chapter highlights some of Webb's criteria, applies them to the New Standards Performance Standards and corresponding Reference Examinations, and discusses more general issues that might arise in their application.
The alignment of standards and assessment
Any well-formulated alignment analysis needs to address two overarching components: Balance, and Equity and Fairness. Alignment criteria are needed to evaluate each of them.
Balance in learning and assessment
There is usually little argument that balance is a critical consideration for learning and assessment. Balance can help ensure that assessments no longer confine themselves to those aspects of the curriculum that are easy to assess.
Webb's criteria for alignment of expectations and assessments in mathematics and science education (Webb, 1997) provide a comprehensive and accessible guide to those seeking systemic reform through standards and standards-based assessments. Six of Webb's categories that can contribute to an evaluation of balance are concerned with the following aspects: categorical concurrence, depth of knowledge, range of knowledge, structure of knowledge, balance of representation, and dispositional consonance. Taken as a group, these categories provide a powerful tool kit for qualitative alignment analysis. Each category has well-specified agreement criteria that can be used by professionals in their alignment evaluations.
In Webb's system, categorical concurrence describes the degree of agreement between the content categories of the standards and the content categories of the assessment. According to Webb, the alignment of categories would be judged as stringent when the organizing categories of the standards are the same as the reporting categories of the assessment.
By way of example, the alignment between the organizing categories of the New Standards Performance Standards and the reporting categories of the New Standards Reference Examination is illustrated by the following diagram:
There is clearly no stringent alignment between these performance standards and the reporting categories. For example, Putting Mathematics to Work cannot be assessed using the Reference Examination because this standard asks students to use mathematics to complete extended tasks and projects. This standard can instead be assessed using the New Standards' portfolio system, or any other portfolio system that assesses the ways in which students can put mathematics to work. Furthermore, the Reference Examination does not report separate scores for each of the content standards, nor does it report a score for mathematical communication. Instead, the content standards contribute to a combined conceptual understanding score, while the communication standard is assessed as a part of the mathematical problem-solving score and also by a portfolio system. If a reliable score were to be reported for each of the standards individually, the examination would need to incorporate many more items than it currently contains, and it would need to be considerably longer than its current length of approximately three hours (Shavelson, Gao, & Baxter, 1993; Linn, 1994). As an alternative, each New Standards Reference Examination reports three scores that are statistically reliable, and the alignment between these and the categories of the New Standards Performance Standards would rate as acceptable in Webb's system.
Webb's focus on the alignment of categories establishes a conceptual basis for rudimentary evaluation of the link between assessment and standards. Such an evaluation can be particularly useful to a curriculum or assessment specialist who is in the process of selecting an examination or assessment system that is aligned to the school's, district's, or state's standards. Suppose, for example, that an examination offers stringent alignment between its reporting categories and the standards. If there are more than three or four reporting categories, it would be important to ask, "What are the costs of this stringent alignment?" or "What tradeoffs are being made here?" It could be that stringent alignment of categories is achieved by using a large number of only very short exercises on the examination. Thus, the assessment might contain no complex tasks similar to those described in Chapters 2 and 3. The tradeoff would be between the variety and complexity of tasks that students will be expected to complete and stringent alignment of categories. For many, this particular tradeoff would be unacceptable because it very often sends the message to those involved in teaching and learning that it is not important for students to solve complex tasks. Alignment criteria should consider the necessary tradeoffs implied by choices and evaluate their consequences for teaching and learning.
Webb's Depth of Knowledge category judges how well the cognitive demands of the assessments match the expectations outlined in the standards on which they are based. This category calls for an expert judgment on whether the assessments contain tasks comparable to what the standards suggest that students should be able to do. If an assessment consisting entirely of multiple-choice items were constructed to assess the expectations of either the NCTM Standards or the New Standards Performance Standards, it is quite unlikely that the cognitive demands of the communication component of either of these standards could be realized.
Beck (1998) defines criteria to be used in evaluating the alignment between tests and standards. One of these criteria also judges the match between the cognitive demand of the task and the cognitive demand of the standards. Both Beck and Webb take care to ensure that standards-based assessments call for a level of mathematical functioning that is just as cognitively demanding as are the standards on which the assessment is based. This category is important to implement for any system seeking to improve upon the "mile wide, inch deep" nature of many mathematics programs and to instill a system of instruction and learning that is more reflective of current reform principles (e.g., NCTM, 1989; NRC, 1993b).
If both the adopted standards and the corresponding assessments call upon students to go beyond short procedural exercises and template problems, the hope is that classrooms will then provide the opportunity for students to learn and to practice doing the kinds of things that both the standards and the assessments are asking them to do. High expectations are for all students, and conditions need to be created to enable all students to tackle mathematical problems that are challenging and reflect the depth of the standards.
When a mathematics curriculum or assessment specialist evaluates the depth category in selecting an examination, it is important to work through each task to understand its demand, paying attention to the time taken and the mathematics assessed by each one. This will also provide some idea about the opportunity to perform that is afforded by the examination. Frequently, tasks look easier than they really are.
The Range of Knowledge category evaluates bias in the way the components of an assessment sample the expectations expressed in the various components of the corresponding standards. To be aligned, the range of expectations should be reflected in the range of assessments.
The range of knowledge category serves as a reminder that the main function of standards-based assessment is not to discriminate between or sort students but to determine the gap between what students know and can do and what the standards are asking them to do. In some cases there may be no gap, which does not mean that there is something wrong with the assessment or the standards. This category should remind us that standards and standards-based assessments offer a radically different way of thinking about and conceptualizing our notions of assessment.
The range of knowledge category also ensures that assessments do not ignore those aspects of standards that are important and yet are straightforward and can be easily demonstrated by all students. Standards-based assessment developers and administrators need not worry if there is a ceiling effect (that is, all students or almost all students demonstrate success) associated with a particular item or set of items. This is to say, even if all students demonstrate success on a particular item, it does not mean that the item has been placed on the assessment in error. As long as the item can be mapped onto the standards and evaluated as assessing a depth of knowledge that is consistent with that required by the standards, then ceiling effects can be celebrated.
In the instrument that Beck devises to evaluate the alignment of examinations to standards, she also considers range of knowledge. In her instrument, sets of items that are mapped to each standard are judged by how well they reflect the challenge of that standard. In this way, Beck's range of challenge category is comparable to Webb's range of knowledge category.
The Structure of Knowledge category goes to the heart of the changes in mathematics education reflected in the NCTM Standards. If a standards-based assessment is designed to align with the model for assessment presented in Chapter 2, the structure of knowledge category ensures that the assessment will incorporate problem solving, connections (both from within and from outside of mathematics), mathematical communication, and representation. If the complete set of assessments designed to assess problem solving were of the heavily scaffolded type described in Chapter 3, the structure of the assessments would not be comparable with the structure of the problem-solving standard (NCTM, 1989, 1998).
The structure of knowledge category shifts emphasis from the pieces to the overarching structure. It ensures that when the structures of knowledge in standards and assessments are aligned, the assessments will not permit success for students whose understanding of mathematics is fragile or based on disconnected fragments. If this shift is to lead to a corresponding shift in classroom practice, there will need to be a shift away from the tightly sequenced teaching and testing practices described in Chapter 4. This "teach it then test it" coupling of instruction and learning may help students do template problems or exercises, but it also inhibits them when they are called upon to apply their learning to challenging or non-routine tasks.
The Balance of Representation category requires that expectations be given comparable emphasis in both the standards and the assessment. The balance of representation category is important because it ensures that attributes of the standards are not simply paid lip service in the assessment but instead are assigned weights that reflect their actual importance. Beck's alignment analysis includes a similar category, which she calls content representativeness. It is usually not feasible for a timed test to assess everything. Instead, the assessment designer must sample the entire domain. Beck's category specifies:
A set of items mapped to a particular standard is judged as content representative to the degree that the elements of the standard represented in the set of items are strongly connected to the elements that are not sampled directly in the examination. (Beck, 1998, p. 16)
It would be very difficult for any examination, taken all by itself, to satisfy completely the balance of representation category. Instead, it is important to include other forms of assessment that can probe aspects of problem solving, communication, and mathematical connections as part of an overall assessment plan. Extended tasks of the type found in Measuring Up (NRC, 1993a) and High School Mathematics at Work (NRC, 1998) would be useful in portfolio assessment or for the purpose of developing classroom-embedded assessments. Such tasks can help avoid the narrowing of the curriculum that can occur when classroom practice is overly concerned with routinized preparation for the test.
The final category addressed under Webb's Content Focus concerns Dispositional Consonance. This category is designed to ensure that assessments do not simply ignore those aspects of disposition that the standards attempt to cultivate, including, for example, belief that mathematics is valuable and confidence in one's own ability. If aspects of disposition are important enough to be included in the standards, they deserve to be observed or monitored. And if they are never observed, monitored, or reported, it is very likely that they will fall through the cracks. Black and Wiliam (1998) found that student self-assessment was a useful component of formative assessment. Therefore, if standards expect students to become independent learners, then self-assessment will be essential in cultivating such a disposition.
The sense of balance derived from Webb's and Beck's work on alignment is essential if standards and assessment are to have a positive influence on instruction and learning. Certainly, this sense of balance is necessary in providing opportunity to learn for all students. It also is essential to define a balanced set of curricular activities for all students and especially so for those who may have been restricted to a diet of short exercises. Bond makes this case strongly when he writes:
The concentration on teaching basic skills to disadvantaged students has blinded us as educators to the capabilities of such students for sophisticated thought and complex problem solving. The vast majority of students can only learn what they are taught, and can master only what they practice. They do not learn what they are not taught, and they do not master what they do not practice. (Bond, 1995, p. 23)
According to Bond, therefore, a balance in teaching is an imperative for the opportunity to learn.
Equity and fairness in learning and assessment
Webb (1997) defines Equity and Fairness as being one of five general categories for judging alignment. This category, which reinforces the need for multiple measures of assessment, draws upon a series of research studies that show that the format of an assessment can have an adverse impact on students' opportunity to perform (Shavelson & Baxter, 1992; Shavelson, Gao, & Baxter, 1993; Shavelson, Webb, & Rowley, 1989). In addition to promoting multiple measures, Webb's Equity and Fairness alignment category focuses attention on the role of culture, ethnicity, and gender in restricting students' opportunity to perform.
Webb's equity and fairness category also embraces the concept of opportunity to perform, as defined and discussed in Chapter 3. There, issues of opportunity to perform were separated from those of opportunity to learn in an attempt to identify and take responsibility for those aspects of assessment tasks that inhibit opportunity to perform. Broadly, opportunity to perform was defined as the opportunity created by the task for the students to show what they had learned.
The assessment research and experience described in Chapter 3 identified a multitude of factors that must be recognized and controlled to develop quality assessments. In particular, total cognitive load, task miscue, overuse of scaffolding, contextual challenge, and over-zealous assessment can count against students' opportunity to perform. This research and experience reinforces Shavelson and Baxter's conclusion:
Performance assessments are very delicate instruments. They need to be carefully crafted. . . . Shortcuts taken in development of these assessments will produce poor measuring devices. (Shavelson & Baxter, 1992, p. 24)
In Beck's alignment instrument, source of challenge is defined as an important aspect that directs attention toward issues of opportunity to perform. She argues:
In a set of items with appropriate sources of challenge, the greatest challenges in the items lie in the mathematics targeted by the standard (as opposed, for example, to challenges of reading comprehension or interpreting item context). (Beck, 1998, p. 13)
Because all students can and should benefit from studying to prepare for standards-based assessments, it is important that assessments do not (advertently or inadvertently) create tasks that trick or trap. Esoteric abbreviations, contexts, reading comprehension, or any of the other barriers to performance identified in Chapter 3 should not prevent students from being able to showcase their understanding. Similarly, it is important that assessments be designed so that success depends upon mathematical understanding and not only upon unusual talent or special insight.
Bond reminds us that it is not enough only to eliminate bias that counts against some portions of the population:
It is important to note that specific issues of bias and fairness, and the more general issues of unintended negative consequences, involve not only the elimination of elements in assessment that unduly disadvantage minority persons but also the elimination of construct irrelevant elements that may subtly advantage majority persons over others [italics in original]. (Bond, 1995, p. 23)
Using alignment criteria—some recommendations
In a state or district where there is an aligned system of instruction and learning, the challenge is to activate those opportunities for learning that are afforded by the alignment of standards and assessment. Webb writes:
Through understanding the link between expectations and assessment, teachers are more likely to find ways to translate what is being advanced . . . into their daily work with students. (Webb, 1997, p. 2)
The important connection described here is between standards and assessment and not between instruction and assessment. When standards and assessment are aligned, appropriate instruction can be delineated. When instruction and assessment are aligned without regard to learning expectations, there is a danger that instruction will become too narrowly focused on the assessment rather than on the depth, range, and structure of the learning expectations themselves. Here are some recommendations that can be used in aligning standards and assessments:
Recognize that tradeoffs will always need to be made to achieve alignment.
Evaluate how any one tradeoff might affect students' opportunity to learn.
Evaluate how any one tradeoff might affect equity and fairness.
When adopting an assessment that is aligned with standards, make sure that the adoption committee works from the standards to the assessment, rather than from the assessment to the standards. This will help ensure that important aspects of the standards are not inadvertently overlooked.
Use multiple assessment measures whenever possible. A single examination is seldom sufficiently fair and equitable.
Introduce teachers to alignment criteria through professional development. When teachers understand the link between standards and assessment, they are more likely to align their teaching to both standards and assessment. A useful exercise is to compare different assessment systems to a single set of standards.
Ensure that all involved are aware that when instruction is aligned to assessment rather than to standards, there is the danger of the curriculum becoming overly narrow. When the depth, range, structure, and balance of the standards are reflected in the assessment system, the curriculum can, in turn, be designed to reflect the depth, range, structure, and balance of the standards.
The whole instruction and learning system can be seen as a bridge that is vital in closing the gap between what the standards expect and where assessments show the majority of students to be.
The comprehensive picture that emerges from Webb's alignment document and Black and Wiliam's vision for effective teaching (discussed in Chapter 4) is a rounded picture of the components that are necessary to ensure opportunity to learn for all students. The vision created by these two important recent publications is an exciting one.
In classrooms informed by this vision, students would work on a variety of tasks whose range would extend well beyond the range of tasks that are likely to appear in the on-demand component of the assessment plan. There would be a balance of work on skills, conceptual understanding, and problem solving. Perhaps more students would develop their repertoire of basic mathematical skills as they worked on challenging non-routine problems. There would be fewer instances of tightly sequenced instruction and assessment, and more students who put their mathematics to work in ways that demonstrate an integrated and robust understanding.
In such classrooms, teachers would be freer from the constraints of a fragmented and cluttered curriculum and would be more committed to teaching for understanding rather than coverage alone.
In a world informed by this vision, parents might not worry as much about grades and about how well their children compared with other students. Instead, they might worry whether the tests that were administered to their children were aligned to the standards, and whether their children had been given opportunities to learn and to perform.
The primary role of assessment would not be to compare students to one another but to enable students to see how they perform in relation to a balanced, publicly negotiated, and challenging set of standards. Rather than rank or sort, assessment could provide feedback to the students on how well they have learned what they are supposed to learn. As a result, students would have new understandings of what they need to do to learn the mathematics that the standards expect them to learn. It is worth remembering that the main function of alignment is not simply to improve assessment but to use assessment as a means of enhancing student learning.