ASSESSMENT AND ACCOUNTABILITY: WHAT KINDS OF ASSESSMENT ARE USED AND FOR WHAT PURPOSES?
Assessment, traditionally used by individual teachers to monitor student learning and to provide a basis for assigning grades, has always been a critical component of the education system (Glaser and Silver, 1994). Over the years, however, the character of educational assessment has changed. In the 1970s, concerns about reading and computational literacy led many states to implement minimum competency programs as a requirement for high school graduation. The role of assessment continued to evolve, as policy makers turned to assessment as a way to improve education. Standards-based reforms of the 1990s gave assessment increasing visibility, sending signals about the successes and failures of schools and school districts, as well as of individual students.
Assessments generate information and, depending on the nature and use of the information obtained, can play multiple roles in education. Accountability involves using some of this information to generate incentives to validate or change the behaviors of students and educators. Taken together, assessments and accountability policy constitute a third channel through which education reform ideas may flow. Various types of assessments—formative classroom assessment, classroom tests, state and local tests, college entrance and placement practices, tests for teacher certification—all interact with other elements in the education system, sometimes in unanticipated ways.
Considering the roles of assessment in K-12 educational practice requires studying four key elements:
How accountability interacts with assessments
How teachers conduct and use classroom assessment
How states and districts use assessments for accountability
How assessments influence postsecondary education choices
ASSESSMENT AND ACCOUNTABILITY IN THE EDUCATION SYSTEM
The pervasiveness, political importance, and potential influence of assessment on student learning make it a potent tool for change. Compared to other vehicles for change, such as long-term professional development, assessment is an attractive strategy to policy makers, since tests are relatively inexpensive to construct and administer. Moreover, assessment can be externally mandated and implemented rapidly, yielding visible results (Linn, 2000).
As the standards movement extended beyond standards designed by the educational community for use by educators to a vehicle for motivating school change, states began designing assessments to measure student learning against those standards. Other policies also contributed to the increased role of assessment. For example, Title I of the Improving America’s Schools Act of 1994 (P.L. 103–382) and the Individuals with Disabilities Education Act Amendments of 1997 (P.L. 105–17) require that states develop high-quality assessments to measure performance on high standards for all students, including those with disabilities. In addition, states participating in the second Education Summit in Palisades, New York, in March 1996 agreed to establish clear academic standards for student achievement in core subject areas and to assist schools in accurately measuring student progress toward reaching these standards (National Education Goals Panel, 1996).
Assessments provide a systematic way to inform students,
teachers, parents, policy makers, and the public about student performance. The reporting of test results represents the simplest form of accountability. Stronger incentives for educational change are provided by accountability mechanisms that use information from assessments to make consequential decisions about students, teachers, or schools. Assessment and accountability policies can provide clear direction for teachers and principals in terms of student outcomes and can become a positive impetus for instructional and curricular changes (Goertz, 2000; Kelley, Odden, Milanowski, and Heneman, 2000; O’Day and Smith, 1993; Popham, 2000). When assessments are aligned with learning goals, accountability systems can motivate classroom instruction to focus on those outcomes (Stecher, Barron, Kaganoff, and Goodwin, 1998). Thus, policy makers and educators in many states view assessment linked with accountability as a powerful strategy for ensuring that all students are held to the same set of high standards (Grissmer and Flanagan, 1998; Massell et al., 1997; Olson, 2001).
Assessments can drive change at different levels of the system, for example, by informing the public about the overall state of achievement or by informing those who make decisions about teacher certification, allocation of resources, or rewards and sanctions for schools. Tests based on large, statistically selected national samples, such as the National Assessment of Educational Progress (NAEP), are designed to provide a national overview of U.S. student achievement over time (National Research Council [NRC], 1999b), often spurring state and national efforts targeted at reform. Although NAEP results provide no information about individual students, many state assessments are designed to compare individual student performance levels to specific state standards.
Assessments are designed to serve particular purposes, and assessment experts warn that a test designed for one purpose is unlikely to be appropriate for an entirely different purpose. One major issue in the late 1990s concerned the inappropriate use of
tests as evidence of the success or failure of schools and schooling (Linn and Herman, 1997; NRC, 1999b; American Educational Research Association, 2000).
Assessment and accountability practices apply to educators as well as to K-12 students. National concern about teacher quality (NRC, 2001b; Lewis et al., 1999; Education Trust, 1999a) has given rise to assessments for prospective and practicing teachers. These vary from tests such as the Praxis I and II, used by many colleges and universities as an entry or exit requirement for teacher education programs, to state tests that prospective teachers must pass before they receive licensure. Some states have instituted more complex processes for initial licensure, including evaluation of portfolios of student work and videos of classroom practice during induction years. Teachers seeking National Board for Professional Teaching Standards (NBPTS) certification must satisfactorily complete a series of assessments based on videos of their classroom teaching and analysis of student work, as well as tests of their content knowledge (NBPTS, 2001).
Assessments designed or selected by teachers are critical components of education assessment. Teachers use assessment to inform instructional decisions, motivate and reward students, assign grades, and report student progress to families. They continuously assess what students know and how they have come to that understanding by, for example, reviewing homework, managing discussions, asking questions, listening to student conversations, answering questions, and observing student strategies as they work in class. Assessment and instruction interact when teachers collect evidence about student performance and use it to shape their teaching (NRC, 2001a; Shepard, 2000; Black and Wiliam, 1998; Niyogi, 1995).
Teachers also give students “summative” assessments regularly as end-of-unit and end-of-year tests. Teachers build their
understanding of formal assessment from their own classroom experiences, interactions with colleagues, assessment materials accompanying textbooks, courses in preservice and professional development programs, and their familiarity with standardized assessments. They may adopt a variety of forms of assessment, from multiple-choice tests to writing assignments to performance-based assessments guided by scoring rubrics. Teachers may use student portfolios to document student learning over time, which, in the case of technology, may often take the form of student-created projects.
State and District Assessment and Accountability Policies
Nearly all states have adopted assessment programs, often as the centerpiece of their accountability strategies (Education Week, 2001; Council of Chief State School Officers [CCSSO], 1999b). From a policy viewpoint, state tests sometimes define the “content of most worth” for schools and their teachers. School districts may use their own or commercially developed tests to measure their progress against national norms, to evaluate their own programs, or to monitor the level of individual student learning for placement purposes.
Some state and local district assessments are “high stakes.” That is, they carry important consequences for students, teachers, or schools, such as promotion to the next grade, salary allocations, or monetary bonuses for schools (CCSSO, 1999a). Some states also provide extra staff and resources to assist low-performing schools or districts; some give financial rewards for high levels of performance or for improvements in student outcomes.
States and districts may use “norm-referenced” tests, where a student’s reported score is compared to the scores of other students in some reference population. Schools may use the results of those tests to “track” students into courses with different content and achievement expectations, a practice that has raised concerns about adversely affecting minorities and students in certain geographic
areas (Oakes et al., 1990; Shepard, 1991; Glaser and Silver, 1994). Publishers of norm-referenced tests study state curricular guidelines and existing textbooks, and establish test specifications based on the content they identify. In some instances, publishers customize tests according to the criteria of a particular state or district. Generally, such tests are not released to educators or the public; their confidential nature often makes it difficult to analyze what the tests actually measure.
Over half of the states and some districts use some form of “criterion-referenced” assessments (CCSSO, 1998). Such assessments attempt to establish whether a student has met a particular performance level by estimating the extent to which each student has learned certain content, regardless of how others might have performed (NRC, 1999d). A number of states and districts have attempted to use portfolios to document student learning over time, but have encountered substantial problems due to scoring difficulties and costs (Koretz, 1998; Stecher, 1998).
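The contrast between norm-referenced and criterion-referenced interpretations described above can be sketched in a few lines of code. All scores, the reference sample, and the cut score below are invented for illustration and do not come from any real assessment:

```python
def percentile_rank(score, reference_scores):
    # Norm-referenced interpretation: where a student falls relative
    # to a reference population of test takers.
    below = sum(1 for s in reference_scores if s < score)
    return 100.0 * below / len(reference_scores)

def meets_standard(score, cut_score):
    # Criterion-referenced interpretation: whether the student has
    # reached a fixed performance level, regardless of how others scored.
    return score >= cut_score

reference = [48, 55, 60, 62, 65, 70, 74, 78, 83, 90]  # hypothetical scores
student = 62

print(percentile_rank(student, reference))        # 30.0: outscored 30% of the group
print(meets_standard(student, cut_score=60))      # True: clears the fixed criterion
```

The same raw score thus supports two different statements: a relative standing ("better than 30 percent of the reference group") and an absolute judgment ("meets the standard"), which is why the two kinds of reports serve different decisions.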
In addition to state tests, school districts may use a variety of other tests, which interact with decisions made about curriculum and instruction. Tests that measure what students know overall are different from those designed to measure what students have learned within a particular course or time interval, placing different demands on what teachers are expected to teach. From test to test, the conditions and the nature of the content tested may vary widely. For example, one test may allow the use of calculators, another may not; one may emphasize mastery of science terms, another may emphasize understanding of science concepts. Some assessment reports may disaggregate the data, highlighting changes in performance for students of different ethnicities, socioeconomic backgrounds, or cultures, leading to greater focus on students within those groups.
College Entrance and Placement Practices
Within two years after high school graduation, nearly 75 percent
of U.S. students will enroll in a postsecondary institution (Education Trust, 1999b; National Center for Education Statistics, 1997a). Consequently, college entrance and placement assessments guide many decisions made by high school students and teachers, as well as decisions about those students made by postsecondary institutions. The most important assessments for those students—customarily the SAT or ACT—affect college admission. Other assessments, including advanced placement tests and those administered by colleges and universities, guide course and program placement. For example, placement tests for introductory mathematics at the college level are used to identify students for remediation or acceleration and may as a result influence the content taught at the secondary level (Hebel, 2001).
Impact and Unintended Consequences of Assessment
The interpretation and consequent influence of assessment as a measure of educational improvement are matters of debate. On the one hand, such assessments can set levels of acceptable performance for all students and provide benchmarks against which teachers, students, and states can view their own educational accomplishments. The assessments may motivate educators to change their practices and decision makers to modify their policies. If politicians and educators believe that full alignment of content, instruction, and assessment will positively affect student outcomes, they may invest considerable effort in trying to ensure that such alignment is in place across all levels of the education system.
On the other hand, researchers and others have raised concerns about using large-scale assessments to monitor student and school performance (Resnick and Resnick, 1992). Large-scale assessments may not provide valid and comparable measures of performance for all students. States or districts may exclude some students from their assessment programs (generally second-language learners), or withhold student test results that are not valid measures of what
students know or are not comparable to scores of students generated under regular testing conditions.
Questions often arise regarding scoring procedures and what it means to “pass” a particular test. For example, some researchers claim that the use of averages in reporting test scores—one of the most common strategies in assessment—is inappropriate, arguing that average scores fail to account for variability within the population (Meyer, 1996). There is evidence that the choice of controlling variables (e.g., socioeconomic status, prior achievement) and of summary statistics (e.g., mean gain, mean difference) helps determine what conclusions are drawn (Linn, 2000; Clotfelter and Ladd, 1996). Factors such as when a test is administered during the school year also affect conclusions about apparent growth in student achievement (Linn, 2000). In addition, there is concern about the validity of what assessment data seem to indicate about student performance. A recurring pattern is evident in the implementation of a new test—a decrease in student performance the first year, followed by sharp increases in achievement in subsequent years—that may overstate actual student growth (Linn, 2000).
Large-scale, high-stakes tests can produce unintended effects. When rewards and consequences are attached to test performance, high scores may become the classroom focus and may well change the nature of instruction (Haertel, 1999; Glaser and Silver, 1994; Linn and Herman, 1997). This in turn may generate inflated scores that are not representative of what students actually know (Koretz, Linn, Dunbar, and Shepard, 1991; Madaus, 1988; Stecher and Barron, 1999; Klein, Hamilton, McCaffrey, and Stecher, 2000). A key objective in aligning content and assessment is to help shape instruction and to raise expectations for student performance. Questions arise, however, about whether teachers are focusing on teaching the underlying standards-based content or simply teaching to the test. Some argue that high-stakes tests tend to narrow the curriculum. That is, teachers reduce instructional time devoted to problem-solving and open-ended investigations, and restrict their
expectations for student learning to the particular knowledge and skills included on the test (Dwyer, 1998; Barton, 1999). The use of assessments for purposes for which they were not designed may partially account for some of that concern, but similar effects have been linked to tests even when used as intended (Stecher and Barron, 1999).
Assessments do more than simply provide information about achievement; they also specify expectations for student knowledge and performance (NRC, 1993, 1996), providing “an operational definition of standards, in that they define in measurable terms what teachers should teach and students should learn” (NRC, 1996, pp. 5–6). The development and use of assessments keyed to the standards to support teaching, to drive educational improvement, and to support accountability are indicators of possible influences attributable to nationally developed standards.
HOW STANDARDS MIGHT INFLUENCE ASSESSMENT AND ACCOUNTABILITY
If nationally developed standards are influencing assessment policies and practices, assessments would be aligned with learning outcomes embodied in the standards. In particular, if state assessments and standards are aligned with the nationally developed standards, assessment at all levels would include problem solving and inquiry in addition to other skills and knowledge. Teachers would use classroom assessment results to inform instructional decisions and to provide feedback to students about their learning. Teachers, administrators, and policy makers would employ multiple sources of evidence regarding what a student knows and is able to do, as is called for in the standards, rather than relying on a single source.
Developers of student assessments would be familiar with nationally developed assessment and content standards and create assessment materials that reflect the standards by having appropriate items, clear examples of the kinds of performance that students
are expected to demonstrate, criteria by which these performances are evaluated, and reports that inform instruction as well as measure achievement. Assessment results would be reported in language accessible to parents and other stakeholders, helping them to understand what the tests measure and how results labeled as “proficient” or “basic” should be interpreted.
States and districts would have a comprehensive plan for administering the array of assessments they use with students, and the plan would enable teachers to pursue the vision of the standards as well as prepare students to take those assessments that are high stakes. Incentives linked to accountability would encourage standards-based reforms, with policies in place to ensure that schools and teachers have standards-based professional development opportunities, instructional materials, and appropriate resources to enhance their efforts to raise performance levels of their students. Finally, college entrance and placement tests would measure content that is valued by standards created at the national level and contain tasks aligned with those standards.
THE ASSESSMENT AND ACCOUNTABILITY CHANNEL AND NATIONALLY DEVELOPED STANDARDS
The Framework questions (see Figure 3–3) can guide the study of possible influences of standards on K-12 assessment practices and policies. Useful questions focused on this channel of influence include:
How has the assessment and accountability component of the education system responded to the introduction of nationally developed standards?
To what extent have teachers modified their assessment practices in line with the recommendations of the standards?
Are teachers using classroom assessment to monitor student progress in relation to the standards and adjust their instruction accordingly?
To what extent are state and district assessment and accountability systems aligned with the content, instruction, and assessment called for in the standards?
What changes have states and school districts made in the use of assessments and in the infrastructure to support the implementation of standards-based assessment programs?
To what extent do assessment systems report student achievement for demographic subgroups of the population so policy makers can determine whether all students are making progress towards higher standards?
What actions have been taken to align college entrance and placement tests with nationally developed standards?
Studies that explore answers to such questions will inform the two overarching questions: How has the system responded to the introduction of nationally developed mathematics, science, and technology standards? and What are the consequences for student learning?
The next chapter deals with influences external to the education system that might also have an impact on how standards affect classroom teaching and learning. As the chapter points out, those influences may arise within public, professional, and political communities.