4
Analyzing Quality and Impact

With the options and evaluation framework as a general guide, the workshop discussion turned to specific analyses that could be used in evaluating aspects of standards and standards-based reform.

QUALITY OF CONTENT STANDARDS

Perhaps most straightforward have been questions about the quality of content standards themselves, but even this kind of analysis is more complicated than it might seem. Karen Wixson presented an overview of the rating schemes that have been used by four national organizations—Achieve, Inc., the American Federation of Teachers, Education Week, and the Fordham Foundation—to evaluate the quality of state content standards. She found that, overall, the groups each included some, but not all, of the following evaluation criteria:

  • Specificity,

  • Clarity,

  • Subject and grade coverage,

  • Coverage of subject-specific topics,

  • Rigor/demand of topics,

  • Balance of knowledge and skills,

  • Teaching approaches, and

  • Policy and practice changes to implement standards.

None of the groups considered all of these criteria, and even when they seem to identify the same criteria, they do not necessarily use the terms in the same way.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
Terms of Use and Privacy Statement





Looking just at content, for example, Wixson demonstrated the range of perspectives. Table 4-1 shows the language related to content used by each group. Although each group's text is best understood in the context of its overall presentation of its evaluation criteria, important differences are evident.

TABLE 4-1 Examples of Evaluation Criteria Related to Standards Content

Achieve, Inc.
  • Important subject matter—require use and application of agreed-on core subject matter.

American Federation of Teachers
  Should include the following particular content:
  • In reading, should cover reading basics (e.g., word attack skills, vocabulary) and reading comprehension (e.g., exposure to a variety of literary genres).
  • In mathematics, should cover number sense and operations, measurement, geometry, data analysis and probability, and algebra and functions.
  Should give attention to both content and skills.

Education Week
  • Should cover the core subjects of English, mathematics, science, and history/social studies.

Fordham Foundation
  • Inclusion of particular content.
  • Avoiding influence of 1990s-era national standards.
  • Rigor of content.
  • Adequate attention to content knowledge versus skills.

SOURCE: Committee compilation of criteria used in reports by the organizations listed.

Wixson also noted that a critical element was missing from the criteria described by all the groups: the perspectives of experts from the academic disciplines.
Discipline experts are often involved when standards are developed, and many such groups have produced cogent descriptions of the ways learning develops within academic disciplines, as well as of the key concepts in each discipline that students need to master as they progress.1 Drawing on research on learning and cognition as well as on content expertise, these descriptions emphasize the integration of learning and understanding and the ways that factual learning, conceptual understanding, and developing skills reinforce one another throughout the K-12 learning progression they are describing (see National Research Council, 2000).

These more complex views of learning, however, are often overlooked in the context of assessment and accountability programs. Discrete skills and factual knowledge are relatively easy to assess, while the aspects of learning that discipline experts describe, such as developing conceptual understanding or the capacity to retrieve and apply factual knowledge to solve problems, are not. The consequence, in Wixson's view, is that the development process tends to winnow out the more complex and nuanced elements of learning in a discipline, and that the resulting state content standards are significantly curtailed versions of what discipline experts have described. She believes that the No Child Left Behind (NCLB) assessment requirements have exacerbated this problem—an unintended negative consequence of a program that was intended to move states away from the low-level basic skills standards that many of them used in the 1970s and 1980s.

The impact of these comparatively meager content standards is far-reaching. Without the discipline-based perspective, the focus tends to be on static, linear development of knowledge, rather than on developmentally appropriate learning progressions. To foster this kind of learning in their students, teachers must have what is called pedagogical content knowledge, which has been defined for various fields; the importance of this preparation is easily overlooked, however, if the learning progressions are not described in the state standards.

1 Research in mathematics and science education, in particular, has yielded advances in the understanding of how students learn complex content and of the corresponding "content pedagogy"—that is, understanding of complex content integrated with understanding of how best to teach it—that teachers need (see National Research Council, 2001, 2006).
In practice, Wixson observed, it is the standards for elementary and middle school students that are the least likely to reflect more sophisticated conceptions of learning in the disciplines, yet students will need that type of preparation in order to succeed with the challenging coursework encouraged in high school by many states, such as Advanced Placement or International Baccalaureate programs.

Some participants suggested that the significant problem is not so much the way that content standards frame expectations for students at each level, but rather that assessments have come to be the de facto standards. As one put it, "whatever gets on the test is what gets taught," and thus the deciding factor tends to be whether or not a learning objective can be assessed using a multiple-choice question. Others pointed out that standards that specify grade-by-grade expectations for students are critical guides for instruction, but that discipline experts may not be in a position to support firm recommendations about when particular elements should be introduced. While most people would agree that the content standards should be both logical and coherent, the many practical purposes for which standards are needed may not all be met by the descriptions provided by discipline experts.

IMPACT OF STANDARDS ON TEACHING AND LEARNING

Examining the extent of variation across the states was relatively straightforward. The evidence clearly suggests that the current arrangement, in which each state devises its own content and performance standards, is characterized by dramatic variation in what and how much students are asked to learn. This information does not directly answer, however, the broader questions of what real impact that variation has had on teaching and learning, and what benefits a set of common standards that was widely adopted might bring. Douglas Harris explored possible ways to investigate these questions empirically.2 He began by considering the possibility that common standards might have the effect of improving education by:

  • providing better instructional coordination across schools and grade levels;

  • allowing students to experience higher level academic content;

  • addressing inequities, since disadvantaged students are more likely to be offered lower level content;

  • providing concrete, more conceptually based guidance to teachers; and

  • improving fairness and the effectiveness of federal and state accountability.

In theory, any set of standards (not just standards common to all states) might have these effects, so Harris considered the kinds of evidence that might indicate whether existing standards have solved these problems. His conclusion is that the most apparently promising evidence actually reveals relatively little about the impact of standards on teaching and learning. For example, the dramatic variety in performance expectations, textbook coverage, and enacted curricula seems to reflect an unequal distribution of high-quality instruction.
However, this variety is the result of decentralization and local experimentation, which, Harris asserted, "many see as the system's greatest strength." Similarly, the marked overlap in topic coverage across grades may mean dull repetition in many cases, but it is partly deliberate and may also have advantages—the analyses may not be able to capture iterations in the instruction on a particular topic that promote growth in students' conceptual thinking, for example.

2 Harris also developed a paper on this topic, which is available at http://www7.nationalacademies.org/cfe/State_Standards_Workshop_1_Agenda.html [May 2008].

Other possible evidence includes findings from international research that highly centralized education systems are associated with higher aggregate achievement, and that schools with high achievement results tend to be those with high academic expectations. These findings also provide limited support for inferences about the effects of standards themselves, because they offer correlations, but not causal evidence.

What, then, might provide stronger evidence? Ideally, Harris explained, one would randomly assign entire states to treatment and control groups, measure student outcomes, and then wait patiently to observe possible systemic effects. Since this method is not possible, a feasible alternative might be to conduct quasi-experimental analyses of the relationship between standards and student outcomes, by observing what happens when states make specific changes in their policies. Harris speculated, however, that this method would demonstrate no clear effects, because standards have relatively minor influence by themselves. In changing its standards, a state is making "a small reform. You are not putting tons of resources in. You are not changing who is in the classroom. You are not changing what teachers know. You are not changing a lot of what's going on in the classroom, at least in an immediate sense." And, of course, any change that might result would take years, which is a further complication, because many other factors might change markedly over the period in which one might expect to see standards have an impact—thus blurring the effects of the standards themselves.

Even if the ideal research methods are of limited utility in studying the effects of this type of policy change, Harris argued, "we can't just throw up our hands and say we can't understand this. . . .
That always gives the benefit of the doubt to the status quo." There are other options. One would be to review the effects of standards on intermediate outcomes, such as instruction and the enacted curriculum, teacher preparation, professional development, textbook design and adoption, or changes in the way schools are administered. Finding out, for example, whether teachers are aware of the specifics of revised standards and report that they attempt to apply them, or whether changes in textbook content or teacher professional development align with revised standards, would reveal important effects (or noneffects). Another strategy would be to use quasi-experimental techniques, such as value-added modeling, to look at changes in student outcomes in the context of changes in standards or changes in enacted curricula. Another possibility would be to find schools in which changes in standards have apparently had an influence and do "reverse engineering" by examining

the apparent mechanisms for change and comparing the circumstances in that setting with those in a school that did not seem to change.

While it is important to investigate the impact of standards, Harris observed, he was mindful that standards are intended to serve as the nucleus of the network of linked elements of an aligned education system: curriculum, instruction, assessment, teacher preparation and professional development, etc. It would be unreasonable to expect that simply establishing standards would be sufficient to bring about improvement in teaching and learning, so looking for effects without considering the other elements would be missing the point. And the most influential of the related elements of the system in practice is surely the accountability system. That is, schools tend to pay most attention to standards that are assessed, so the effects of standards might be most evident in studies of accountability systems. Harris suggested that refining the research question about standards, so that one considers them as variables that moderate the effects of accountability on student outcomes, might be a more useful kind of analysis than a search for the impact of standards themselves.

Harris also noted that it is important to consider possible negative impacts as well. For example, it is possible that the reduction in flexibility that may come with new standards could discourage teachers who are already effective, or it could discourage desirable prospective teachers. His primary message was that the understanding of the effects of standards needed to support sound policy decisions must be pieced together from imperfect sources of evidence—and that empirical evidence alone may not be sufficient to answer the policy question of what benefits common standards might bring.
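The quasi-experimental strategy Harris describes—observing what happens when some states change their standards while others do not—can be sketched as a simple difference-in-differences comparison. The sketch below is purely illustrative and was not presented at the workshop; all the scores are invented placeholders, not real NAEP or state data.

```python
# Hypothetical difference-in-differences sketch of the quasi-experimental
# approach described above: compare average score changes in states that
# revised their standards against states that did not. All numbers invented.

def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)

# Average test scores before and after a standards revision (fake data).
revised_before = [248.0, 251.0, 246.5]   # states that adopted new standards
revised_after  = [252.5, 254.0, 249.0]
control_before = [249.0, 250.5, 247.0]   # states with unchanged standards
control_after  = [251.0, 252.0, 248.5]

# Difference-in-differences estimate: (treated change) - (control change).
did = (mean(revised_after) - mean(revised_before)) - (
       mean(control_after) - mean(control_before))
print(f"estimated effect of the standards change: {did:+.2f} points")
```

As Harris cautions, even a clean estimate of this kind would be blurred in practice by the years-long lag and by everything else that changes in a state over the same period.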
Laurie Wise provided another perspective on analysis of the impact of standards on achievement, which followed up on the findings from the National Assessment of Educational Progress (NAEP) analysis described earlier by Peggy Carr (National Center for Education Statistics, 2007). That study showed that states vary widely in where they set their proficiency-level cut scores—some have very high expectations for students, and some have very low ones. The question is, does it matter? One way to approach this question is to examine whether states that have set high proficiency standards subsequently showed greater improvement in student achievement scores on NAEP. If one assumes, Wise explained, that content differences matter less than differences in performance expectations (because if the standard is set high, the necessary resources will be marshaled to meet it), then achievement scores should rise with higher proficiency standards. He looked for that relationship across two periods, 2003 to 2005 and 2005 to 2007.

Wise started with the results of the National Center for Education

Statistics study, which provided a way of comparing the proficiency scales used in individual states with the NAEP scales, and confirmed the wide range of state proficiency standards. By comparison, however, students' performance on NAEP varies much less by state than the dramatically different state proficiency cut scores would suggest. Wise found a modest correlation between the percentage of students who were proficient on the state's assessment and their performance on NAEP, but he also found that "if a state had a high percent of students scoring at the proficient level, it's almost sure that they set a very low cut point."

The bottom line, however, was that Wise found no statistically significant relationship between states' proficiency cut scores and their students' gains on NAEP across the two periods he examined. His initial conclusion was that other factors, such as coherent content standards and the quality of curriculum and instruction, may be more important than where the proficiency cut scores are set. He suggested that more careful study is needed, perhaps along the lines Harris suggested, to explore the full chain of causal factors that have an impact on student achievement.
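The core computation behind a Wise-style analysis is a state-level correlation between proficiency cut scores and NAEP gains. The sketch below illustrates only the mechanics; the cut scores and gains are fabricated numbers for six imaginary states, not results from the study.

```python
# Illustrative sketch of the kind of state-level correlation Wise examined.
# All data are hypothetical placeholders, not real NAEP or state results.

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical states: NAEP-equivalent proficiency cut score vs. the
# state's NAEP scale-score gain over one period (e.g., 2003 to 2005).
cut_scores = [215, 230, 245, 260, 275, 290]
naep_gains = [3.1, 1.8, 2.9, 2.2, 3.0, 2.1]

r = pearson_r(cut_scores, naep_gains)
print(f"correlation between cut scores and gains: r = {r:.2f}")
```

A correlation near zero, as in Wise's actual analysis, would suggest that where a state sets its cut score has little bearing on subsequent achievement gains.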
