4
Taking Stock of the National Science Education Standards: The Research for Assessment and Accountability

Norman L. Webb and Sarah A. Mason

Wisconsin Center for Education Research

Accountability and assessment have become ingrained in national and state education systems, but they are not without controversy. Accountability and assessments have been criticized for lessening local control, applying inequitable sanctions on minority groups, and narrowing the curriculum. Further complaints have been registered about requirements for students, schools, and districts that have been imposed on educational systems unprepared to provide additional instruction to students who do not meet set criteria. Some districts have openly defied state mandates imposing graduation requirements. Others complain that the pressure to improve scores on high-stakes assessments has led many students and school officials to “teach to the test” and even cheat.

Critical to any accountability system are standards or targets for what students are to know and do. It is not surprising that the movement toward accountability systems has coincided with a greater use of curriculum standards. In fact, many view standards-based reform as including some form of accountability and assessments. However, the substance of curriculum standards can vary greatly. This frequently has been the case when the development of state standards becomes politicized, with the governor having more control over the content than the superintendent of education. It cannot be a foregone conclusion that standards, such as the National Science Education Standards (NSES) and AAAS Benchmarks for Science Literacy, developed by national groups of content experts, will be fully represented in state or other standards developed through a public or political process. Thus, it is reasonable to ask what influence the NSES and AAAS Benchmarks have had on state standards, accountability systems, and assessments. The answer to this question is important because it relates specifically to the science content integrity imposed by accountability and assessment systems.

In this paper, we draw upon a body of literature accumulated by a National Research Council (NRC) search designed to reveal how influential the NSES and AAAS Benchmarks have been on accountability and assessment systems. The search produced major documents and studies but cannot be considered exhaustive. This paper is based on the studies identified by the NRC, supplemented by a few other studies we contributed. Even though we did not consider all available studies, the strong confirming evidence from those that were reviewed strengthens our confidence that our general findings have some validity.

The paper is divided into four parts. The first part is an overview of the growth in accountability and assessments over the previous decade. The second part is on accountability with four sections. The first section reports on the research links between national science standards and accountability systems, the main question of
interest for this paper. The next two sections discuss conducting research in this area. One is on the type of research that has been done and the other is on the complexity of conducting research on accountability systems. The accountability part concludes with a section discussing issues and concerns related to researching accountability systems. The third part is on assessment and begins by defining assessment in general as applied in science. This is followed by a section that outlines recent changes in what people think about assessment, including the vision for assessment in the NSES and AAAS Benchmarks. The third and fourth sections present research on the relationship between standards, including the NSES and AAAS Benchmarks (but not limited to these), and assessments. The third section discusses the alignment between standards and assessment, an important procedure for judging the relationship between standards and assessments. This is followed by a section of research on the influence of assessment on teachers’ practices and student learning. The fourth part of the paper presents our conclusions and identifies needed research.

GROWTH IN ACCOUNTABILITY AND ASSESSMENT SYSTEMS OVER THE 1990S

A number of initiatives have shaped education over the last decade, both before the NSES and AAAS Benchmarks were written and after they were published. Over this time, accountability emerged as a dominant strategy employed by states and districts to improve education. Since the early 1990s, all 50 states have been engaged in developing education initiatives related to high standards and measurement of student performance that focus accountability on student outcomes. These efforts were spurred early in the decade by concerns about increasingly low student performance, the failure of Title I to close the achievement gap for educationally disadvantaged students, and an emphasis on basic skills and low expectations, as well as a focus on inputs and compliance rather than on academic outcomes. The Improving America’s Schools Act of 1994 (IASA) galvanized state efforts to develop new accountability systems that were meant to address these problems (Goertz, Duffy, and LeFloch, 2001). Over the rest of the decade, states took the lead in fashioning accountability and assessment systems that were based on standards and designed to provide information on student performance outcomes and school progress in addressing learning for all students.

Over the 1990s, all but one state adopted state curriculum standards in an effort to increase educational quality. If states had knowledge of the national standards, it is likely that these documents would be important factors in outlining what students should know and be able to do to be competent in science and other content areas in a world undergoing significant social, economic, and technological changes. But most of the states were engaged in developing standards prior to the release of the NSES or the publication of the AAAS Benchmarks (Blank and Pechman, 1995). As a consequence, some states left out or put less emphasis on prominent topics included in these policy documents, including the nature of science, history of science, science as inquiry, science and society, and science applications.

Prior to the publication of the NSES and the AAAS Benchmarks, a number of people were emphasizing the need for alternative forms of assessment and higher expectations for student learning in science (Resnick, 1993; Wiggins, 1989; Forseth, 1992; Baron, 1991; Doran, Reynolds, Camplin, and Hejaily, 1992; Hoffman and Stage, 1993; Hein, 1991). Counter to these recommendations, the use of standardized, norm-referenced, fill-in-the-blank assessments has increased over the last decade, while the number of large-scale assessments incorporating open-ended activities that would reveal more of students’ underlying thinking has remained the same. Much of this has occurred since the publication of the NSES.

Very little research has been done that specifically looks at the influence of the NSES or the AAAS Benchmarks on assessment and accountability, or, in turn, on the relation of science assessments or accountability to teachers’ classroom practices. An increasing amount of research is being conducted on large-scale reform in education that frequently incorporates data or information on assessments and accountability. However, much of this research focuses on mathematics and language arts rather than on science. The research that does exist is not very extensive. This makes it impossible to establish a causal link between the NSES and the AAAS Benchmarks on the one hand and assessment and accountability practices on the other. At best, research provides a description of practices that are compatible with the view of science education advanced in these standards. Much of the existing literature addressing assessment and accountability consists of historical analyses,
status reports, and the evaluation of reform initiatives. These studies may reference the NSES or report on science, but they generally do not report findings associated with the NSES or science. Only a few studies have incorporated a research design that involves sampling or contrasting groups and that produced results with some generalizability (e.g., Stecher, Barron, Kaganoff, and Goodwin, 1998), or have compiled a collection of studies, as in a meta-analysis (e.g., Black and Wiliam, 1998). In these latter studies, researchers collected data relevant to questions about the influence of the NSES, or of some national standards, on assessment practices or accountability. A few of the studies employed case-study methodology (e.g., Fairman and Firestone, 2001). There also are conceptual papers by authors who have drawn from their own work and the work of others to develop a point of view or to synthesize a body of literature. These studies may reference the NSES, showing at least some recognition of this standards document, but generally their authors are trying to advance a specific point, such as the importance of using writing in assessing students’ knowledge of science (e.g., Champagne and Kouba, 2000). Still other reports describe the development of assessment or accountability activities or some other resource and acknowledge the NSES, but do not report on the use of their tool or how they have informed practice (e.g., Quellmalz, Hinojosa, Hinojosa, and Schank, 2000).

ACCOUNTABILITY

Links Between Science Standards and Accountability Systems

In reviewing the research and literature from the last decade on accountability policy and practice, science education, systemic reform, and standards-based reform, we found little evidence of a direct connection between the NSES or the related AAAS Benchmarks and accountability systems developed for public education. We did, however, find strong, indirect channels linking the standards-based reform movement, the development of state standards, the increased use of assessment to measure student performance, and the emergence of accountability systems focused on improving teaching and learning. The connections between standards and accountability discussed in the research were largely generic in nature: typically non-specific with regard to subject area, and usually focused on the state level. A common policy focus and theory of action described in the research assumed a linear and sequential relationship between the standards and accountability along the following lines: first, states develop standards and design related assessments; results are then used for accountability and school improvement, which in turn leads to improved teaching and learning. Much of the research describes how various states and districts enacted these policies and concepts, and documents whether or not the resulting accountability systems met initial expectations and purposes.

None of the research provided direct evidence of the influence of the NSES or of the Benchmarks on accountability at the state or local level. Also missing was any evidence explaining the role, or lack of a role, of science performance in accountability policies, indicators, reports, or consequences. This lack of focus on science may be attributed to the fact that most accountability systems are still in the early stages of being designed and implemented, or are undergoing change to address new policies and requirements, and it is simply too soon to evaluate standards and accountability mechanisms regarding a specific subject area such as science.

Despite the lack of research that would shed light on the relationship between science standards and accountability, we did find that a review of the research was informative in telling us what is currently known about accountability systems and what is missing from those systems, specifically with regard to science education.

Types of Research on Accountability Systems

Researchers have taken a number of approaches in their effort to create meaningful interpretations and to develop an understanding of how standards have influenced accountability systems. The types of research reviewed for this section can be divided into three categories: (1) research focused on describing the policies and history of the development of state standards and related assessment and accountability systems, (2) reports on the status of state assessment and accountability policies and practices, and (3) formative evaluations of enacted standards-based reform efforts in specific subject areas, such as mathematics or science.

Histories, Policy Studies, Concept Papers, and Case Analysis

Perhaps the most direct approach to understanding the influence of the NSES on accountability is to take a historical look at the last decade of changes, which began with the introduction of standards-based educational reform. In introducing such reforms, researchers have identified critical shifts in the conceptual, developmental, and operational evolution of educational accountability systems (CCSSO, 2000a; CPRE, 1995; Elmore, Abelmann, and Fuhrman, 1996; Goertz, 2001; Council for Basic Education, 2000). Some of this research takes the form of annual reports on key policy areas, designed to inform policy makers and educators about the progress and changes occurring at state or district levels.

Another related set of studies on accountability is grounded in a systemic reform approach that treats accountability in the broad sense of the term, i.e., accountability viewed as part of an aligned system of policies and practice. Accountability is just one of the “assumed components” of systemic reform, which also includes curriculum, instruction, professional development, assessments, school autonomy, school improvement, and support mechanisms from states and districts (Clune, 1998). At the heart of systemic reform are standards; the alignment of new standards with all the other components is deemed critical to improving the quality of teaching and learning. Systemic analysis, which is employed to research the strengths and weaknesses of reform strategies used in policy and practice, reveals how current systems evolved, what those systems currently look like, and the directions in which they will likely change as they continue to develop. Similarly, systemic analysis can be used to draw out the alignment of standards to such system components as assessment and accountability.

The research studies that take this systemic approach consist of a broad array of concept papers, policy studies, and meta-analyses. These include in-depth case studies of specific state-, district-, or school-level systems; reviews of design and policy; and the responses at the local level to these policies. The sites selected for these studies are usually districts, states, and schools that have placed emphasis on standards-based reform. Often, the research draws upon existing data and results from multiple surveys in a variety of states and localities, and upon extant studies, to produce a meta-analysis that compares a variety of educational systems (Goertz, Duffy, and LeFloch, 2001; Public Agenda, 2000; Massell, 2001; DeBray, Parson, and Woodworth, 2001). Other systemic research focuses on the changes in the conceptualization of accountability policy, design, and implementation (Goertz et al., 2001; Elmore et al., 1996). These studies look at the theories driving policy and development, and how these theories may differ from those guiding enacted practices.

Research on the development and direction of accountability policies, designs, and the forces that shape and change them has contributed to our understanding of science’s role in today’s accountability systems. By recreating the path from design to development through implementation of educational accountability, we can begin to understand the complexities of these continuously evolving systems.

Status Reports

In contrast to treating accountability as part of a comprehensive system of reform tied to the standards, another set of studies informs us more specifically, but more narrowly, about the status of accountability systems at the state and district levels. Typically, these “status” reports provide a compilation of descriptive statistics of state systems. The reports tally the extent of standards development (i.e., content standards by state and subject), document a count of current assessment features (i.e., types of assessments by grade level and subject area), and quantify accountability practices (i.e., consequences directed toward schools, principals, or students by state). Examples of such reports are the annual publications produced by the American Federation of Teachers (Making Standards Matter), Education Week (Quality Counts), the Council of Chief State School Officers (CCSSO) series on key state education policies (CCSSO, 2000a; Blank and Langeson, 2001), and the National Education Goals Panel (1996, 1998) progress reports on the National Education Goals.

Formative Evaluation and Frameworks for Review

More in-depth analyses of accountability systems are found in the formative evaluations conducted on the implementation of federal policies, programs, and initiatives, or as a basis for creating and field-testing a
framework for system review. Case studies of states at the forefront of educational reform such as Kentucky, Mississippi, and Maryland (Elmore et al., 1996) and schools struggling to implement new accountability systems (DeBray et al., 2001) provide detail on system design, development, and implementation at many levels. The NSF-funded Statewide Systemic Initiatives (SSIs) and Urban Systemic Initiatives (USIs) have produced a rich set of formative evaluations of the development and implementation of systemic science and mathematics interventions in states and cities (CPRE, 1995). Porter and Chester (2001) offer a framework for critiquing district assessment and accountability systems based on their work in Philadelphia, Missouri, and Kentucky. Their framework is consistent with the AERA, NCME, and APA standards on testing and the AERA position statement on high-stakes testing, as well as the NRC publication High Stakes: Testing for Tracking, Promotion, and Graduation (NRC, 1999b). Other frameworks for reviewing the effects of standards on accountability systems are provided by Elmore et al. (1996), Clune (1998), and the National Education Association (McKeon, Dianda, and McLaren, 2001).

Together, these research studies and frameworks provide insight into many details of assessment and accountability systems. Unfortunately, many of these studies focus more on mathematics than on science. Only a few of the studies touch on reform efforts related specifically to science. None of the studies provide substantive information specific to the influence of the NSES on reform in science education and accountability.

The current body of research reviewed for this synthesis provides broad information on accountability, but lacks depth and detail related specifically to science and the impact of the NSES. Research that takes a broad, systemic approach to assessing accountability helps us to learn about the conceptual, developmental, and operational changes that bear on accountability systems and their complexity. Status reports give a specific accounting of a number of important features that may or may not exist in state and district systems, and allow for some surface-level information on the role of science in those systems. A more in-depth analysis can be gleaned from formative evaluation studies; but since these studies are formative and systemic in nature, they rarely focus on science and do not track the alignment of science standards to outcomes and impact.

Complexity of Accountability Systems and Research on Them

Change and Variation

Change and growth have marked the development of education accountability systems over the last decade; much of this evolution has occurred as more states and districts respond to the policy emphasis on standards-based reform and measurement of progress by student performance (Goertz, 2001). CPRE researchers draw attention to the shift in state accountability systems, from regulating and ensuring compliance based on district and school inputs, to accountability systems focused on student performance. They refer to these emerging systems as representing “the new educational accountability” (Elmore et al., 1996; Goertz, 2001). This shift from compliance and process to performance and proficiency has evolved with a parallel shift from district to school-level accountability (Goertz, 2001; Elmore et al., 1996; Goertz et al., 2001; Massell, 2001). Features of the new accountability include measures of student performance that are linked to standards and that focus on school improvement through systems of rewards and sanctions (Elmore et al., 1996). What is less clear from these studies is the extent to which students, schools, and districts are held accountable for student performance in science.

Today’s accountability systems are a complex array of features and responses to a variety of forces, such as federal, state, and local policies and regulations (Goertz et al., 2001). These systems are characterized by variation at all levels: within and between states, and among districts and schools. Federal, state, and public pressures for reform, as well as local context and capacity, help to shape the interpretation and the diverse implementation of accountability policies and practices at all levels. Goertz et al. (2001) acknowledge the “transitory” nature of assessment and accountability systems, noting that these systems face pressures from a variety of sources, such as federal Title I legislation and state-defined targets and sanctions, necessitating continuous redesign and modification. Goertz (2001) has also found that state and district contexts make a difference in how accountability systems are interpreted, developed, and implemented. Accountability systems vary by goals, level, and standard of accountability; types of assessments; subject areas and grades tested; and indexes and rankings,
as well as by the types of rewards and sanctions that exist (Elmore et al., 1996; CCSSO, 2000a; Education Week, 2002; American Federation of Teachers, 2001). Goertz (2001) mentions three distinct types of state accountability systems: (1) public reporting systems, the most basic, (2) locally defined systems, where districts and schools define standards, planning, and performance criteria, and (3) state-defined systems, the most common type, where the state sets the goals for districts, schools, and students. Goertz found that the more autonomy a state allows local districts, the greater the variation in the accountability system. DeBray et al. (2001) found that high-performing and low-performing schools often responded differently, depending on their capacity to take action on new policies and structures, and how they filtered these new policies through their own internal theory of action regarding accountability. As a result, a great deal of variation was found to exist at every level of accountability, between states, within states, and at the district and school levels.

Federal Policy Implications

The new emphasis on accountability for student performance is exemplified at the federal level in legislative initiatives such as Title I and IDEA, and more recently in President Bush’s “No Child Left Behind Act of 2001,” requiring national testing. Newly legislated federal policy calls for states to be more comprehensive in their assessment practices by requiring testing at every grade level from grades 3 through 8 and enforcing inclusion of special-needs and English-language learners in the assessment and accountability systems. The act targets monies to high-poverty schools and districts; increases technical assistance; specifies more rigorous evaluation and audits; requires improvements for teacher qualifications and professional development; and emphasizes improvements in reading, literacy, and language acquisition programs and student achievement. Science education is not a main focus of the legislation; assessment of science is not required of states until the 2007-08 school year.

The legislation requires state accountability systems to be: (1) based on standards, (2) inclusive of all students, and (3) uniform statewide. Schools and districts must meet targets for Adequate Yearly Progress (AYP) as set forth in Title I and defined by each state. The legislation also requires that only one test be used to measure AYP in each state; the system for Title I and state accountability needs to be the same. Schools must reach state-established performance targets and demonstrate progress for each student subgroup. A single accountability system will be applied to all schools in each state, but the sanctions under Title I will be applied to Title I schools only. States will have discretion in establishing consequences for non-Title I schools. For the first time, states themselves will also be held accountable to meet AYP targets for each subgroup of students, and to demonstrate attainment of English for Limited English Proficient students. States will undergo the same type of peer review process as that currently required for districts and schools under Title I (National Council for Measurement of Education–Invited Address, 2002).

While the new legislation attempts to place a new level of consistency and comparability on assessment and accountability nationwide, the tendency for states, districts, and schools to put their own spin on interpreting policies and developing local systems will make for significant challenges in the transition to the new requirements. Indeed, an Education Commission of the States report, issued in 2000, showed a great deal of variability in the states’ progress to date and in their readiness to implement the new assessment and accountability initiatives called for in the Bush plan.

School Accountability

Schools have become the focal point of many accountability systems. Most state accountability systems examined by Goertz et al. (2001) held schools accountable for student performance and directed consequences to the school, using a variety of monetary rewards, intervention policies, school improvement support, and technical assistance. An increasing number of districts are beginning to supplement and customize state accountability policies by (1) developing their own standards, (2) creating multiple assessments to measure student performance growth more frequently than state testing programs, and (3) creating a vast array of local rewards and sanctions aimed at school improvement, improving teacher quality, and closing achievement gaps (Council for Basic Education, 2000). This emphasis on school responsibility for improving student achievement creates local incentives for school improvement, encourages the use of data for decision making, and motivates school staff to focus on state and district goals (CBE, 2000; Massell, 2001).

Student Accountability The question of who is responsible for student performance, who is held accountable, and who bears the burden of consequences lies at the heart of the new educational accountability. While accountability systems are increasingly holding schools accountable for demonstrating improvements and progress in student achievement, the growth in assessment at all levels has also created a high-stakes environment for students. Goertz (2001) explains that early in the 1990s, state systems lacked incentives, motivation, and consequences for students to take testing seriously, especially at the secondary level. States began to introduce promotion “gate” policies and set performance standards that required students to meet or exceed target levels measured by state testing programs in order to progress to the next grade level. The reliance of states on norm-referenced standardized assessments for state- and district-level accountability purposes proved a convenient vehicle for measuring student accountability. Goertz concludes that such performance-based accountability systems are becoming the norm in standards-based reform and that, increasingly, many state and district accountability systems hold students alone to high-stakes accountability. However, a recent study presented at the American Educational Research Association Annual Meeting by researchers at the National Board on Testing and Public Policy found that of the 25 states judged to have high- or moderate-level stakes for students, all 25 states also had high levels of “regulated or legislated sanctions/decisions of a highly consequential nature based on test scores” for teachers, schools, and/or districts. Only seven states were found to have high-level stakes for students and moderate-to low-level stakes for teachers, schools, and districts (Abrams, Clarke, Pedulla, Ramos, Rhodes, and Shore, 2002). 
Groups such as the NEA have expressed concern about the inadequacy of accountability systems that depend on high-stakes testing, set unrealistically high expectations, and hold students and teachers accountable without providing adequate opportunities for them to learn, or sufficient resources to implement standards-based reform (McKeon et al., 2001).

Science Performance and Accountability

Reviewing a wide array of “status reports” yields insightful but limited information on the extent to which science is targeted in assessment and accountability systems and, more specifically, on the role played by the NSES and the AAAS Benchmarks in those systems. For example, one can learn that a great deal of progress has occurred at the state level regarding the development of science standards, science course requirements, and science assessment. By 2000, 46 states had established content standards in science, 14 states had increased their graduation requirements by one or more credits in science since 1987, and 20 states required specific science courses for high school graduation (CCSSO, 2000a). While by 1999 most states had established mathematics, reading, science, and social studies standards, fewer than half of the states had established science and social studies standards at all three K-12 educational levels (elementary, middle, and high school) (Education Commission of the States, 2000). What these data do not reveal is whether or not science is included in state accountability systems. One can learn that students are required to take science courses, to be assessed in science, and to meet science content standards, but are students, schools, and/or districts held accountable for performance in science? The data also do not tell us what the influence or connections are between the NSES and accountability.
For example, a close look at Education Week’s annual Quality Counts: The State of the States (2002) report on standards and accountability shows that 45 states have developed clear and specific standards in science, 28 states use criterion-referenced assessments aligned to state standards in science, and 42 states participate in National Assessment of Educational Progress (NAEP) testing (which included a science assessment in 2001). We have found no comprehensive source of information regarding whether science performance is incorporated in public report cards; whether science performance is used to evaluate schools and to identify and target sanctions to low-performing schools; or whether science performance is a criterion used to determine student promotion, placement, and graduation. What is needed is a comprehensive study of the policies of all 50 states that would reveal the linkages between science standards, science assessment, and science accountability.

Issues and Concerns in Researching Accountability Systems

We have learned from the research that the majority of educational accountability systems are characterized by variation and fluidity and defined by a variety of pressures, such as standards-based reform and demands for public and political accounting. Overall, there is an increasing emphasis on improving student learning and on raising teacher and school quality. Currently, accountability systems at all levels of the educational system are undergoing significant change. State, district, and school systems must respond to new and revised federal legislation, emerging state policies and standards, new and more comprehensive assessment programs, and local pressures to demonstrate and publicly report the condition of education in schools. This constant state of change makes it difficult for researchers to identify effective models of accountability and describe common trends, much less evaluate the impact of accountability systems.

Researchers have expressed concerns about the complexities and inconsistencies that result from the different approaches to design, development, and implementation of the new standards-based accountability systems. Key concerns are directed at ensuring accountability systems that are (1) fair and equitable, (2) supported with adequate resources and professional development, (3) based on valid and reliable measures with reasonable targets for student achievement and school improvement, (4) focused on incentives and consequences that are balanced among students, teachers, and schools, and (5) understood and trusted by the public.

Porter and Chester (2001) highlight some of the key complexities and inconsistencies related to phasing in and adjusting new assessment and accountability systems, while at the same time ensuring that the systems promote balanced accountability for students and schools and are both instructionally relevant and fairly implemented.
These authors have developed a framework for building effective assessment and accountability systems that are based on three criteria. First, they recommend that effective accountability systems should provide good targets for schools and students that focus efforts in constructive directions, such as standards-based curriculum and well-defined performance expectations for students. Although not explicitly stated, this first criterion could incorporate science and be one means for the NSES to influence teacher practices and student learning in science. Second, they propose that effective accountability practices must also be symmetrical, with balanced responsibility for improving student performance shared among states, districts, schools, and students. Finally, the authors advise that good accountability systems are fair and equitable, with all students having opportunities to learn, appropriate supports and resources, and phased-in accountability based on multiple measures and decision consistency. Porter and Chester recommend that assessment and accountability systems be regularly evaluated, with particular emphasis on determining consequential validity. They also provide some cautions about seeking impact evidence from the systems prematurely, suggesting that these systems are still evolving. This being the case, the assessments and indicators are under continual refinement, making it difficult to research and judge true changes in instructional practice, student persistence, and student achievement. Moreover, given the wide range of reform initiatives simultaneously implemented in most districts, it is difficult to attribute improvements to accountability and assessment systems alone. These concerns and recommendations are confirmed by several other researchers. 
Educators attending the Wingspread Conference (CBE, 2000) supported the evidence from emerging research that standards are a prominent force for reform at every level, but that many challenges still remain to implementing standards-driven reform, including: (1) improvements in high-stakes, state-level standardized test alignment and opportunities for students to learn what is tested, (2) lack of coherent professional development to prepare teachers for the new high standards, (3) a paucity of strong leadership for reform, (4) ensuring equity and providing all students the chance to meet high standards, and (5) maintaining the public’s trust. Similarly, the National Education Association (McKeon et al., 2001) expressed concerns about the “missteps” of implementing standards-based reform, claiming that the reform expectations for education have been raised without the sufficient supports within education systems necessary to implement and achieve them. They (1) focus on the inadequacy of the accountability systems that depend on high-stakes testing, (2) advocate the use of multiple measures for promotion, placement, and graduation, (3) suggest that the alignment of standards, curriculum, instruction, and assessment be reexamined, and (4) propose a review of equity safeguards, opportunities-to-learn, and the fairness of the standards’ impact on all students.

In addition, Massell (2001) found that data used for accountability at the state, district, and school levels remain fragmented, and recommends further professional development to effectively align learning to standards and to connect data to improving classroom instruction at a deeper level. Massell also cautions against quick fixes or simplistic uses of data, or expecting data to provide a one-size-fits-all solution; she recommends further study of how data can best be utilized in accountability systems to build capacity and shed light on standards-based reform.

Debray et al. (2001) raise some interesting questions about the strengths and weaknesses in how accountability systems play out at the school level. The authors challenge states to rethink their assumptions regarding how accountability policies will be interpreted and implemented at the school level. In particular, they challenge the assumption that low-performing schools will respond adequately to public pressure to improve poor performance. Low-performing schools may need assistance to align their internal accountability with the new external accountability mechanisms, such as assistance with school improvement planning, optimal use of data, incentives for motivating instructional change, and addressing feasible short-term improvement goals.

Public concerns about accountability systems that involve high-stakes assessment have been portrayed widely in the popular press. These concerns center on the narrowing of the curriculum to only what is on the assessments, inappropriate pressures on students without holding teachers and schools to the same degree of accountability, the lack of validity of the high-stakes assessments to adequately measure what students should know and do, and the overloading of testing companies with work, resulting in serious scoring mistakes that cause students to inappropriately attend summer school or comply with other consequences.
These issues have raised the profile of accountability systems, in general, and certainly point to the critical need for fair, valid, and reliable assessments.

ASSESSMENT

About two-thirds of the states use large-scale assessments in science, including nontraditional forms of assessments. The increase in the number of states assessing in science mainly took place prior to the release of the NSES. Over half of the states testing in science used forms of assessment other than multiple-choice items. However, about the time the NSES document was published, at least four states suspended the use of assessments that were more aligned with the NSES.

Between 1984 and 1999, the number of states requiring statewide testing in science more than doubled, increasing from 13 to 33. This growth was achieved mainly prior to the 1995-96 school year. During this school year, 30 states administered assessments in science at some grade level (Bond, Roeber, and Braskamp, 1997). Nearly all of these states (27) used some form of nontraditional assessment besides norm-referenced multiple-choice tests. Most of these states assessed student science performance using multiple-choice tests in grades 4, 8, and 11. Twelve states used a norm-referenced multiple-choice test and some other form of assessment, 20 used a criterion-referenced multiple-choice test, and 17 used an alternative form of assessment, including short or extended constructed-response, fill-in-the-blanks, or hands-on performance assessment (CCSSO, 2000a; 2001). In 1995-96 or before, at least four states that had used or were preparing to use performance assessments in their state assessments suspended or reduced their use: Arizona, Kentucky, Wisconsin, and Indiana (Bond et al., 1997). Cost was a major consideration in suspending the use of the alternative assessments.

Just counting the number of states that assess students in science does not provide evidence of the influence of the NSES or AAAS Benchmarks.
If such evidence does exist, it will most likely be found in the nature of assessment practices as used by teachers in classrooms and less likely to be found in large-scale assessments. To identify possible influences of the NSES and AAAS Benchmarks requires a deeper understanding of what science assessment is and what assessments that have been influenced by these documents look like. In the next section, we will define science assessment and describe more about what assessments are more compatible with the NRC and AAAS reform documents.

Assessment in Science

Assessment in science is the comprehensive accounting of an individual’s or group’s functioning within science, or in the application of science (Webb, 1992). It is a process of reasoning from evidence that can only produce an estimate of what a student knows and can do. Any assessment process generally has five components: (1) a situation or tasks, (2) a response, (3) a scoring scheme, system, or analysis, (4) an interpretation of the score, or student response, and (5) a report of the results. The NSES influence on assessment can be experienced in any one or all five of these general components.

Assessments influenced by, or consistent with, the NSES will engage students in situations that require inquiry, the construction of explanations, the testing of these explanations, and the application of science questions to new content. Students will be asked to demonstrate what they know and can do in science by responding in different ways, including recording the results of an investigation, writing, keeping a log, or collecting examples of work in a portfolio. It is critical for the assessment task or situation to elicit students’ responses that make their thinking process visible (NRC, 2001b). Students’ work may be scored in a variety of ways, including right/wrong, level of proficiency, growth over time, and depth of knowledge of important scientific ideas. Students’ writing will be analyzed on the basis of the scientific accuracy of the writing and on the quality of reasoning (Champagne and Kouba, 2000). Teachers will interpret what students do and what scores they receive in relation to cognitive models and understandings about how students learn science, develop competence in science, and use science to draw meaning about the world in which they live.
Reporting results from assessments will incorporate ways for tracking students’ progress over time, giving students appropriate feedback that emphasizes learning goals derived from the NSES (NRC, 2001a), and informing instruction.

If assessment is a channel through which the NSES influence teachers’ practices and then subsequently student learning, one hypothesis is that their recommendations and expectations will be represented in the different components of assessments and the context for assessments. This means that what teachers, administrators, and the public believe assessments are, and how they believe assessments should be used, should be compatible with what is advanced by the NSES. This should be true for all purposes of gathering information on students, including making instructional decisions, monitoring students’ progress, evaluating students’ achievement, and evaluating programs. Thus, ideally the tenets of the NSES should be represented in any form of assessment, including large-scale or classroom, formative or summative, norm-referenced or criterion-referenced, high-stakes or low-stakes, or certification or self-evaluation.

Assessments influenced by the NSES will be different from common forms of assessment confined to paper-and-pencil, short-answer, or multiple-choice formats, the dominant forms of assessment used by states. Assessments that fulfill the expectations of the NSES will meet the full range of the goals for science as expressed in that document and will reflect the complexity of science as a discipline of interconnected ideas (NRC, 2001a). For example, science as a way of thinking about the world, a view expressed in the NSES, should be reflected in what data and information are gathered on students to determine their growth in knowledge of the subject and how it affects their world view.
An Expanding View of Science Assessment

The NSES and AAAS Benchmarks were not developed in isolation and were themselves influenced by a changing view of assessment. This makes it extremely difficult to attribute assessment practices strictly to these documents. What is more reasonable is to identify assessment practices that are compatible with the NSES and AAAS Benchmarks.

Coinciding with and contributing to the movement toward standards-based reform and accountability was an expanding view of the nature of knowing and learning. These developments in the learning sciences have put increased emphasis on learning with understanding that is more than memorizing disconnected facts (NRC, 2000b). Different perspectives on the nature of the human mind help to describe different forms of assessments. Traditional forms of assessment are more compatible with a differential perspective (discrimination of individual differences) and behaviorist perspective (accumulation of stimulus-response associations), whereas alternative forms of assessments represent a cognitive perspective (development of structures of knowledge) and a situative
perspective (knowledge mediated by context or cultural artifacts) (Greeno, Pearson, and Schoenfeld, 1996; NRC, 2001b). These different perspectives are not independent, but serve to provide a foundation for expanding the type of activities and situations that are used to determine what students know and can do. The perspective of knowing science as portrayed in the NSES is compatible with the more recently developed cognitive and situative models of knowing, while also recognizing the importance of facts and skills. But disentangling the influence on assessments and accountability of the NSES from the expanding views of knowing is very complex and will require very extensive research. Assessment practices that will produce information on students’ knowledge of science as expected in the NSES and the AAAS Benchmarks require the use of different techniques. The goals for student learning articulated in these documents go beyond teaching students basic facts and skills to engaging students in doing science, asking questions, constructing and testing explanations of phenomena, communicating ideas, working with data and using evidence to support arguments, applying knowledge to new situations and new questions, solving problems and making decisions, and understanding the history and nature of science. The NRC (2001a) developed a guide on classroom assessment that would be compatible with the vision expressed in the NSES. It emphasizes both informal and formal assessment practices that teachers can use that are integral to the teaching process. Drawing upon existing research, it identifies assessment practices that can inform both teachers and students about students’ progress toward achieving a quality understanding of science. For teachers to monitor students’ progress in developing inquiry skills requires that teachers observe and record students’ thinking while they do experiments and investigations. 
Student peer- and self-assessment strategies have been shown to be positively related to increases in student achievement and are compatible with the students doing science. Champagne and Kouba (2000) draw upon their research, the research of others, and the theory of social constructivism to make an argument for students to engage in writing as an integral part of learning activities designed to develop the understanding and abilities of inquiry. Writing as a form of discourse not only is an essential mechanism for the development of science literacy, but also it produces evidence from which inferences can be made about student learning. A critical factor for the NSES in advancing hands-on science for all students is that science assessment has cultural validity along with construct validity (Solano-Flores and Nelson-Barber, 2001). The need for cultural validity is supported by evidence that culture and society shape an individual’s mind and thinking. Solano-Flores and Nelson-Barber illustrate the point that some areas of scientific importance in some cultures are not incorporated into the NSES—e.g., body measures are important to determine which kayak would be most appropriate for which person, a very important everyday problem in many indigenous cultures. However, body-based measurement skills are not included in the NSES. The qualities that make for good assessment need to include cultural factors, along with sound scientific principles that may require going beyond what is included in the NSES document. The vision for assessment in the NSES and AAAS Benchmarks and the type of assessments needed to measure student learning as expressed in these documents are compatible with an emerging view of how students learn and what assessments should be. However, this is more a validation of these documents than evidence of their influence. 
There is some evidence that even these documents do not communicate all of the nuances and details needed for measuring learning for all students in all contexts. To draw these conclusions, we primarily have used conceptual papers and compared what is advanced in them with what is included in the NSES. Analyzing the alignment between assessments and standards is another technique that can be used to judge the compatibility between standards, such as the NSES and the AAAS Benchmarks, and assessments.

Alignment of Standards and Assessments

Central to the development of standards that drive curriculum, assessment, and learning is the concept of alignment (Linn and Herman, 1997; La Marca, Redfield, and Winter, 2000; Webb, 1997). Although the alignment of standards and assessments has been defined in different ways, there is some convergence in describing alignment of standards, assessments, and other system components as the degree to which their components
are working toward the same goals and serve to guide instruction and student learning to the same ends (La Marca et al., 2000). Alignment is not a unitary construct, but is determined by using multiple criteria. Webb (1997) identified five criteria for judging system alignment: content focus, articulation across grades and ages, equity and fairness, pedagogical implications, and system applicability. To illustrate one of these criteria, large-scale or classroom assessments that discourage students from engaging in doing investigations and formulating questions would have pedagogical implications that are not consistent with expectations advanced by the NSES or AAAS Benchmarks. In this case, there would be insufficient alignment. Generally, when educators say an assessment is aligned with a set of standards, they are referring only to content focus and most likely only to topic match. There also is some evidence that test developers’ notion of science inquiry is different from that expressed in the NSES inquiry standards (Quellmalz and Kreikemeier, 2002).

Webb (1999) has demonstrated in an analysis of two states’ standards and assessments that by using multiple criteria, a better understanding can be reached of how standards and assessments may work together. In a total of five grade levels between the two states, only two-thirds or fewer of the standards had enough items on the assessment to meet the criterion of categorical concurrence. The other standards had fewer than six items corresponding to them. In four of the five grade analyses, half or fewer of the standards had a sufficient number of items comparable to the standards on the depth-of-knowledge criterion. With respect to range, at most, only one-third of the standards had items that corresponded to at least half of the objectives under these standards. That is, a very low percentage of the content under the standards was being addressed.
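The two counting criteria described above can be illustrated with a minimal sketch. It assumes only the thresholds stated in the text (at least six items per standard for categorical concurrence, and items corresponding to at least half of a standard's objectives for range). The function name, the data layout, and the sample codings are hypothetical and are not taken from Webb's study; the depth-of-knowledge criterion is omitted because the text does not give its threshold.

```python
from collections import defaultdict

# Thresholds as stated in the text; everything else here is illustrative.
MIN_ITEMS_PER_STANDARD = 6   # categorical concurrence
MIN_OBJECTIVE_SHARE = 0.5    # range: share of objectives with at least one item


def alignment_summary(objectives_by_standard, item_codings):
    """Summarize two alignment criteria for each standard.

    objectives_by_standard: {standard name: [objective ids]}
    item_codings: iterable of (standard name, objective id), one per test item
    """
    items_per_standard = defaultdict(int)
    objectives_hit = defaultdict(set)
    for standard, objective in item_codings:
        items_per_standard[standard] += 1
        objectives_hit[standard].add(objective)

    summary = {}
    for standard, objectives in objectives_by_standard.items():
        n_items = items_per_standard[standard]
        covered = objectives_hit[standard] & set(objectives)
        share = len(covered) / len(objectives) if objectives else 0.0
        summary[standard] = {
            "items": n_items,
            "categorical_concurrence": n_items >= MIN_ITEMS_PER_STANDARD,
            "objective_share": share,
            "range": share >= MIN_OBJECTIVE_SHARE,
        }
    return summary


# Hypothetical codings: three items on "Inquiry", seven on "Life science".
objectives_by_standard = {
    "Inquiry": ["I1", "I2", "I3", "I4"],
    "Life science": ["L1", "L2", "L3"],
}
item_codings = (
    [("Inquiry", "I1"), ("Inquiry", "I1"), ("Inquiry", "I2")]
    + [("Life science", "L1")] * 7
)

summary = alignment_summary(objectives_by_standard, item_codings)
for standard, result in summary.items():
    print(standard, result)
```

In this made-up example, "Inquiry" meets the range criterion but has too few items for categorical concurrence, while "Life science" has enough items but concentrates them on a single objective. The sketch shows why judging alignment on one criterion alone can be misleading.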
All of the assessments were on-demand, large-scale instruments. Although the study used state standards, there is some comparability of these with the NSES, but, as has been noted above, the state standards do not cover all of the content expectations in the NSES nor do they use formats needed to assess the full intent of the NSES. This would imply that the alignment between the NSES and these state standards would be even worse, particularly in assessing students’ abilities to do investigations and achieve an understanding of the nature of science.

Some groups are engaged in developing assessment resources that are aligned with the NSES to lessen the burden on teachers and schools. SRI International has developed the Performance Assessment Links in Science (PALS) as an online, standards-based, interactive resource bank of science performance assessments (Quellmalz, Schank, Hinojosa, and Padilla, 1999). This resource bank has drawn heavily on tasks generated by the State Collaborative on Assessment and Student Standards of the Council of Chief State School Officers for K-12 science (Roeber, 1993). Tasks in this resource bank are indexed by the NSES and for selected state and curriculum frameworks. PALS has engaged in research and evaluation to determine its usage and the likelihood of teachers to use specific performance tasks along with quality and utility judgments by educators. Findings indicate that teachers and administrators have found PALS generally easy to use and anticipate using the assessment tasks for classroom assessment and to work with other teachers (Herman, 2000). AAAS is developing a tool that can be used to analyze the alignment between items and standards, using multiple criteria (AAAS, 2001c). This tool will complement other tools that AAAS has developed to analyze curriculum.

Frequently, standards and assessments have been judged to be aligned if the assessments were developed based on the standards.
This is true of the National Assessment of Educational Progress (NAEP) in science. The science framework used to develop the assessment for the 1996 and 2000 administrations was developed concurrently with the NSES. Writers of the science framework were very aware of the work on the NSES and incorporated content from the existing drafts of the NSES and the AAAS Benchmarks. Thus there was a direct influence of these standards on the NAEP assessment. However, no studies were included in the literature used in our analysis that would substantiate that the NAEP science assessment is fully aligned with the NSES.

Alignment studies have found state assessments that do not fully match the content knowledge students are intended to know as expressed in the state standards. Such alignment is difficult to achieve because science content and, consequently, standards are very broad and complex at any grade level. Since most assessments are restricted in what content can be tested, without extensive testing it is virtually impossible to achieve full alignment. It is not unreasonable that state standards, and by inference the NSES and Benchmarks, expect students to learn more than can be assessed on a large-scale, on-demand assessment. Alignment studies between state standards and assessments then can be used to confirm partial relationships between the NSES and AAAS Benchmarks, up to the degree these documents are represented in the state standards, but to determine if there
is full alignment requires considering the full range of assessments in an assessment system, including those used in the classroom.

Influence of Assessment on Teachers’ Practices and Student Learning

Assessment practices, both at the classroom level and district or state levels, do influence teachers’ practices and student learning. Black and Wiliam (1998) conducted an extensive meta-analysis of research on classroom assessment and student learning over a nine-year period. They concluded from the compilation of the evidence that improving formative assessment raises standards, that formative assessment still can be improved, and that information exists on how to improve formative assessment. These researchers found effect sizes of 0.4 to 0.7 in formative assessment experiments, indicating that strengthening the practice of formative assessment produced significant learning gains. They reported a corollary finding indicating that low achievers were helped more than other students through improved formative assessments. This type of assessment is very compatible with continuous assessment in the science classroom needed to teach for understanding, a very important concept in the NSES.

Evaluation studies of state and district reforms have produced some suggestive evidence of the relationship between the NSES and teachers’ practices. In 1995, writing teams in Philadelphia drafted content standards based on those developed by national professional organizations (CPRE, 1997). The district chose the SAT-9, a criterion-referenced assessment, in part because this assessment was based on national standards. However, later in 1997, after more than half of the teachers reported that the assessment was not aligned with Philadelphia’s standards, the district modified the assessment to be more fully aligned.
In a later study, the evaluators reported that the accountability system and assessment did drive classroom instruction by focusing teachers’ attention on the content of the SAT-9 and that this type of learning became more important in the classroom than developing challenging material. The hope that teachers would incorporate classroom-based assessments and review student work against the standards never became a high priority of the teachers (Christman, 2001). The state of Vermont received funding from the National Science Foundation in 1992 to establish a Statewide Systemic Initiative (SSI). Led by the Vermont Institute for Science, Mathematics, and Technology (VISMT), the SSI was instrumental in developing the state’s Framework of Standards and Learning Opportunities in science, mathematics, and technology. The writing team reviewed national standards and other state standards in constructing those for Vermont. The state’s science standards, released in 1995-96, closely resembled those of the NSES. VISMT worked with a commercial testing company to modify an available standardized science test so that it was aligned with the state standards. The test was piloted in 1995, with full implementation of the state assessment system to extend over a five-year period (Matson, 1998). The Philadelphia and Vermont case studies illustrate at least two situations in which local standards in science were informed by the national standards and in which the effort was made to bring existing assessments into alignment with the local standards. In Philadelphia, the assessment was reported to have exerted an influence on teachers’ practices. The implication, although not stated in the studies, is that the national standards had an influence on teachers’ practice as mediated through the assessment. How much importance a system gives to assessments is a critical factor in determining how much influence the assessment has on classroom practices and students’ opportunity to learn. 
This finding is supported by three studies. However, two of three studies determined this for mathematics and not science. Stecher, Barron, Kaganoff, and Goodwin (1998) conducted a multi-year research project investigating the consequences of standards-based assessment reform at the school and classroom levels in Kentucky. A random sample of about 400 teachers from the state responded to a written questionnaire on their classroom practices. Teachers were asked about current practices and change in practices over the past three years. Statistical differences between responses for teachers in low- and high-gain schools were computed, using chi-squared and t-tests. Over one-third of the elementary teachers included in the sample from Kentucky reported increasing the amount of time spent on science to four hours a week. Over half of the elementary teachers said they increased the frequency of their efforts to integrate mathematics with science. Thus, the reform, including high-stakes
testing, had resulted in more science being taught in elementary schools. In mathematics, two-thirds of the grade 8 mathematics teachers from high-gain schools reported that the National Council of Teachers of Mathematics (NCTM) Curriculum and Evaluation Standards (1989) had a great deal of influence over content and teaching strategies. This was nearly twice the 37 percent of grade 8 mathematics teachers from low-gain schools who reported significant influence (Stecher et al., 1998). Although not for science, this finding for mathematics indicates that standards can influence classroom practice.

In a study of how state policies were locally interpreted, Fairman and Firestone (2001) studied grade 8 mathematics assessments in Maryland and Maine. They used an embedded case study design that looked at teachers within districts within states. The sample included two middle schools from each of two districts in Maryland and a total of six middle schools or junior high schools from three Maine districts. The two states differed in the duration of a performance assessment component in the state assessment program. In 1995-96, Maryland was in the fifth year of using these assessments and Maine was in the first year. As was the case in Philadelphia, they reported that a common view from other research was that high-stakes assessments would work against standards-based teaching, in part, by focusing teachers’ practices on test performance rather than on deep student learning. Among their findings in Maryland, they discovered that teachers who gave increased attention to test-related activity in the higher-capacity districts engaged in instructional practices that were only partially consistent with state or national mathematics standards. Teachers did conduct isolated lessons that were related to items on the test, and thus compatible with the standards, and that included a greater emphasis on mathematics topics not previously taught.
However, the teachers continued to emphasize procedural skills and factual knowledge rather than creating opportunities for students to engage in reasoning, complex problem-solving, and connecting important concepts in mathematics. Some teachers in Maine made similar changes, but not in response to state policies and more because of their lack of professional development. Fairman and Firestone (2001) conclude that a considerable effort is needed if teachers are to be expected to change from more conventional teaching to standards-based teaching.

In an analysis of data from the Third International Mathematics and Science Study (TIMSS), Bishop (1998) found persuasive evidence that countries with curriculum-based external exit examinations in science showed higher performance by 13-year-olds, with an impact of 1.3 U.S. grade-level equivalents. In computing impact, the level of economic development of the countries was taken into consideration. This suggests that learning environments with some consequences attributed to assessment have a positive effect on learning.

Thus there is evidence that the importance given to assessments at the state or system level does influence what teachers do in their classrooms. But even in states with high-stakes tests compatible with national standards, such as Maryland, teachers are still resistant to giving up their traditional approaches in favor of the reform practices described in the national standards.

CONCLUSIONS

Accountability and assessment systems increased in importance as the NSES and AAAS Benchmarks gained greater prominence. A clear link between these science reform documents and the major shift over the past decade toward increased accountability and assessments was not found in the literature accumulated by NRC for this review.
Two case studies of reform, one in a large city and the other in a state, documented that those who wrote the district and state content standards attended to the national documents including the NSES and AAAS Benchmarks. It is reasonable to infer that these cases are not unusual and that other states and districts took advantage of these documents if available at the time they engaged in developing standards. This inference is supported by the greater amount of available evidence of the influence of the mathematics standards produced by NCTM on state standards and assessments. Because the release of the NCTM Curriculum and Evaluation Standards in 1989 preceded the movement by states to develop their own standards and assessments, it is understandable that states would at least attend to these mathematics standards produced by a national professional group. It is reasonable that states would also attend to the NSES and Benchmarks over time as they revise standards and refine their accountability and assessment systems.

There was a clear trend toward an increase in accountability and the use of assessment over the 1990s. Interestingly, the increase in assessment came early in the decade and before the crescendo in the state and district accountability systems. By the end of the decade, 46 states had content standards in science, but fewer than half had them for all three grade ranges. Two-thirds of the states had state assessments in science, but there was some evidence that the number of states using alternative forms of assessment more aligned with the national standards, such as performance assessment, actually declined about the time the NSES document was released. Information on the importance states gave to student performance on science assessments in accountability systems was unavailable in the literature reviewed. Most accountability systems held schools accountable for student performance and directed consequences to low-performing schools. Of the one-half of the states that had a moderate to high level of stakes attached to student performance on assessments, almost all also distributed the consequences among students, teachers, schools, and districts, a desirable trait in an accountability system. It is likely that assessment and accountability in science will continue to be given less emphasis under the new federal legislation “No Child Left Behind,” which does not require states to assess in science until the 2007-2008 school year.

Determining the influence of the NSES and AAAS Benchmarks on assessments and accountability systems is confounded by a number of other initiatives and developments that coincided with the publication of these documents. The assessment practices and targets for assessments portrayed in the NSES and Benchmarks are compatible with current understandings about how students learn and how this learning can be measured.
Assessment practices, such as using multiple measures or having students write about their understandings, are consistent with both teaching for understanding and teaching for inquiry as described in the NSES. Even though a clear link could not be made between assessment practices used by states and districts and the NSES and the Benchmarks, the research does provide convincing evidence that assessment practices do influence both teachers’ practices and subsequent student learning. An increase in formative assessment produces learning gains. This is significant because the emphasis in the NSES and the Benchmarks on teaching for understanding requires assessments that are integral to instruction and continuous, as implied by formative assessment. In states that have given high importance to assessment scores, teachers do change their practices somewhat, but not completely, to include more test-like activities in their teaching. However, not all state assessments are fully aligned with state standards, indicating that students of teachers who just “teach for the test” will likely fall short of achieving the full expectations expressed in the standards.

The research review did not directly establish that the NSES and AAAS Benchmarks have influenced accountability and assessment systems. If this link could be established, then there is evidence that assessment and accountability systems do influence teachers’ classroom practices and student learning.

Our review of the literature and the type of research used in this area did reveal some inadequacies in the available research. What is missing and is needed is a comprehensive study of policies of all 50 states that would reveal the linkages between science standards, science assessment, and science accountability. This comprehensive study should include systematic analyses of the alignment between state standards and the NSES and Benchmarks.
Such a comprehensive study would provide the missing link by establishing the influence of the national science standards documents on the state standards. Research also is needed to describe and analyze the full science assessment system being used in states, districts, schools, and classrooms. Such an analysis would describe the full range of content being assessed; to what depth the content is assessed; at what level within the system the content is assessed; and how the information is applied to further learning. Such a detailed analysis would attend to the different attributes of assessments, including what questions are asked, what responses are elicited, how student responses are scored, how the scores are interpreted, and what is reported. We also did not find any studies related to college placement examinations, another area for further research.

Accountability systems have not stabilized and are still undergoing significant change. These systems also are extremely complex. It is not surprising that definitive research has not been done on how accountability and assessment systems fully work and how these systems are influenced by documents such as the NSES and AAAS Benchmarks. What is clear is the increasing importance these policy components have in education. It is no longer sufficient for science educators who are most interested in the curriculum and the content to ignore the policy arena. Research that bridges content standards and policy and illuminates the relationship between them is essential.