The No Child Left Behind Act of 2002 (NCLB, Public Law 107-110) extends the accountability provisions of the 1994 reauthorization of the Elementary and Secondary Education Act (Improving America’s Schools Act) to all public schools and districts in states that receive federal Title I funds.1 NCLB has two primary goals: improving student achievement overall and narrowing the achievement gap between students of different backgrounds. These goals are to be achieved by means of strong accountability measures for schools and districts and the imposition of sanctions on those that cannot demonstrate that their students are making adequate yearly progress in meeting challenging standards of academic achievement.
NCLB moves beyond the 1994 law both because it affects all public schools and districts and because it includes science in its requirements for standards and assessments. By including science in the requirements, Congress has signaled to the American public that science literacy is a national priority and schools should ensure that all students leave public education with the scientific knowledge, skills, and understandings that are necessary to be scientifically literate citizens.
NCLB REQUIREMENTS FOR SCIENCE
NCLB requires that all states have challenging academic content and achievement standards for science in place by 2005–2006. States must begin measuring student attainment of those standards in 2007–2008 with assessments that are fully aligned with the standards and that meet accepted professional standards for technical quality for each purpose for which they will be used. The law further specifies that states’ assessment systems must include multiple up-to-date measures of student achievement, including measures that assess higher order thinking skills and understanding of challenging content. Science assessments are to be administered to all students, including those with disabilities and those who are not fluent in English, at least once in each of three grade bands, 3–5, 6–9, and 10–12. At present, science assessment results need not be included in the calculation of adequate yearly progress that is used to monitor states’ progress toward NCLB goals. States are required to make reasonable accommodations for students with disabilities and limited English proficiency to allow them to participate in the assessments, and they must have in place alternate assessments for students who cannot participate in the regular assessment even with accommodations.
In recognition of the decentralized nature of public education governance in the United States, as well as of the differences in states’ circumstances and priorities, the legislation allows some flexibility in meeting the law’s requirements. States may choose to include in their assessment systems criterion-referenced assessments, augmented norm-referenced assessments, or both (assessments that support only norm-referenced interpretations are not acceptable).2 Assessment systems, which can take many forms under NCLB, may consist of a uniform set of assessments statewide or a combination of state and local assessments. However, regardless of the form that the assessment system takes, the results must be reported publicly and be expressed in terms of the state’s academic achievement standards. The results must be reported in the aggregate for the full group of test takers, be disaggregated for specified population groups, and provide information that is descriptive, interpretive,3 and diagnostic at the individual level. Box 1-1 includes excerpts from the assessment provisions of NCLB as they relate to science; they are referenced throughout this report.
Although NCLB requires states and districts that receive Title I funds to participate in the biennial state-level assessments in reading and mathematics conducted under the National Assessment of Educational Progress (NAEP), no such requirement for science is in place as this report goes to press. Thus, participation in state-level NAEP in science remains voluntary. Nonetheless, the committee sees the potential for the NAEP science assessment framework, which is currently under revision, to exert an indirect but important influence on the content of state science assessments and curricula.
2 Criterion-referenced tests are those that report student performance in terms of a defined body of skills and knowledge, while norm-referenced tests are those that report performance in terms of comparisons with the performance of groups of similar students. Both are discussed further in Chapter 5.
3 Interpretive results provide guidance on what the results mean.
The Committee’s Charge
Recognizing the challenges that states face in meeting NCLB requirements for the design and development of science assessments,4 the National Science Foundation (NSF) asked the National Research Council (NRC) to form a committee to contribute in the following ways to the national effort:
provide guidance and make recommendations that will be useful to states in designing, developing, and implementing quality science assessments to meet the 2007–2008 implementation requirement of the No Child Left Behind Act; and
foster communication and collaboration between the NRC committee and key stakeholders in the states and in schools so that the guidance provided by the committee’s report is responsive and can be practically implemented in states and schools.
The Committee on Test Design for K–12 Science Achievement was established, and this report is the result of our research, collaborations, and deliberations in response to this charge. Because states and localities across the nation vary widely in their goals and approaches to assessment and to science education, the advice in this report is targeted to policy makers and practitioners at a level that is specific enough to address the important issues raised by NCLB science requirements, yet adaptable to a wide range of contexts.
In their initial discussions with the committee, the sponsors urged members not only to address the specific requirements of NCLB, but also to consider the design and development of high-quality science assessment more broadly. The committee was asked to consider the work of two earlier NRC committees, the Committee on the Cognitive Foundations of Assessment (National Research Council, 2001b) and the Committee on Assessment in Support of Instruction and Learning (National Research Council, 2003). Both of these committees called for the creation of balanced assessment systems that are supported by the larger education system and are based on what is known about how people learn and gain expertise in a specific domain of knowledge. These ideas provided the foundation for the committee’s thinking.
BOX 1-1
Subpart 1—Basic Program Requirements
SEC. 1111. STATE PLANS
(b) ACADEMIC STANDARDS, ACADEMIC ASSESSMENTS, AND ACCOUNTABILITY
(3) ACADEMIC ASSESSMENTS
(c) ACADEMIC STANDARDS, ACADEMIC ASSESSMENTS, AND ACCOUNTABILITY
(5) STATE AUTHORITY
SOURCE: P.L. 107-110, No Child Left Behind Act of 2002, Title I Part A, Subpart 1, Basic Program Requirements, Section 1111, State Plans.
In this report, the term assessment is used to mean a process for collecting information that can be used for a variety of purposes—for example, to exemplify the state’s learning goals, to categorize the achievement of individual students, to provide the basis for instructional decisions or decisions about resources, or to monitor and evaluate the success of instructional programs. High-quality assessment is critical to science education because it is both the way in which states exemplify the goals for science education embodied in the standards and a major source of the information that states use in making important decisions about education.
Based on our review of relevant research and extensive practical experience with the design of assessment programs, the committee decided to take a systems approach in thinking about the nature and role of science assessment in education. This approach explicitly recognizes that the elements that make up the education system are independent but also interrelated and interacting, so that changes in one element necessarily create changes in others. Indeed, this is the premise on which NCLB is based—set high standards, implement assessments aligned to those standards, hold schools and districts accountable for the assessment results, and use the improvement of assessment results as a lever to foster changes in curriculum and instruction in ways that will lead to better student outcomes.
Many of the points made in this report may apply equally well to assessment in other areas. The measurement principles that have guided the committee’s thinking about science assessment could guide assessment in other domains as well. However, there are aspects of science as a discipline—the abstract nature of many of the concepts that students are expected to learn and the emphasis on scientific inquiry and investigation in many state standards, for example—that present specific challenges for assessment. Thus, to design high-quality science assessment, states will need to focus on both the general precepts of sound educational measurement and the features that are unique to science assessment. The report as a whole presents goals for states to consider in developing science assessments that meet high technical standards and are tailored to the demands of science as a discipline, but much of the discussion has a wider application.
Gathering the Evidence
The committee used many sources of information to prepare this report. We looked for evidence in the scientific and professional literature on science assessment and on science education and in policy reports on the implementation and effects of NCLB. We reviewed the body of work that has been done on science assessment by scientific disciplinary societies, such as the American Association for the Advancement of Science, the National Science Teachers Association, the American Chemical Society, the American Physical Society, and others. We examined state science standards and considered others’ evaluations of both the science standards and the science assessments that are currently being used in states, even though it was clear, in the context of the looming NCLB deadlines, that these standards and assessments would be changing quite rapidly. We also relied on the work of earlier NRC committees that synthesized research on how people learn, what is known about the cognitive foundations of assessment, and the uses and potential of technologies for assessment. We considered at length many reports and analyses of science curricula, textbooks, and instructional approaches that have been conducted by Project 2061 (American Association for the Advancement of Science) and NSF-supported curriculum and instructional projects.
To ensure that our advice would be practical and responsive to states’ concerns, we also relied on the experience of experts who have had responsibility for testing programs in states and districts, as well as others with relevant practical experience. The committee formed and collaborated with three working groups, consisting of state assessment directors, state-level science supervisors, and science teachers. We relied heavily on their experiences and knowledge in considering the design of science assessment systems. Biographical information about the working group members appears in Appendix C.
In order to base our conclusions on a broad understanding of the possible conceptual models for the design of assessment systems, the committee asked four design teams to develop plans for science assessment systems that would meet the requirements of NCLB, but also move beyond them in ways they thought most likely to improve students’ science learning. Each had a specific focus, selected to be consistent with different approaches that states may use in the design of their science assessments. These models are summarized in Chapter 2 and additional information about the design teams and their work appears in Appendix B.
The committee also asked two additional teams of experts to develop designs for assessments that would reflect current research on the ways in which students learn and represent knowledge in a given domain. These teams were made up of scientists, science educators, cognitive scientists, and teacher educators (see Appendix B). Using research on children’s learning, they developed learning progressions to depict the ways in which students might acquire knowledge over time, as well as ways in which that knowledge might be assessed. The models developed by these teams are summarized in Chapter 5.
The committee held a workshop at which representatives of education and policy organizations discussed the challenges related to science assessment facing legislatures, governors, chief state school officers, school administrators, school boards, teachers, and others. A second workshop provided the committee and design teams with stakeholders’ reactions to the model science assessment system designs described above. Discussions at the workshop helped the committee to conceptualize some of the important issues states would face in implementing any of these proposed designs.
Finally, the committee commissioned several papers to develop greater depth of understanding on particular topics. These papers addressed a range of topics: an analysis of frequently used procedures for gauging the alignment of assessments with standards; advances in the roles of technology in assessment systems; international approaches to science assessment; and the ways in which science assessments can be vertically scaled to better represent students’ achievement over time (see Appendix B). The papers as well as those written by the design teams are available at the committee’s web site at www7.nationalacademies.org/BOTA/Test_Design_K-12_Science.html.
ABOUT THIS REPORT
While research suggests principles to guide the development and operation of assessment systems and provides some guidance to states in choosing among available options, the design of assessment systems is not an exact science and has not been thoroughly researched. Therefore, the committee’s advice to states is in some cases based on our combined judgment and the experiences of our working group members. Although the research base is not complete, the range of ideas from which states can benefit is growing, as more states implement innovative approaches to assessment. Nevertheless, additional research on the design, implementation, validity, and uses of assessment systems that reflect new thinking on the ways in which students develop scientific knowledge, skills, and understanding is needed. We call attention to specific areas of need throughout the report, but note that in many cases research-based methodologies for accomplishing the particular goals discussed in this report have not yet been developed. We acknowledge that we are calling on states to consider some ideas that are not yet supported by a well-developed empirical base, as well as some that have been tried only in relatively confined settings. However, taken together, the existing research literature and the innovative work that has been done in some states provide key means of meeting the challenges of developing NCLB science assessments that are technically sound and support high-quality science education.
In carrying out our charge, the committee did not make recommendations about the science content that should be included in state science standards or represented in assessment, because we view standards as a state responsibility. For this report we turned to the standards that have been developed by the National Research Council (1996) and the American Association for the Advancement of Science (1993) as a good starting place, noting that these too could be improved to make them more useful in the design of curriculum materials and assessments. We lay out a process for developing science content and performance standards and criteria for evaluating their quality, and we recommend that states use these criteria to review their own science standards. However, we did not systematically evaluate existing state standards, because doing so would have required a deep understanding of individual states’ goals and purposes and would have been outside the scope of our charge.
There is no single science test that would be equally useful in all the states and territories affected by the NCLB requirements, and the committee did not try either to develop a model assessment system or to break new ground in assessment development. Instead, we examined how what is currently being done could be improved in the context of an assessment system. The committee chose not to recommend that states include or not include particular item types in their assessments or to provide exemplary items for states to emulate. We recognize that no individual item or test would be universally admired, and we note that while the improvement of items and tests is important, it will require more than that to improve the quality and utility of science assessment more generally.
Finally, the committee thinks that the results of science assessment should be publicly reported in a timely manner to all interested stakeholders, along with all other data about student achievement. However, we did not take a stance on future decisions to include or exclude science assessment results from accountability decisions, including measures of adequate yearly progress, as is required by NCLB for reading and mathematics; this policy decision is beyond the committee’s charge.
The report begins with a discussion of the nature of an assessment system. This is followed by a discussion of the goals for science literacy and the insights provided by research on learning that shed light on how students’ understanding of important science concepts can be assessed. Subsequent chapters address the nature and structure of science standards and the design of assessments that reflect the foundations that have guided the committee. The report continues with an examination of strategies for building, operating, and supporting an assessment system and with discussions of issues related to fairness and adequacy. Chapter 8 covers monitoring and evaluation of the system to ensure that it is functioning as intended. The report closes with a discussion of the ways in which the federal government, agencies that fund research, researchers, and others can assist states in their efforts to improve science assessment.
Three appendixes complete the report. Appendix A is a set of practical tips that came from our discussions with participants in the state assessment process. Appendix B is a list of the background papers and design teams that provided valuable information to aid the committee in its work. Appendix C contains biographical sketches of the committee, staff, and working group members.
Using This Report
A major goal for this study was that the committee’s report be practical and be presented in such a way that individual elements can easily be considered and implemented by states. The report is also intended to be useful to states that are at different stages in designing and modifying their science assessment programs. Some states, for example, may already have developed high-quality science standards that meet the criteria laid out in Chapter 4, but they may find in other chapters suggestions for improvements they have not considered. Moreover, most states are not in a position to rethink completely the ways in which they assess students in science, but are more likely to view the process of change and improvement as ongoing. While the report underscores the importance of considering the assessment system as a whole, states might begin by targeting their areas of greatest need and using some of the ideas contained in the report to do so.
For each of the major topics addressed, the committee presents a set of questions that states can use to review elements of their systems for science education and assessment and consider aspects they may want to change. The committee’s overarching recommendation to states is that they think carefully about the issues raised by these questions and consider the extent to which their assessment system attends to them.