Under the No Child Left Behind Act of 2001 (NCLB), states must develop challenging standards in science and assess students’ achievement of those standards. The assessment requirement for science takes effect in the 2007–2008 school year, so states have an opportunity to carefully develop their response to the law’s requirements.
The National Science Foundation, recognizing the importance of this opportunity, asked the National Research Council (NRC) to form a committee to help states prepare for the implementation of the law. The Committee on Test Design for K–12 Science Achievement was charged with two tasks: (1) providing advice and guidance and making recommendations that will be useful to states in designing, developing, and implementing quality science assessments to meet the 2007–2008 implementation requirements of the No Child Left Behind Act; and (2) fostering communication and collaboration between the NRC committee and key stakeholders in states and schools in order that the guidance provided by the NRC committee’s report is responsive and can be practically implemented.
In conducting its study, the committee followed the fundamental position of the National Science Education Standards: science literacy should be the goal for K–12 science education. An essential element of science literacy is a strong foundation in the content knowledge of the life, physical, earth, and space sciences. It is also critically important for students to understand science as a specific way of knowing and to develop the skills necessary to both understand and appropriately apply the strategies of scientific inquiry. The states and the designers of assessments need to incorporate these fundamental aspects of science literacy in designing science assessments for NCLB.
This report is intended as a guide for states in making decisions about assessment to meet the NCLB requirements and in planning more broadly for assessment as a tool for supporting student learning. The committee recognizes that each state has its own goals for science education and assessment. This report, therefore, provides guidance that is specific enough to address the important issues raised by NCLB science requirements, but general enough to be adaptable to a wide range of contexts. The committee’s advice to states is offered in the form of questions that all those responsible for designing and implementing state assessment programs should ask themselves as they develop science assessments. These questions are intended to focus state decision makers on important issues that need to be addressed as assessments are developed, implemented, and used. The questions appear throughout the report and are included in their entirety in Chapter 9. They are not included in the executive summary, which instead summarizes the findings that underlie the questions.
Although the science assessments that are developed to meet NCLB requirements will constitute but a small fraction of the science assessment that is conducted in schools across the nation, they are likely to exert a powerful influence on science curricula and instruction. It is therefore very important that the effects of states’ NCLB science assessments be thoroughly explored before they are introduced and become mandatory.
High-quality science standards are central to science education and assessment. They are the way that states articulate their goals for student learning and focus the attention of teachers, students, parents, and all others concerned with education on what students should know and be able to do. Content standards serve as the basis for developing curricula, selecting textbooks, setting instructional priorities, and developing assessments. Achievement standards make clear what information will be accepted as evidence that students have achieved the standards and how competence is defined.
Content standards should be clear, detailed, and complete; reasonable in scope; rigorous and scientifically correct; and built around a conceptual framework that reflects sound models of student learning. They should also describe examples of performance expectations for students in clear and specific terms so that all concerned will know what is expected of them. The committee found that although some state standards reflect many of these criteria, no current state standards meet all of them.
States should review and revise their standards documents regularly, at least every 10 years. Revisions to content standards documents should be mirrored by changes in curriculum, curricular materials, assessments, and instruction. In turn, ongoing teacher professional development will be required to ensure that the changes in the standards are reflected in classrooms and schools.
State standards should be organized and elaborated in ways that clearly specify what students need to know and be able to do and how their knowledge and skills will develop over time with instruction. Learning progressions and learning performances are two strategies that states can use in organizing and elaborating their standards to guide curriculum, instruction, and assessment. Learning progressions are descriptions of successively more sophisticated ways of thinking about an idea that follow one another as students learn: they lay out in words and examples what it means to move toward more expert understanding. Learning progressions should be developed around the organizing principles of science, such as evolution and kinetic molecular theory. Such organizing principles, which are sometimes referred to as the big ideas of science, are the coherent foundation for the concepts, theories, principles, and explanatory schemes for phenomena in a discipline. Organizing standards around big ideas represents a fundamental shift from the more traditional organizational structure that many states use, in which standards are grouped under discrete topic headings. A potentially positive outcome of reorganizing state standards from discrete topics to big ideas is a shift from breadth of coverage to depth of coverage around a relatively small set of foundational principles and concepts. Those principles and concepts should be the target of instruction so that they can be progressively refined, elaborated, and extended over time.
Creating learning performances is a strategy for elaborating on content standards by specifying what students should be able to do if they have achieved a standard. Learning performances might indicate that students should be able to describe phenomena, use models to explain patterns in data, construct scientific explanations, or test hypotheses. A clear understanding of how students can demonstrate that they have attained a standard allows assessment developers to create items and tasks that are directed at these skills and provides teachers with targets for instruction. This approach helps build coherence between what is taught and what is tested.
Assessment, which includes everything from classroom observations to national tests such as the National Assessment of Educational Progress, is a systematic process for gathering information about student achievement. It provides critical information for many parts of the education system, including guiding instructional decisions, holding schools accountable for meeting learning goals, and monitoring program effectiveness. Assessment is also a way that teachers, school administrators, and state and national education policy and decision makers exemplify their goals for student learning.
Although assessment can serve all of these purposes, no one assessment can do so. To support valid inferences, every assessment has to be designed specifically to serve its purpose. An assessment that is designed to provide information about students’ difficulties with a single concept so that it can be addressed with instruction would be designed differently from an assessment that is to provide information to policy makers for evaluating the effectiveness of the overall education system. The former requires that students’ understanding of a concept be tested deeply and thoroughly; the latter requires that the assessment cover broadly all of the topics deemed important by education policy makers. Results from one of these assessments would not be valid for the purposes of the other.
Assessment, by itself, cannot improve student learning; it is the appropriate use of assessment results that can accomplish that goal. Thus, the committee concluded that states should think about assessment in the context of the education system in which it functions. Assessment is one of a number of elements (including curriculum, instruction, professional development, and fiscal and other resources) that interact in the classroom, school, school district, and state and that together support student learning. To serve its function well, assessment must be tightly linked to curriculum and instruction so that all three elements are directed toward the same goals. Assessment should measure what students are being taught, and what is taught should reflect the goals for student learning articulated in the standards. Thus, all of the elements in the education system have to be built on a shared vision of what is important for students to know and understand about science, how instruction affects that knowledge and understanding over time, and what can be taken as evidence that learning has occurred.
A SYSTEM OF ASSESSMENT
The committee concluded that a single assessment strategy would not, by itself, meet the requirements of NCLB. The committee therefore recommends that states develop a system of science assessment that can meet the various purposes of NCLB and provide education decision makers with assessment-based information that is appropriate for each specific purpose for which it will be used. The system should be composed of a variety of assessment strategies that are designed in fundamentally different ways and that collectively meet NCLB requirements. In particular, the law states that assessment must:
be fully aligned with state standards;
meet accepted professional standards for validity, reliability, and fairness for each purpose for which it will be used;
be reported to parents, teachers, and administrators in ways that are diagnostic, interpretive, and descriptive so that the results can be used to address individual students’ academic needs; and
be reported in ways that provide evidence that all students in the state, regardless of race, ethnicity, economic status, or proficiency in English, are meeting the state’s challenging academic standards.
The system that each state develops in response to NCLB will vary according to the state’s goals and priorities for science education and its uses for assessment information. For example, a state might choose to develop a single hybrid test in which students take a core assessment that provides individual results along with an assessment with a matrix-sampling design that provides information about the achievement of groups of students across a broad content domain. Or a state might choose to combine standardized classroom assessments that provide diagnostic, descriptive, and interpretive information with an external assessment of progress that all students are making toward achieving state standards. Or a state may decide to eschew a statewide test and opt instead for one of many different models in which results from local, district, or state assessments are combined, aggregated, and reported for specific purposes.
Similarly, a single assessment strategy cannot provide all of the information that education decision makers in classrooms, schools, school districts, and states need to support student learning. Teachers need ongoing information on how well their students are learning so they can target instruction; students need timely feedback on how they are meeting expectations so they can adjust their learning strategies; districts need information on the effectiveness of their programs; and policy makers need to know how well their policies are working and where resources might best be targeted. Addressing all of these needs for assessment-based information requires multiple assessment strategies, each designed to serve its own specific purpose. These multiple assessment strategies should be designed from the beginning to function as part of a coherent system of assessment.
A successful system of standards-based science assessment is coherent in a variety of ways. It is horizontally coherent: curriculum, instruction, and assessment are all aligned with the standards; target the same goals for learning; and work together to support students’ developing science literacy. It is vertically coherent: all levels of the education system—classroom, school, school district, and state—are based on a shared vision of the goals for science education, of the purposes and uses of assessment, and of what constitutes competent performance. The system is also developmentally coherent: it takes into account how students’ science understanding develops over time and the scientific content knowledge, abilities, and understanding that are needed for learning to progress at each stage of the process.
DEVELOPING AND SUPPORTING A COHERENT ASSESSMENT SYSTEM
Coherent assessment systems do not develop by accident; they must be deliberately designed so that all of the measures work together both conceptually and operationally. To ensure coherence, states should develop a master plan for their assessment system, in which they clearly specify its purposes and the individual assessments that are needed to serve those purposes. The plan should document the constructs each assessment will measure; the ways in which the results of each assessment are to be used; who will be tested; where each component will be administered, and by whom; who is responsible for developing the component; when the assessment will be administered; and how the results will be scored, combined, and reported for specific purposes.
States should establish a system of interacting advisory groups that are in place before system design begins or as early in the process as possible. One of the advisory groups should advise the state about the technical measurement issues associated with a testing program; other groups should focus on the content areas that are part of the assessment program. Science content committees should include scientists, science educators, researchers who study science assessment, and individuals with expertise on how people learn science. There should be some overlapping members of the content and technical groups or structured interactions between them.
Reporting Assessment Results
The reports of assessments are a critical element of a coherent system. How and to whom results will be reported are questions that should be considered during the first stages of designing an assessment system because the answers will guide almost all subsequent decisions about assessment design.
Information about students’ progress is needed at all levels of the education system, and reporting practices must meet the needs of parents, teachers, school and district administrators, policy makers, the public, and, of course, the students themselves. However, not all of these groups need the same information, and reports should be tailored to meet the needs of different users.
For assessment to function well, each of those who play a part in the interpretation and use of assessment results needs to have an understanding of assessment, the state’s goals for assessment, the ways different assessments function, and how to interpret and use assessment results appropriately. Those who need the opportunity to develop their understanding of how assessment works range from students to elected officials to curriculum developers, but teachers are the group with the greatest need for understanding assessment.
Teachers play a pivotal role in the education system. The decisions that they make, the ways in which they interact with students, and their appropriate use of assessment affect the knowledge and attitudes that students acquire. Teachers cannot cultivate a deep conceptual understanding among their students unless they themselves have such understanding. A strong grounding in science subject matter knowledge as well as subject-specific pedagogical knowledge is fundamental to good teaching and assessment.
Teachers need to be able to use a variety of classroom assessment strategies and tools such as observation, student conferences, portfolios, performance tasks, rubrics, and student self-assessment. They must also understand the uses and limitations of external assessment and be cognizant of the ways in which such assessment affects their teaching.
Professional development strategies that involve the evaluation of student work are one important means for increasing teachers’ understanding of assessment and for helping them to deepen their own understanding of science. In-service professional development opportunities, which schools and districts use for many different purposes, are not sufficient to provide teachers with the skills they need in order to use and understand assessment effectively. The committee therefore calls on colleges and universities that prepare teachers to include in their curricula courses on educational measurement that are both general and specific to science. Such courses should include information on the uses and limitations of state tests and on new and emerging assessment methods. In-service professional development could then build on this knowledge by including opportunities for teachers to refine or learn about and practice new assessment strategies.
Because the course requirements for teacher preparation programs are largely set by state licensure requirements, the committee calls on states to include in their standards for certification and recertification a provision that teachers demonstrate assessment competence as a condition for teacher licensure.
Opportunity to Learn
Excellence in science education embodies the idea that all students can achieve science literacy if they are given the opportunity to learn. Students will achieve understanding of science concepts in different ways, at different depths, and at different rates of progress, but opportunity to learn implies that all students have the chance to learn to the maximum extent possible. NCLB reflects this goal and mandates the interpretation of test-based information in ways that may highlight discrepancies in opportunity to learn among different groups of students, schools, and school districts within a state. Therefore, schools and school districts need to implement curricula and instructional approaches for all students that are aligned with both content and performance standards. States need to actively monitor and evaluate the effectiveness of schools’ and school districts’ efforts to provide all students with a sufficient opportunity to learn science. School-level data on the opportunity to learn will be critical in helping states to ensure that science education is accessible to all students.
The fairness of assessments and the validity of results depend on both the extent to which students have had the opportunity to learn the skills and material that are assessed and the use of assessments that are unbiased and accessible to a wide range of students with different abilities and disabilities. If students do not have the opportunity to learn the material or to demonstrate their knowledge in the context of appropriately designed assessments, it is impossible to know whether the results shed light on aspects of the curriculum, instructional strategies, or students’ efforts or abilities, or whether they simply indicate that students have not been given a chance to learn what is being assessed or that the assessments are somehow not tapping into what they know in appropriate ways.
NCLB requires that all students, including students with disabilities and English language learners, participate in state accountability programs, and states are required to provide appropriate accommodations to these students. However, the effects of accommodations on test performance and on the inferences that can be made from test results are not well understood. As states make decisions about how to assess students’ science literacy, they will need to consider the needs of English language learners and students with special needs and the challenges of devising technically sound accommodations for them. They will also need to consider the extent to which students with disabilities and English language learners have had an opportunity to learn the material covered by an assessment. These issues are particularly salient for states that make use of innovative assessment methods, for which there is little research about the effects of accommodations.
The allocation of time and money is an element in virtually every decision that education officials make. The assessment of science learning has resource implications for states and schools that could far outstrip the actual costs of the assessments themselves. New assessments may reveal inadequacies in the existing science education program in a state, as well as inequities in science education across schools and school districts. Such findings may trigger legal requirements to address inequities. As a state raises the stakes, the demand for high-quality science education may also increase. Financial incentives may be needed to encourage qualified science teachers to enter teaching or to remain in schools that serve disadvantaged students. Assessments also can reveal exemplary practices that contribute significantly to increased student learning: resources should be set aside to disseminate and implement these practices.
Monitoring and Evaluation
For an assessment system to achieve its goals, those responsible for it need to continuously monitor and periodically evaluate its effectiveness. NCLB holds states, districts, and schools accountable for student performance; it is equally important that they be held accountable for the quality, utility, and consequences of their assessment systems. States and districts should have a detailed plan for evaluating how well the assessment system is working, whether it is accomplishing its goals, and whether there are unanticipated effects. At the same time, states and districts need plans for continually refining their policies and procedures in response to evidence.
While the focus of this report is to provide advice and guidance to states, the committee recognizes that states cannot do all that is required on their own. Below we describe some important ways that scientists, science educators, professional societies, granting organizations, the federal government, and education policy organizations can assist states in their efforts to design, implement, and evaluate science assessment systems. What follows summarizes the recommendations to these individuals and groups; the complete set of recommendations is contained in Chapter 9 of the report.
In its recommendations to others, the committee calls on federal granting agencies and others to support with funding and expertise the design and validation of prototype science assessment systems on which states could base their own efforts. These prototypes should include systems in which information that is used for accountability purposes is gathered in classrooms, as well as at the district and state levels. We also call on funding agencies to support research programs that can help states to develop and refine procedures for determining the alignment, reliability, accuracy, fairness, and validity of assessment systems that are composed of multiple measures and for setting achievement levels when multiple assessment strategies are used. Because the assessment of inquiry will be a key component in most states’ science assessments, the committee recommends that expertise and funding also be provided to help states address issues related to the development, validation, and implementation of appropriate assessments of students’ understanding and application of inquiry skills.
Standards are the heart of a science assessment system, and we call on the U.S. Department of Education to take an active role in ensuring that every state has high-quality standards. We recommend that it require every state to have an independent body evaluate the quality of its science content standards and its procedures for developing and setting achievement levels. We recommend that the results of these evaluations be made public and that they be included in any review process that the Secretary of Education uses for evaluating and certifying compliance with key NCLB provisions.
The research base on which high-quality assessment systems should rest is incomplete. We call on the research community to propose and conduct studies on the ways in which students’ understanding of the big ideas of science develops over time and the ways in which students represent their understanding of these ideas as they develop competence. Results of this research should be used to help states develop state science standards and create valid assessments of students’ understanding of key scientific concepts as such understanding develops and changes over time.