
Suggested Citation:"Appendix B: Review of Science Content in Selected Student Achievement Tests." National Research Council. 1988. Improving Indicators of the Quality of Science and Mathematics Education in Grades K-12. Washington, DC: The National Academies Press. doi: 10.17226/988.


Appendix B
Review of Science Content in Selected Student Achievement Tests

Given the many criticisms of achievement tests, the committee wished to have better information on the quality of the science content of tests frequently used to assess student achievement in science. At a time when achievement test scores have frequently been cited as evidence of declining educational quality in schools, a review of the subject content in science tests appeared to be a potentially useful and important step toward the committee's formulation of recommendations on how to improve indicators of the condition of science and mathematics education.

Two objectives of our review distinguish it from other test reviews. First, we were concerned only with the science content of tests, not the statistical reliability or discriminating power of the items or the test. Second, the review was not designed to produce an evaluation of any particular test or type of test; instead, it was designed to provide information on the quality of the science content found in a variety of achievement tests. Rather than reviewing just one test, reviewers assessed and compared several tests to develop a general picture of the state of science content in achievement tests.

Two primary criteria were used in selecting tests for review: (1) tests of national importance due to the way their results are being used or because they serve as models for other tests and (2) tests that illustrate major variations in purpose and approach so as to provide for a broad assessment of the science content being tested and allow for examination of any difference in content by test purpose. To keep the size of the project within manageable bounds and to provide some test comparability, the age/grade level was limited. Nine tests were selected for review, including nationally used norm-referenced tests, state curriculum-based tests, and national and international assessment tests. Table B-1 provides a list of the tests.

A multidisciplinary panel of 12 scientists and science teachers was selected to conduct the test review (the panel list appears at the end of Appendix B; two of the individuals listed did not review tests individually but wrote general comments). The panel was constituted so as to combine the perspectives of people from different science fields and different professional positions: college professors, research scientists, and secondary school science teachers. Two specialists in cognitive learning processes who have studied science education and testing were also included in the group.

The test review process was planned to have two stages: individual ratings of tests and a subsequent meeting for group discussion. In the first stage, each test was reviewed by one physical scientist, one life scientist, one cognitive scientist, two teachers, and a sixth reviewer from one of these categories to allow for comparisons of test reviews by type of reviewer. The reviewers rated each individual test item and then analyzed each test as a whole. The test items were rated according to two criteria: (1) importance, the reviewer's assessment of how important the knowledge being tapped is for a student, and (2) adequacy, how adequately the item tests that knowledge, given the purpose of the test. Several patterns emerged from the ratings:

• The scientists in the group were more critical of the science content of the tests than were most of the teachers. One explanation for this difference might be that scientists expect greater quality of science content in the tests than do teachers. Another possible explanation is that the teachers are more familiar with these tests, as well as with other achievement tests, and do not see as many problems in the actual use of the items.

• The science teachers were more critical of the norm-referenced tests than of the other types of tests. The teacher reviewers seemed to find more problems with this type of science test than with the tests used for national assessments or curriculum-based tests.

• There appeared to be a relationship between the science field and the item ratings of a reviewer. The two biologists rated the New York State Regents biology test lower than did any other reviewers.

TABLE B-1 Science Tests Selected for Review and Average Student Scores on Each Test

| Test | Average Percentage Correct | Number of Items | Comments |
| --- | --- | --- | --- |
| High School and Beyond (HSB) | 46.5 | 20 | Science portion of test; score is for national sample of 1980 10th-grade students |
| National Assessment of Educational Progress: 13-year-olds (NAEP-13) | 52.4 | 77 | Scores are for 1981 test given to a national sample; no scores were available for 1985-1986 test that was reviewed |
| National Assessment of Educational Progress: 17-year-olds (NAEP-17) | 60.0 | 56 | Scores are for 1981 test given to a national sample; no scores were available for 1985-1986 test that was reviewed |
| California Assessment Program (CAP) | 53.8 | | 1984-1985 field test of 1,650 items given to over 10,000 California 8th-grade students; average score over six different categories of questions |
| Comprehensive Tests of Basic Skills (CTBS) | 52.5 | 40 | 1982 norm for end of 9th-grade score at 50th percentile of all students taking the test |
| Tests of Achievement and Proficiency (TAP) | 53.3 | 60 | 1982 norm for spring 9th-grade score at 50th percentile of all students taking the test; no norm was available for 1985-1986 test that was reviewed |
| International Association for the Evaluation of Education Achievement (IEA) | 64.7 (M), 58.3 (F) | 90 | 1983 test; score is for U.S. sample of 9th-grade students |
| New York State Regents: Earth Science (NYSR-ES) | 77.1 | 105 | 65 percent correct is minimum passing score; 79.8 percent of 37,175 students passed in June 1984 |
| New York State Regents: Biology (NYSR-BIO) | 74.7 | 103 | 65 percent correct is minimum passing score; 72.8 percent of 114,068 students passed in June 1984 |

Possibly the biologists could find more problems with the items due to greater knowledge of and familiarity with the current state of the field.

The second stage of the review process consisted of a group discussion of the nine tests among the group of reviewers. For this purpose, a two-day meeting was held at the National Academy of Sciences. The meeting had three components: discussion of the item ratings and qualitative test analyses by the reviewers, identification of common findings concerning the science content in the nine tests, and outlining of the characteristics of good science tests. The major outcome of the meeting was the development of some qualitative conclusions on the current state of science testing and suggested improvements that should be pursued.

Differences in average ratings between the tests were relatively small compared with the variability between the reviewers. However, the science test reviewers reached four general conclusions:

• The nine science achievement tests typically cover broad content areas, and the content is generally appropriate for the intended grade level; however, a majority of the tests are weak in testing core science concepts and depth of student knowledge.

• Five to ten percent of the items on each test include inaccurate or misleading science statements that decrease the usefulness of the test results.

• The tests vary widely in the quality and balance of items intended to test different types of skills, that is, factual knowledge, concepts, science processes, reasoning, and problem solving.

• The format, language, and structure of science tests strongly affect the usefulness of test results for educational and assessment purposes.

Based on its discussions, the group identified characteristics of high-quality science tests according to testing purpose.

For national, state, or local assessment:

• Assessment items should be based on a sampling of the ideal or desired curriculum in the subject area.

• Items should focus on central concepts for the course or grade level.

• Given the identification of the core subject matter to be covered, the test should be designed from a matrix of desired learning objectives, consisting of elements of the subject knowledge base classified by the types of desired skills.

• A few items should offer new ways of thinking about a concept or solving a problem and provide topics for teachers to use in subsequent instruction.

• Test results should be reported to local test users, for example, administrators, teachers, parents, and students, in relation to the matrix of objectives so as to increase the educational use of assessment results.

For rank-ordering of student performance:

• The test should be designed to assess knowledge that is closely related to the reason for the ranking.

• There should be less stratification of students by test performance, because often it is based on misuses of small differences in test results.

For diagnosis and guiding instruction:

• Diagnostic tests should be written with a real-world orientation, that is, without subject-specific jargon and terminology, and they should include samples of different kinds of science experience the student may have had and science ideas the student may understand.

• Time allowed for conducting a diagnostic test is an important design variable, because some students do not perform well under time constraints.

• Test results should be reviewed item by item rather than as an overall test score. Since a test can sample only a limited portion of the total knowledge of a student, performance on individual items rather than on the test as a whole should be used to assess student knowledge for purposes of diagnosis.

• As they employ diagnostic tests, teachers should prepare administrators, parents, and students to understand the meaning of test results and carefully explain how they will be used.

• The use of achievement tests for diagnosis and improving instruction could be advanced if testing were less dependent on methods involving only paper and pencil. Alternative technologies for diagnostic testing in science need to be further developed.

• The results of research in cognitive science and other educational research should be used in test development.
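The matrix of learning objectives recommended above for assessment tests, and the advice to report results against that matrix rather than as a single score, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the item records, the content areas, and the function name `report_by_objective` are hypothetical and do not come from any of the tests reviewed. Only the skill-type labels echo the categories named by the reviewers.

```python
from collections import defaultdict

# Hypothetical item records: (item_id, content_area, skill_type, answered_correctly).
# Content areas are invented for illustration; skill types follow the
# categories named in the text (factual knowledge, concepts, reasoning, ...).
ITEMS = [
    ("q1", "energy", "factual knowledge", True),
    ("q2", "energy", "concepts", False),
    ("q3", "cells", "concepts", True),
    ("q4", "cells", "reasoning", True),
    ("q5", "energy", "concepts", True),
]


def report_by_objective(items):
    """Return {(content_area, skill_type): (num_correct, num_items)}.

    Each key is one cell of the objectives matrix: a piece of the subject
    knowledge base crossed with a type of desired skill.
    """
    counts = defaultdict(lambda: [0, 0])
    for _item_id, content, skill, correct in items:
        cell = counts[(content, skill)]
        cell[0] += int(correct)  # items answered correctly in this cell
        cell[1] += 1             # items belonging to this cell
    return {key: tuple(value) for key, value in counts.items()}


if __name__ == "__main__":
    # Print one line per matrix cell instead of one overall score.
    for (content, skill), (right, total) in sorted(report_by_objective(ITEMS).items()):
        print(f"{content:8s} {skill:18s} {right}/{total}")
```

A report of this shape keeps results tied to individual objectives, which is the use the group had in mind for both assessment reporting and item-by-item diagnostic review.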

The group also made the following suggestions to avoid the misuse of tests:

• Results from tests constructed for one purpose, for example, rank-ordering of student performance, should not be used for a quite different purpose, for example, assessing instructional quality.

• School or classroom average test scores should not be applied to individuals, and individual test scores should not be interpreted as a rating or ranking of the persons, but only of performance on a test that assesses specific skills.

• Test results or tests of the kind reviewed should not be used as the major force driving curriculum and instruction.

SCIENCE TEST REVIEW PANEL

Marshall S. Smith (Chair), Stanford University (education measurement, and evaluation)
Andrea diSessa, University of California, Berkeley (cognitive science)
Rachel Egan, Orchard Ridge Middle School, Madison, Wisconsin (science teacher: eighth grade)
Joyce Gelthorn Greene, Boulder High School, Boulder, Colorado (science teacher: biology)
Henry Heikkinen, University of Maryland (chemistry)
Jack Lochhead, University of Massachusetts (cognitive science)
Lucy McCorkle, Cardozo High School, Washington, D.C. (science teacher: chemistry)
Jose Mestre, University of Massachusetts (physics/cognitive science)
James Minstrell, Mercer Island High School, Mercer Island, Washington (science teacher: physics)
Philip Morrison, Massachusetts Institute of Technology (physics)
Phylis Morrison, Cambridge, Massachusetts (elementary science teacher)
Wayne Moyer, Franklin Institute Science Museum and Planetarium, Philadelphia, Pennsylvania (biology)
David Policansky, Commission on Life Sciences, National Research Council (biology)
