Appendix B
Review of Science Content in Selected
Student Achievement Tests
Given the many criticisms of achievement tests, the committee
wished to have better information on the quality of the science con-
tent of frequently used tests to assess student achievement in science.
At a time when achievement test scores have frequently been cited as
evidence of declining educational quality in schools, a review of the
subject content in science tests appeared to be a potentially useful
and important step toward the committee's formulation of recom-
mendations on how to improve indicators of the condition of science
and mathematics education.
Two objectives of our review distinguish it from other test re-
views. First, we were concerned only with the science content of
tests, not the statistical reliability or discriminating power of the
items or the test. Second, the review was not designed to produce
an evaluation of any particular test or type of test; instead, it was
designed to provide information on the quality of the science content
found in a variety of achievement tests. Rather than reviewing just
one test, reviewers assessed and compared several tests to develop a
general picture of the state of science content in achievement tests.
Two primary criteria were used in selecting tests for review: (1)
tests of national importance due to the way their results are being
used or because they serve as models for other tests and (2) tests that
illustrate major variations in purpose and approach so as to provide
for a broad assessment of the science content being tested and allow
for examination of any difference in content by test purpose. To keep
the size of the project within manageable bounds and to provide
some test comparability, the age/grade level was limited. Nine tests
were selected for review, including nationally used norm-referenced
tests, state curriculum-based tests, and national and international
assessment tests. Table B-1 provides a list of the tests.
A multidisciplinary panel of 12 scientists and science teachers
was selected to conduct the test review (the panel list appears at
the end of Appendix B; two of the individuals listed did not re-
view tests individually but wrote general comments). The panel
was constituted so as to combine the perspectives of people from
different science fields and different professional positions: college
professors, research scientists, and secondary school science teach-
ers. Two specialists in cognitive learning processes who have studied
science education and testing were also included in the group.
The test review process was planned to have two stages, individ-
ual ratings of tests and a subsequent meeting for group discussion.
In the first stage, each test was reviewed by one physical scientist,
one life scientist, one cognitive scientist, two teachers, and a sixth
reviewer from one of these categories to allow for comparisons of
test reviews by type of reviewer. The reviewers rated each individual
test item and then analyzed each test as a whole. The test items were
rated according to two criteria: (1) importance, the reviewer's
assessment of how important the knowledge being tapped is for a
student, and (2) adequacy, how adequately the item tests that
knowledge, given the purpose of the test. Several patterns emerged
from the ratings:
• The scientists in the group were more critical of the science
content of the tests than were most of the teachers. One explanation
for this difference might be that scientists expect greater quality
of science content in the tests than do teachers. Another possible
explanation is that the teachers are more familiar with these tests,
as well as with other achievement tests, and do not see as many
problems in the actual use of the items.
• The science teachers were more critical of the norm-referenced
tests than of the other types of tests. The teacher reviewers seemed
to find more problems with this type of science test than with the
tests used for national assessments or curriculum-based tests.
• There appeared to be a relationship between the science field
and the item ratings of a reviewer. The two biologists rated the
New York State Regents biology test lower than any other reviewers.
TABLE B-1 Science Tests Selected for Review and Average Student Scores on Each Test

Test                            Average     Number    Comments
                                Percentage  of Items
                                Correct

High School and Beyond (HSB)    46.5        20        Science portion of test; score is for
                                                      national sample of 1980 10th-grade
                                                      students

National Assessment of          52.4        77        Scores are for 1981 test given to a
Educational Progress:                                 national sample; no scores were
13-year-olds (NAEP-13)                                available for 1985-1986 test that was
                                                      reviewed

National Assessment of          60.0        56        Same comment as for NAEP-13
Educational Progress:
17-year-olds (NAEP-17)

California Assessment           53.8                  1984-1985 field test of 1,650 items
Program (CAP)                                         given to over 10,000 California
                                                      8th-grade students; average score over
                                                      six different categories of questions

Comprehensive Tests of Basic    52.5        40        1982 norm for end of 9th-grade score
Skills (CTBS)                                         at 50th percentile of all students
                                                      taking the test

Tests of Achievement and        53.3        60        1982 norm for spring 9th-grade score
Proficiency (TAP)                                     at 50th percentile of all students
                                                      taking the test; no norm was available
                                                      for 1985-1986 test that was reviewed

International Association for   (M) 64.7    90        1983 test; score is for U.S. sample of
the Evaluation of Educational   (F) 58.3              9th-grade students
Achievement (IEA)

New York State Regents: Earth   77.1        105       65 percent correct is minimum passing
Science (NYSR-ES)                                     score; 79.8 percent of 37,175 students
                                                      passed in June 1984

New York State Regents:         74.7        103       65 percent correct is minimum passing
Biology (NYSR-BIO)                                    score; 72.8 percent of 114,068
                                                      students passed in June 1984
Possibly the biologists could find more problems with the items due
to greater knowledge and familiarity with the current state of the
field.
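The comparison of ratings by type of reviewer described above could be tabulated along the following lines. This is only a minimal sketch: the appendix does not state the rating scale or publish the individual ratings, so the 1-5 scale, the test labels, and every value below are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings: (test, reviewer_category, importance, adequacy).
# The 1-5 scale and all values are invented; the source reports no raw ratings.
ratings = [
    ("TEST-A", "scientist", 3, 2),
    ("TEST-A", "teacher", 4, 4),
    ("TEST-A", "scientist", 2, 3),
    ("TEST-B", "scientist", 4, 3),
    ("TEST-B", "teacher", 5, 4),
]


def mean_ratings_by_group(rows):
    """Average importance and adequacy for each (test, reviewer category)."""
    grouped = defaultdict(list)
    for test, category, importance, adequacy in rows:
        grouped[(test, category)].append((importance, adequacy))
    return {
        key: (mean(i for i, _ in pairs), mean(a for _, a in pairs))
        for key, pairs in grouped.items()
    }


summary = mean_ratings_by_group(ratings)
for (test, category), (importance, adequacy) in sorted(summary.items()):
    print(f"{test} {category}: importance={importance:.2f} adequacy={adequacy:.2f}")
```

Grouping by both test and reviewer category keeps the two sources of variation separate, which is what a comparison of scientists' and teachers' ratings of the same test requires.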
The second stage of the review process consisted of a group
discussion of the nine tests among the group of reviewers. For this
purpose, a two-day meeting was held at the National Academy of
Sciences. The meeting had three components: discussion of the item
ratings and qualitative test analyses by the reviewers, identification
of common findings concerning the science content in the nine tests,
and outlining of the characteristics of good science tests. The major
outcome of the meeting was the development of some qualitative
conclusions on the current state of science testing and suggested
improvements that should be pursued.
Differences in average ratings between the tests were relatively
small compared with the variability between the reviewers. However,
the science test reviewers reached four general conclusions:
• The nine science achievement tests typically cover broad content
areas, and the content is generally appropriate for the intended
grade level; however, a majority of the tests are weak in testing core
science concepts and depth of student knowledge.
• Five to ten percent of the items on each test include inaccurate
or misleading science statements that decrease the usefulness of the
test results.
• The tests vary widely in the quality and balance of items
intended to test different types of skills, that is, factual knowledge,
concepts, science processes, reasoning, and problem solving.
• The format, language, and structure of science tests strongly
affect the usefulness of test results for educational and assessment
purposes.
Based on its discussions, the group identified characteristics of
high-quality science tests according to testing purpose.
For national, state, or local assessment:
• Assessment items should be based on a sampling of the ideal
or desired curriculum in the subject area.
• Items should focus on central concepts for the course or grade
level.
• Given the identification of the core subject matter to be covered,
the test should be designed from a matrix of desired learning
objectives, consisting of elements of the subject knowledge base
classified by the types of desired skills.
• A few items should offer new ways of thinking about a concept
or solving a problem and provide topics for teachers to use in
subsequent instruction.
• Test results should be reported to local test users, for example,
administrators, teachers, parents, and students, in relation
to the matrix of objectives so as to increase the educational use of
assessment results.
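The matrix of learning objectives recommended above can be pictured as a grid of content areas by skill types, with each test item assigned to one cell. The following is a minimal sketch under stated assumptions: the skill types are those named in the reviewers' conclusions, while the content areas, the items, and the function names are hypothetical and are not taken from any of the reviewed tests.

```python
from collections import Counter

# Skill types named in the reviewers' conclusions.
SKILLS = ["factual knowledge", "concepts", "science processes",
          "reasoning", "problem solving"]
# Hypothetical content areas, for illustration only.
CONTENT_AREAS = ["energy", "cells", "earth systems"]

# Hypothetical items, each tagged with one cell of the matrix.
items = [
    {"id": 1, "content": "energy", "skill": "concepts"},
    {"id": 2, "content": "energy", "skill": "problem solving"},
    {"id": 3, "content": "cells", "skill": "factual knowledge"},
    {"id": 4, "content": "cells", "skill": "concepts"},
]


def coverage(items):
    """Count how many items fall in each (content area, skill) cell."""
    return Counter((it["content"], it["skill"]) for it in items)


def empty_cells(items):
    """List the matrix cells that no item addresses."""
    counts = coverage(items)
    return [(c, s) for c in CONTENT_AREAS for s in SKILLS
            if counts[(c, s)] == 0]


def percent_correct_by_cell(items, correct_ids):
    """Report results per cell rather than as a single overall score."""
    totals, right = Counter(), Counter()
    for it in items:
        cell = (it["content"], it["skill"])
        totals[cell] += 1
        right[cell] += it["id"] in correct_ids
    return {cell: 100 * right[cell] / totals[cell] for cell in totals}
```

Under these assumptions, `empty_cells(items)` shows which objectives the item set leaves untested, and `percent_correct_by_cell(items, {1, 3})` illustrates reporting results in relation to the matrix rather than as one total.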
For rank-ordering of student performance:
• The test should be designed to assess knowledge that is closely
related to the reason for the ranking.
• There should be less stratification of students by test performance,
because often it is based on misuses of small differences in
test results.
For diagnosis and guiding instruction:
• Diagnostic tests should be written with a real-world orientation,
that is, without subject-specific jargon and terminology, and
they should include samples of different kinds of science experience
the student may have had and science ideas the student may understand.
• Time allowed for conducting a diagnostic test is an important
design variable, because some students do not perform well under
time constraints.
• Test results should be reviewed item by item rather than as
an overall test score. Since a test can sample only a limited portion
of the total knowledge of a student, performance on individual items
rather than on the test as a whole should be used to assess student
knowledge for purposes of diagnosis.
• As they employ diagnostic tests, teachers should prepare
administrators, parents, and students to understand the meaning of
test results and carefully explain how they will be used.
• The use of achievement tests for diagnosis and improving
instruction could be advanced if testing were less dependent on
methods involving only paper and pencil. Alternative technologies for
diagnostic testing in science need to be further developed.
• The results of research in cognitive science and other
educational research should be used in test development.
The group also made the following suggestions to avoid the
misuse of tests:
• Results from tests constructed for one purpose, for example,
rank-ordering of student performance, should not be used for a quite
different purpose, for example, assessing instructional quality.
• School or classroom average test scores should not be applied
to individuals, and individual test scores should not be interpreted
as a rating or ranking of the persons, but only of performance on a
test that assesses specific skills.
• Test results or tests of the kind reviewed should not be used
as the major force driving curriculum and instruction.
SCIENCE TEST REVIEW PANEL
Marshall S. Smith (Chair), Stanford University (education
measurement, and evaluation)
Andrea diSessa, University of California, Berkeley (cognitive
science)
Rachel Egan, Orchard Ridge Middle School, Madison, Wisconsin
(science teacher: eighth grade)
Joyce Gelthorn Greene, Boulder High School, Boulder, Colorado
(science teacher: biology)
Henry Heikkinen, University of Maryland (chemistry)
Jack Lochhead, University of Massachusetts (cognitive science)
Lucy McCorkle, Cardozo High School, Washington, D.C. (science
teacher: chemistry)
Jose Mestre, University of Massachusetts (physics/cognitive science)
James Minstrell, Mercer Island High School, Mercer Island,
Washington (science teacher: physics)
Philip Morrison, Massachusetts Institute of Technology (physics)
Phylis Morrison, Cambridge, Massachusetts (elementary science
teacher)
Wayne Moyer, Franklin Institute Science Museum and Planetarium,
Philadelphia, Pennsylvania (biology)
David Policansky, Commission on Life Sciences, National Research
Council (biology)