4
Some International Examples
The primary focus of the steering committee's efforts was to find examples
of the many forms that an assessment program built around improving learning
can take. The committee looked at programs in seven states, several international
examples, and three programs developed by researchers: the Berkeley Evaluation
and Assessment Research assessment model, Facet-Based Assessment, and
Model-Based Assessment. Presenters for each of these programs were asked to
discuss not only the goals and characteristics of their programs, but also the ways
in which the programs exemplify the criteria the committee had identified. They
were also asked to talk about problems and obstacles they had encountered, as
well as successes they believed they had achieved and methods of securing
evidence of their results. In this chapter, the examples from abroad are discussed.
The notion of gaps between different elements and goals of the educational
system may not have been as much on the minds of education officials in other
countries, but the assessment systems in several countries nevertheless seem to
have much to offer the discussion in the United States. Two different Australian
systems, for example, offer interesting ways of thinking about alignment and
coherence. Studies from Great Britain demonstrate a way teachers can use
assessments to help students make progress in their learning, while the
International Baccalaureate program shows the role that teachers can play in a
widely dispersed system. The Organisation for Economic Co-operation and
Development's (OECD) Programme for International Student Assessment (PISA)
demonstrates one way in which diverse constituents can focus on the material
that is most important to assess.
ASSESSMENT IN SUPPORT OF INSTRUCTION AND LEARNING
AUSTRALIA
At the national level, Australia has built a large-scale assessment on the basis
of a preexisting framework, or "map of progress," that outlined the knowledge
and skills students should develop. Geoff Masters, chief executive officer at the
independent, nonprofit Australian Council for Educational Research (ACER),
explained to workshop participants that the resulting system developed almost by
happenstance, yet has many interconnected and mutually supporting parts.
The original framework took the form of a detailed matrix showing levels of
competence in different aspects of each subject area. In English, the first
subject for which a framework was developed, the matrix describes competence in
reading, writing, listening, speaking, and viewing. The framework defines eight
levels of competence for each skill and is designed to cover the years of
compulsory schooling.
ACER recognized that teachers needed some guidance in monitoring student
progress along the framework. On its own initiative, ACER developed an
assessment resource that teachers could use to make their own assessments of
how children were progressing in terms of the framework. The resource kits,
which were sold to schools around the country, included activities and
materials and a range of assessment methods to be used with individual students
and with groups.
When the national government later decided to conduct a national survey of
primary children's literacy skills, to obtain data similar to those provided in
the United States by the National Assessment of Educational Progress, ACER
submitted a proposal to develop an assessment based on the model it had already
devised for the teachers' assessment kits. Government officials agreed to adopt
the ACER model, thus establishing a national assessment system that relied on
teachers to conduct and score the assessments.
Several mechanisms for ensuring consistency and fairness were built into the
system. First, as with the original resource kits, the assessment supplied
guidelines and scoring rubrics, and a group of experienced external assessors
trained teachers in the use of the assessment methods and monitored their work.
These assessors also visited schools and observed a subset of the assessments as
they were conducted. Second, all the student work generated for assessment
purposes was collected for further monitoring at a central office in Melbourne.
The work was sampled and, where discrepancies were found, rescored.
The initial assessment was successful, yielding results for nearly 9,000
students in the third and fifth grades. Student performance was reported in
terms of progress along the matrix, as was the relative performance of
socioeconomic subgroups.
Unfortunately, as Masters explained, the national government was surprised
in the end to find no indication of how many students had "passed." Because the
assessment was designed only to show how far groups of students had progressed
through the stages identified in the matrix, no cut points had been set for
either grade. However, ACER was able to go back and conduct a standard-setting
exercise to determine what minimum level of competence in reading and writing
should be expected at each grade. Pass rates could then be determined
retroactively. Although the results turned out to be controversial, the exercise
demonstrated that the assessment system could be adapted to the accountability
purposes that are particularly important to policy makers and politicians.
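Because each student's result was already recorded as a position on the progress map, a cut point set after the fact is enough to produce a pass rate; no student needs to be retested. The Python sketch below illustrates only that arithmetic. The function name `pass_rate`, the student levels, and the cut points are all hypothetical and are not ACER's data or procedure.

```python
def pass_rate(levels, cut_level):
    """Return the share of students at or above `cut_level`.

    `levels` lists each student's (hypothetical) level on the progress map;
    `cut_level` is the minimum acceptable level chosen later by standard setting.
    """
    if not levels:
        raise ValueError("levels must not be empty")
    passing = sum(1 for level in levels if level >= cut_level)
    return passing / len(levels)


# Hypothetical results recorded on the progress map before any cut points existed.
grade3_levels = [2, 3, 3, 4, 2, 5, 3, 1]
grade5_levels = [4, 5, 3, 6, 4, 5, 2, 4]

# Hypothetical cut points set afterward; the recorded levels are reused as-is.
cut_points = {"grade 3": 3, "grade 5": 4}

print(pass_rate(grade3_levels, cut_points["grade 3"]))  # 0.625
print(pass_rate(grade5_levels, cut_points["grade 5"]))  # 0.75
# Raising the grade 3 cut by one level lowers the reported rate sharply.
print(pass_rate(grade3_levels, 4))  # 0.25
```

The last line shows why the choice of cut point matters: the same recorded results yield very different pass rates depending on where the standard is set.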
QUEENSLAND
Richard Shavelson, professor of education and psychology at Stanford
University, described for the audience the somewhat different situation in the
Australian state of Queensland, whose system he has studied. There the state had
for many years relied on a set of "A-level" examinations prepared by the
University of Queensland, similar to those used in Great Britain, both to
determine how well students were prepared for college study in different
subjects and as an element in the college selection process. In 1970-1971,
concern began to mount that the exams were too difficult and were the cause of
an undesirable narrowing of the curriculum. Queensland decided to replace the A
levels with formative assessments that would more directly address students'
needs, and then to build on those to obtain summative information about student
performance that would be of value beyond the classroom.
In essence, as Shavelson explained, Queensland officials decided to develop
"a system for auditing the local implementation of curriculum and assessment
and accountability." Teachers and local schools are responsible for both
curriculum and assessment, and their work is monitored to ensure that it is
consistent across the state and meets standards for quality. The infrastructure
set up to accomplish this monitoring includes a Board of Senior Secondary
Studies, which sets the syllabi (the essential goals for content, cognitive
skills, and domain-specific skills for each subject) and the general methods for
conducting assessments. The board is also responsible for moderation of scores,
a process by which teachers' scores are calibrated against one another to
achieve consistency across classes and schools. Below the board, a series of
district-level content panels in each of the A-level subjects provides more
direct support to schools and teachers. Each school is then free to develop its
own two-year A-level curriculum in each subject, as well as a culminating exam.
The exams are scored on a Queensland-wide, five-point, domain-referenced scale
and then moderated.
Thus, schools and teachers are given a considerable amount of both direction
and latitude. Following guidelines provided by the state, they use both
formative and summative assessments throughout the two A-level years to help
students understand in detail the expectations they are striving to meet, and
students are always aware of the purpose of a particular assessment. In
Shavelson's view, the key to the system's apparent success over thirty years is
the very close link between the curriculum and the content of the assessments.
To American eyes, one striking aspect of both Australia's national assessment
system and the Queensland model is the degree to which each, in its own way,
accords significant value to teachers' judgments about their students. In these
systems, teachers have many opportunities for training and development to build
the knowledge and skills they need to play a key role in the assessment program.
They can become involved in developing and scoring assessments (as many of their
counterparts in the United States are), and they are trusted to develop
evaluative assessments of their students on their own.
GREAT BRITAIN: ENHANCED FORMATIVE ASSESSMENT
Dylan Wiliam of King's College, London, described efforts in Great Britain
to focus closely on the ways teachers can use assessments to help students make
progress in their learning. He began by describing a review of approximately
250 studies of the effectiveness of formative classroom assessment (sometimes
also called assessment for learning), which found clear evidence of a positive
effect on learning. Specifically, Wiliam explained, when teachers give students
clear feedback about the steps they need to take to improve, students progress
faster than they do in response to other kinds of feedback.
Wiliam also described a study in which twenty-four mathematics and science
teachers were asked to develop their use of formative assessment with one class
in several specific ways: making greater use of higher-order questioning,
providing task-involving rather than ego-involving feedback, developing
peer- and self-assessment strategies, and exploring the use of summative tests
for formative purposes (Black, Harrison, Lee, Marshall, and Wiliam, 2002). For
each class, the most comparable local class was identified to serve as a
control, so that any improvement in learning could be measured, and in this
study, too, evidence of a positive effect was found.
While the methods sound simple (allowing a longer wait time while students
consider how to answer a question, for example), Wiliam stressed the importance
not of the methods themselves but of the insights into how students learn that
led to them. The idea, he explained, is to initiate students into a culture of
learning in which they not only take responsibility for their learning but are
supported in the steps they need to take to progress. At the same time,
teachers' capacity to make useful inferences about their students is enhanced,
as are their opportunities to use these inferences (Black et al., 2002).
THE INTERNATIONAL BACCALAUREATE (IB) DIPLOMA PROGRAMME
The International Baccalaureate (IB) Diploma Programme offered workshop
participants an additional way to think about the role of teachers in assessment.
George Pook, head of assessment for the International Baccalaureate
Organisation, explained that the IB was developed to provide a common
curriculum for students around the world, as well as a grading system that
colleges and universities in many countries would recognize and understand.
Consistency is therefore very important to the program's success, yet the
program must also entrust considerable responsibility to widely dispersed
schools and teachers.
The IB uses a variety of assessment strategies for summative purposes. For
example, at the end of the program students must complete an extended essay on
a topic of their own choosing, which is scored centrally. Examinations may
include tasks ranging from multiple-choice questions to full-length essays, as
appropriate for each subject. Oral presentations are also required in language
subjects, and these are scored by teachers using criteria supplied by the IB.
All results are reported on a seven-point scale linked to defined levels of
performance, which program administrators try to keep consistent from year to
year and across participating schools around the world, even though those
schools work in different languages. The points on the scale describe content
and skills, and the scoring is intended only to indicate how well students have
mastered them, not to spread students out for comparative purposes.
Internal, teacher-generated assessments play a significant role in the
program for both formative and summative purposes. They address a different
range of subject matter and skills than the IB-generated assessments do, and the
two types are intended to complement one another in creating an overall measure
of a student's achievement. Teachers' ongoing formative assessments are viewed
as opportunities for students to see how they are progressing against the
criteria defined by the seven-point scale. Released test questions, rubrics, and
student work are all used to provide this feedback. Many IB teachers serve as
external assessors for other schools and also have opportunities to review and
revise the curricula in their disciplines. All IB teachers receive support in
the form of resource materials, workshops, and an online curriculum center.
Moderators are available to give teachers feedback on their internal assessment
methods, as well as on their assignments and their grading.
PROGRAMME FOR INTERNATIONAL STUDENT ASSESSMENT (PISA)
The OECD, which grew out of the organization set up to administer the
Marshall Plan after World War II, is composed of thirty nations, all of which
are democratic market economies. As Barry McGaw, director for education at
OECD, explained at the workshop, a primary function of the OECD is to collect
data in a number of policy areas. In the late 1980s the organization began
upgrading its statistical work in education, with the particular goal of making
the data used to represent national systems more comparable. While the OECD had
for a number of years been using data on educational outcomes supplied by the
International Association for the Evaluation of Educational Achievement, it
began to gather data of its own in the mid-1990s through PISA. The focus is on
summative data that can be used to make useful comparisons among the member
nations.
The primary initial goal for PISA was, as McGaw explained, "to estimate the
yield of national education systems," and he acknowledged that this is a grand
ambition. Yield is an economic concept not generally used in the study of
education, but it led the developers of PISA to focus on what students can do
with what they have learned, thereby avoiding the difficulty of identifying
material that had been covered in common across many countries. Accordingly,
PISA assesses the "literacy" of fifteen-year-olds in reading, mathematics, and
science. The assessments use a variety of formats (multiple-choice questions as
well as short open-ended questions and written pieces), but they are not
intended for formative, classroom purposes.
McGaw gave examples of the kinds of questions PISA data can address, using
tables and graphs to show, for instance, how member countries vary in the
balance they achieve between equity and quality. He also showed graphically that
countries vary considerably both in how much spread there is between their
lowest- and highest-performing students and in how much of that spread occurs
within schools and how much across schools. Probing that question further, he
presented a table that broke down the variation across schools according to
whether it was intended (the result, for instance, of deliberately tracking
students into academic or vocational programs) or unintended. Data such as
these, McGaw explained, are very useful in helping countries see that there are
alternatives to the way they structure their education systems. South Korea,
for example, has been remarkably successful at achieving both high quality and
high equity: it has the smallest spread between high- and low-performing
students, while its overall performance is high.
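The split between spread within schools and spread across schools rests on a standard identity: using population variances, the total variance of student scores equals the within-school part plus the between-school part (the size-weighted variance of school means). The Python sketch below illustrates that identity on a hypothetical dictionary of scores per school; the function name `variance_split` and all scores are assumptions for illustration. PISA's own analyses are considerably more involved (they account for sampling weights and plausible values, for example), and this sketch ignores all of that.

```python
from statistics import fmean


def variance_split(scores_by_school):
    """Split the total variance of student scores into within- and between-school parts.

    Uses population variances, so total == within + between
    (up to floating-point rounding):
      within  = mean squared deviation of each score from its own school mean
      between = size-weighted mean squared deviation of school means
                from the grand mean
    """
    all_scores = [s for scores in scores_by_school.values() for s in scores]
    n = len(all_scores)
    grand_mean = fmean(all_scores)

    total = sum((s - grand_mean) ** 2 for s in all_scores) / n
    within = 0.0
    between = 0.0
    for scores in scores_by_school.values():
        school_mean = fmean(scores)
        within += sum((s - school_mean) ** 2 for s in scores)
        between += len(scores) * (school_mean - grand_mean) ** 2
    return total, within / n, between / n


# Hypothetical scores for two schools whose averages differ widely.
schools = {"A": [400, 500], "B": [600, 700]}
total, within, between = variance_split(schools)
print(total, within, between)  # 12500.0 2500.0 10000.0
print(between / total)         # 0.8: most of the spread lies across schools
```

In this toy example 80 percent of the spread lies across schools; a system in which schools had similar averages but wide ranges of students inside each would show the opposite pattern.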
Although PISA does not fit particularly well with the criteria laid out by the
committee, McGaw noted that it does offer formative possibilities in a system
context. Denmark, for example, has found that although its spending per student
is among the highest, its average student performance is quite low. As a
consequence, the ministry of education is working to make the system operate
more efficiently and to improve student performance. This response implies, of
course, that the ministry is confident that the constructs measured by PISA are
genuinely important, even though they are not directly linked to the curriculum
taught in Denmark or any other country.
It is in this sense that PISA's experience might be most useful to educators
looking for ways to bridge the gaps. Developing PISA involved an extensive
effort to build a framework that defined a reasonable set of expectations for
fifteen-year-olds in each of the domains. International groups drew on
assessments from around the world and worked through cultural and language
differences to produce two versions of the test, one in English and one in
French, that represented their best effort to assess what is really important
for fifteen-year-olds to be able to do. McGaw suggested that any concepts that
survived the double translations and other reviews, field tests, differential
item functioning (DIF) analyses,¹ and other screens were likely to be truly key
concepts. He does not believe that PISA focuses mostly on what is easy to assess
rather than on what is important; he believes that it assesses understanding and
reasoning, not factual recall.
¹DIF analyses flag test questions that perform differently for a particular
subgroup of test takers than for the group as a whole. Thus, for example, if
students in one country, or those who are native speakers of a particular
language, have difficulty with a question for cultural reasons rather than
because of their skill with its content, the question can be identified so that
it need not count against them.