6
Synthesis and Policy Implications
Joan Herman, chair of the steering committee and discussant at the
workshop, posed two questions: Should we assess 21st century skills?
If so, do we know how to do it?
In response to the first question, she said her answer was a
wholehearted “yes.” In her view, all of the workshop presentations
demonstrated the importance of these skills. Beginning with Richard Murnane’s
presentation that highlighted the critical relationships between these skills
and labor market outcomes to presentations by Nathan Kuncel, Stephen
Fiore, and Rick Hoyle, speakers emphasized the need for these skills to
function well in today’s society. One after another, each presenter made a
case for the need for students to be well rounded in their abilities to think
critically; problem solve; interact effectively with others; and manage
their own learning, emotions, and development. To Herman, it would be
a disservice to students and society at large to focus schooling solely on
narrow academic content while neglecting the broader aspects of
development.1 More important than simply assessing the skills, Herman noted,
we should be integrating the assessment and teaching of 21st century
skills with academic content. As she put it, “This should not be something
added on to what teachers are already required to do, but should be part
of their routine practice for building academic knowledge.”
1 For additional discussion about breadth of instruction, see Bok (2006) and Lewis (2006).
But, do we know how to assess 21st century skills? Herman’s answer
to this question was that it depends on the kinds of skills. With respect to
cognitive skills, Herman thinks we know how to assess problem solving
embedded in content, as Kuncel was arguing for. She noted that we also
know how to develop assessments that require students to apply their
knowledge, to evaluate evidence, and to perform other critical thinking
and analytical reasoning tasks. There appear to be rich learning models on
which to base these assessments, she added, but evaluating higher-order
thinking skills has not received the attention it might have over the past
few years.
With respect to some of the interpersonal and intrapersonal skills
discussed at the workshop, she was somewhat more hesitant, but she
said her hesitancy was in relation to the purposes and uses for the
assessments, not the relative importance of the skills. She noted these days
the word “assessment” has come to mean only large-scale, summative,
accountability assessment, and, in her judgment, many of the measures
of interpersonal and intrapersonal skills are clearly not ready to be used
for this purpose. As she put it, “The long research histories in each area
give rise to any number of measures for assessing individual constructs,
but measures that are suitable for summative accountability purposes are
few and far between.” Assessments can serve many purposes, however.
For teachers, she pointed out, assessments are most useful if they
provide information that can be used for formative purposes, to help make
instructional decisions on a day-to-day basis. Some of the measures of
interpersonal and intrapersonal skills seem to be well suited for this
purpose or for purposes that involve small-scale administration.
As part of this discussion session, presenters and audience members
raised a number of issues with regard to strategies for assessing 21st
century skills, particularly the skills classified as interpersonal and
intrapersonal. This chapter provides a synthesis of some of the main points
raised by steering committee members and workshop participants and
closes with a discussion of the implications for policy and strategies for
moving forward.
REFLECTIONS ON ASSESSMENT STRATEGIES
Naming the Skill, Defining the Constructs
One point that arose repeatedly over the course of the workshop
was the issue of labeling and defining the skills—from the name given to
21st century skills in general to the specific definitions of the constructs.
Together, the collection of 21st century skills is sometimes referred to
as “noncognitive” skills, a term to which several participants objected
because all of the skills require some sort of cognition. These skills are
sometimes referred to as “soft skills,” a term that some participants dislike
because it seems to downplay their importance. Others quibbled with the
term “21st century skills” because it implies the skills were not needed in
the 20th century and appears not to recognize that more than a decade of
the 21st century has already passed. Thus, there is an issue with
terminology at the broadest level.
There were also concerns expressed about placing these skills into
three clusters (cognitive, interpersonal, and intrapersonal), as the
committee had done. Some workshop participants pointed out that it is misleading to
imply the clusters of skills are independent and mutually exclusive. For
instance, all of the skills included within the interpersonal and
intrapersonal clusters require cognition. That is, it is impossible to perform skills
such as collaboration, complex communication, or self-regulation without
using cognition. Likewise, intrapersonal skills and interpersonal skills are
interdependent. For instance, self-management skills certainly come into
play when participating in a collaborative task. The committee’s
classifications were useful for the purposes of structuring the workshop, but
there are issues with implying that the clusters are discrete and unrelated.
At a finer level, there are also issues with defining the constructs
subsumed under the three broad categories identified by the committee. Stephen
Fiore addressed this in his remarks in relation to interpersonal skills, noting
“there is a proliferation of concepts associated with interpersonal skills, and
it is problematic because we have different labels that may be describing
the same construct, and we have the same label that may be describing a
different construct.” For example, with regard to interpersonal skills, terms
like social competence, soft skills, social self-efficacy, and social intelligence
may all be used to refer to the same skills, or they may each refer to a
different set of capabilities. Likewise, in discussing intrapersonal skills, Rick
Hoyle pointed out the lack of consensus in the field with regard to
defining skills like self-regulation. There is little agreement among researchers,
he said, and sometimes the same researcher defines it differently within a
single paper.
Settling on terminology for this set of skills and definitions for the
constructs needs to be done before assessments can be developed. As
Hoyle described this need in relation to self-regulation, “the current state
of the conceptualization of self-regulation is the primary obstacle to
producing assessments of it.” Defining the skills in a clear and precise way is
fundamental to development of assessment tasks and essential for
ensuring that the resulting scores support the intended inferences.
Validity, Reliability, and Authenticity
Another issue highlighted by workshop participants was the extent to
which assessments of these skills are trustworthy and have fidelity. This
concern is essentially about reliability and validity: that is, do the
assessments provide accurate results that support the intended inferences? The
discussion centered on a number of issues related to reliability and
validity, such as whether the assessments measure what they are intended to
measure; how susceptible they are to faking; how well they capture the
actual processes involved in demonstrating the skill; and how reliable
they are. The summary below elaborates on these issues in relation to
each cluster of skills.
Cognitive Skills
With regard to skills in the cognitive cluster, such as critical thinking
and problem solving, Kuncel pointed out, “We have a good
understanding of these constructs when they are considered from a domain-specific
perspective.” As he described, “we know what it means to think critically
in certain contexts, such as when considering a physics problem or
evaluating a study in cognitive psychology, and we have a good
understanding of how to assess these skills from a domain-specific perspective.”
The example assessments of cognitive skills presented at the workshop
were all set within a context. For the PISA problem-solving test, each task
specifies a context, all of which come from situations encountered in daily
life. The Multistate Bar exam poses critical thinking questions within the
context of the situations lawyers encounter. Operation ARIES! focuses
on evaluating scientific evidence, and Packet Tracer focuses on solving
problems with computer networking.
According to Kuncel, the problems arise with domain-general
conceptions of these skills. In his view, focusing on broad critical thinking skills,
such as understanding the law of large numbers, and training students
to apply these skills, is not a useful endeavor. In his work, he has found
no evidence that learning these sorts of skills improves critical thinking
in general or in ways that can be transferred from one domain to another.
Further, he finds little evidence that a domain-general concept of critical
thinking is distinct from general cognitive ability.
Interpersonal Skills
With regard to interpersonal skills, Fiore reminded the audience of
the complexity of interpersonal interactions. Interpersonal skills involve a
mix of attitudinal, behavioral, and cognitive factors, all of which are used
to read the person in the context of the interaction and determine the most
appropriate way to respond. Designing assessments to measure these
processes is challenging. One issue that Fiore described is the fidelity of the
assessment: that is, the extent to which the assessment involves
observations of actual interactions and actual emotional responses to the
interactions. He noted the scenario-based learning examples described by Louise
Yarnall and the portfolio assessments described by Bob Lenz represent
real-life interactions with authentic exchanges. With the scenario-based
learning examples, the students are introduced to a problem through a
real-life mechanism, such as an online letter from a manager. The students
have to work in teams to address the situation and collaborate to figure
out how to solve a complex work-related problem. These assessments
integrate technical and social skills.
Fiore views the portfolio examples as somewhat less authentic. While
the portfolios are structured collections of student work in which students
have documented the application of knowledge in a particular classroom
context, the evaluation of interpersonal skills is based on self-, peer, and
teacher ratings. Although these ratings are drawn from actual situations
the student was involved in, there is no control over the context or the
nature of the interaction. For instance, the situations may or may not have
involved conflict in the context of the collaborative projects. The type of
communication on which the student is evaluated may differ from one
student to the other. These variations interfere with both reliability and
validity, Fiore commented, in that the sampling of behavior and
performance included in the portfolio may not be consistent from year to year
or even from student to student.
The other two examples—situational judgment tests and assessment
center tasks—assess interpersonal skills in more contrived, controlled
situations, Fiore said. The assessor sets up the situation to which the test
taker is responding or in which the test taker is interacting. This
guarantees that certain samplings of behavior are observed, but they are not as
authentic as the other approaches. For instance, assessment centers obtain
simulated examples of behavior; the observers see how job candidates
perform in the situation simulated at the assessment center but not how
the candidate performs when he or she actually encounters that situation
in real life.
Fiore thinks situational judgment tests are even more removed from
real-world situations in that the test taker simply chooses what he or she
judges to be the best response. The candidate does not have to perform the
skill or demonstrate the capability. Fiore characterizes these assessments
as low in fidelity—low in enactive fidelity (the amount of true interaction
that takes place) and low in affective fidelity (the extent to which the
experience elicits an emotional response). He also highlighted certain problems
that have arisen with these assessments. First, there is some
complexity in understanding why a candidate may have responded incorrectly.
To respond to the problem, the test taker has to choose the appropriate
response to the situation, but he or she also has to interpret the situation.
When the test taker responds incorrectly, it is impossible to discern if he
or she did not know the appropriate response or did not understand the
situation. Fiore said situational judgment tests are also susceptible to
faking in that test takers can make guesses about the most socially acceptable
response. To address this concern, some assessments ask the test taker
to choose the best and worst response, not just the best response. These
issues present potential threats to the validity of the test results.
Intrapersonal Skills
Assessment of intrapersonal skills is also challenging because of the
complexity of the processes involved. Hoyle reminded audience
members that intrapersonal skills involve planfulness, self-discipline, delay
of gratification, dealing with distractions, and adjusting the course when
things do not go as planned—all characteristics of self-regulation or, put
another way, the management of goal pursuit. The examples presented
involved assessments of integrity, conduct disorders/antisocial behavior,
self-regulated learning, and emotional intelligence. While these are all
skills involved with self-regulation, Hoyle said one of the first things
to consider is whether these skills are separable from personality. For
instance, with regard to integrity, is there a certain personality profile
associated with people who are prone to engage in dishonest behavior,
or conversely people who are likely to operate with integrity in the
workplace? Similar issues were raised by Gerald Matthews with respect to the
distinction between emotional intelligence and personality.
The examples included a variety of strategies for assessing these
skills. For tests of integrity, the strategies include both direct measures,
such as self-report in which the test taker clearly knows the purpose of
the assessment, and indirect measures, where the purpose is masked
from the test taker. With regard to self-report measures of integrity, Hoyle
questioned their utility, asking “How useful is it to ask a person who is
dishonest to tell you if they are dishonest?” Nevertheless, he pointed out,
considerable evidence documents their reliability, validity, and
usefulness in employee selection. It is important to remember, however, that
these assessments are used to reduce the prevalence of counterproductive
behaviors in the aggregate, and test takers never receive their scores or
any feedback on their performance. This is an important distinction from
the type of testing done in the K-12 setting, where the focus is on
reporting and interpreting scores in order to improve performance.
For evaluating antisocial behaviors and conduct disorders, a
single assessment strategy has been adopted by the field: the Child
Behavior Checklist (or Achenbach system). In this case, there is broad
consensus in the field about the characteristics of the disorder, and the
construct is well defined. The checklist includes permutations that allow
it to be administered and scored from the point of view of the child or
adolescent, the parents, or the teachers, which permits multiple sources
of information in making a diagnosis. Hoyle noted it has been shown to
be both valid and reliable. He highlighted Odgers’ research documenting
that early identification and intervention can vastly improve outcomes
for people with these disorders. Several participants also called
attention to the recently skyrocketing problems with bullying in schools and
noted that early identification of conduct disorders may help reduce the
incidence of this behavior.
The other two examples were of assessments still used for research
purposes. Hoyle found the assessments of self-regulated learning that Tim
Cleary is exploring to be both intriguing and promising. The assessment
strategies allow the researchers to directly observe someone engaged in
the activity of learning, and one of the alternatives that Cleary discussed
is having children report online as they actually proceed through the
learning process. Hoyle commented on the multitude of insights that
can be obtained by having children report on what they are doing before
they begin an activity, while they are engaged in the activity, and then
reflecting on it afterward. Preliminary work suggests these measures are
predictive of course grades. With regard to the assessments of emotional
intelligence, Hoyle tended to agree with Matthews that the construct is
not yet well defined, and questions remain about its distinction from
personality. As Hoyle put it, the measures Matthews discussed tend to be
highly correlated with personality to the extent that “one wonders if one
really needs separate measures of emotional intelligence or if, in fact, one
is able to capture that variability in standard personality measurement.”
Fairness and Accessibility
A third issue discussed throughout the workshop was fairness. As
explained in Chapter 5, in a testing context, fairness means the assessment
should be designed so that test takers can demonstrate their proficiency
on the targeted skill without irrelevant factors interfering with their
performance. Fairness is an essential component of validity. Some of the
constructs discussed during the workshop raised considerable concern
about fairness and possible sources of bias. One issue alluded to
previously in this chapter is whether the assessments are measuring the skills
they purport to measure or are actually measuring personality traits or
intelligence. To what extent is a domain-general conception of critical
thinking distinct from general cognitive ability (intelligence)? To what
extent are emotional intelligence and integrity distinct from personality?
There is some research to help answer these questions, but it is important
to be clear on what exactly is being assessed.
Related to this is the notion of trainability or malleability: that is, that
proficiency on the particular skill can increase as a consequence of
training and practice. To what extent can a person learn to have more integrity,
to become more self-regulated, or to have better social skills? Some
students may come to school better prepared to collaborate with others or
to manage their own learning. This may occur as a result of family
background characteristics, home environment, or other out-of-school
experiences. To what extent would assessments be measuring skills that can
be learned in school versus family background? There is some research
on these issues as well, but as Greg Duncan, professor of education with
the University of California, Irvine, noted, the findings are not definitive.
Related to this issue is the notion of opportunity to learn. If these skills
are indeed trainable, to what extent will all students have equal exposure
to instruction in the skills? If students are expected to acquire these skills
and teachers are held accountable for teaching them, instructional
programs will be needed so that students have the opportunity to learn them.
This issue has direct bearing on fairness and ultimately on the validity of
assessments. Workshop participants noted that these issues will need to
be investigated and understood before moving into wide use of
assessments of these skills, particularly if the results are used to make important
decisions about students.
There were also considerable concerns about the issue of
construct-irrelevant variance, particularly as it relates to English language
learners. Patrick Kyllonen, director of the Center for Academic and Workplace
ers. Patrick Kyllonen, director of the Center for Academic and Workplace
Readiness and Success at the Educational Testing Service, cited statistics
that in the state of California, 25 percent of all public school students are
English language learners, with the numbers increasing rapidly in other
states as well (e.g., see National Research Council, 2011). For an
assessment like the situational judgment test that presents a verbally dense
description of a situation, language skills are critically important. For
students with weak English language skills, the assessments would be a
reading test, not a measure of interpersonal skills.
IMPLICATIONS FOR POLICY
Herman posed two additional questions to the group during the
discussion session. If 21st century skills were included in assessments, what
would the assessment system look like? And how would we go about
implementing such a system?
In responding to the first question, she returned to her point about the
many types of assessments and the many ways of using the results. She
highlighted the fact that throughout the workshop, participants
repeatedly raised questions about the purposes of the assessments and the
levels at which they would be used. In her view, the full spectrum of
assessment purposes should be explored in determining ways to
incorporate these skills into K-12 schooling. She said she would advocate for
a system that included a variety of formative components intended both
to guide instructional decision making and to enable early identification
of potential problems. These might be combined with assessments used
for a variety of summative purposes, including accountability for schools,
teachers, and students, under the goal of ensuring students receive the
exposure and engagement they need to develop the skills that are critical
for college and workforce readiness.
In addressing the second question, she called for work to identify
the constructs on which to focus. Throughout the workshop, a variety of
skills and constructs were discussed, but as Herman put it, “we cannot
do everything at once.” The initial work would be to identify the most
critical skills and predispositions for students to learn, set priorities on
what is most important, and then develop strategies for teaching and
assessing them.
She referred to the Race to the Top (RTTT) assessment consortia2 as
one vehicle for moving this work forward. She said the changes enacted
through the RTTT efforts provide a timely opportunity for bringing
attention to new skills. The cognitive skills of critical thinking and problem
solving, she noted, are already incorporated into the Common Core
standards. The next step would be to make sure these skills are included in
the curriculum and the assessments and then to encourage focus on some
of the interpersonal and intrapersonal skills.
As part of this discussion, Patrick Kyllonen commented about the
idea of “consequential validity” or the social/educational consequences
of having the assessment in place and making use of the test results.
There are many examples, he noted, of tests inserted into testing systems,
not necessarily because they will improve psychometric properties, but
because of the consequences they might bring about. An example would
be the inclusion of writing assessments in many standardized
assessments (such as the SAT, GRE, MCAT, and LSAT) despite the fact that
they may not significantly improve the predictive validity of the
assessment. In this case, the notion is that including an assessment of writing,
and attaching stakes to it, should bring about an increased focus on
developing writing skills, both by teachers in their instruction and by potential
test takers as they prepare for the assessment. Currently, in K-12
education, Kyllonen continued, accountability systems revolve almost entirely
around the ability of students to take reading and math tests. Thus, one
consequence of incorporating 21st century skills into the assessment or
the accountability system would be to encourage teachers and students
to spend more time on these skills. As characterized by one workshop
participant, what is tested is taught, and what is not tested is not taught.
2 See http://www2.ed.gov/programs/racetothetop-assessment/index.html [May 2011].
Herman also spoke about teacher and teaching capacity. She
summarized comments from workshop participants who pointed out that the
development of 21st century skills and their integration with academic
content is not a regular feature of curriculum or instruction; in some school
systems, there may be some focus on the cognitive skills, but this is certainly
not the case for the interpersonal and intrapersonal skills. While some
teachers may have experience with assigning grades for effort, attitude, and
behavior, the interpersonal and intrapersonal skills discussed at the
workshop go far beyond these measures. This means the teaching and
assessment of 21st century skills will require changes in curriculum and teacher
practices that will require a substantial amount of teacher development.
As emphasis on these skills takes on new meaning, teachers would need a
good deal of assistance both to understand the nature of these constructs
and to learn how to develop them in their students so that all students
have the opportunity to learn them. This has implications both for teacher
preparation programs and for teacher inservice professional development.
Herman also called for transparency. She noted that the changes required
in curriculum, instruction, teacher training, and assessment can be made
more smoothly if the process is transparent. Being transparent will help teachers
and students understand the skills that are being emphasized and will
help the assessment developers better understand the skills that are to
be measured.
Feasibility and Moving Forward
As one workshop participant pointed out, students in U.S. schools
already spend considerable time taking tests. Many educators would not
readily welcome the idea of adding more tests to the school day.
However, this idea assumes the assessments would be something put upon
students rather than an integrated part of the curriculum. The view of
the assessments endorsed by Herman and other workshop participants
was that the various constructs would be incorporated into the academic
curriculum so that their teaching would be an integral part of the
instructional program. For instance, it is not difficult to imagine incorporating
a team project into the regular science, social studies, language arts, or
mathematics program. Incorporating activities in which students must
problem solve, think creatively, and communicate their work to others
using multiple types of media seems natural in academic settings.
Adding ongoing formative assessments that help to guide instruction
of these skills does not seem like a heavy burden to place on teachers
and students. As John Behrens noted in describing the Packet Tracer, the
system relies on “stealth assessments”; often students do not even realize
they are being tested.
At the same time, other workshop participants stressed it is important
not to lose sight of the need to ensure that students in the United States
learn the basic academics. As Paul Sackett put it, “If we were at a different
conference, we would be spending time lamenting the fact that students
in the U.S. are not up to par on some fundamental academic skills.”
Likewise, Deirdre Knapp noted that not all 21st century skills are equal: some
are clearly more important for students to learn than others, and we are
further along in knowing how to assess some skills than others. Thus, it
is critical to set priorities for where and how to spend the limited time,
money, and resources.
Kyllonen also emphasized the importance of considering the cost
tradeoffs. He noted the various examples of assessments included some
“ingenious low-cost assessments and some dazzling high-cost
assessments.” He encouraged work to study the differences in order to figure
out where high-cost investment is cost-effective and where it might not
make a difference. He and others pointed to examples other than those
presented at the workshop that might be important resources and models.
For instance, Herman mentioned the work that David Conley, with the
Educational Policy Improvement Center (EPIC), has been doing to
identify critical components of college and career readiness, as well as similar
efforts by the National Assessment of Educational Progress (NAEP) to
focus the 12th grade assessment on these skills. Kyllonen also spoke of
the exams used to assess critical-thinking skills at the college level, such
as the Collegiate Learning Assessment (CLA), the ACT CAAP Test, and
the ETS Proficiency Profile Test. They are all operational programs, he
pointed out, that may serve as models. Knapp noted the work the
military has been doing to evaluate temperament, persistence, and stamina.
Others commented that while the Envision High School was featured at
the workshop, a number of such high schools throughout the country are
working to incorporate instruction and assessment of 21st century skills
into the curriculum in innovative ways.
Defining the overall purpose of the assessments was an issue raised
repeatedly in deciding on a path for moving forward. Sackett framed the
issue as deciding between a focus on individual results or group-level
results. He asked, “Do we want students to leave school with an
individualized certificate that documents their level of competence in each skill? Or do
we want to document how the nation is doing in aggregate?” He cautioned
that obtaining precise and reliable assessment at the individual level is difficult,
costly, and time consuming. On the other hand, Steve Wise questioned how
best to address the different aspirations that students have. While there is
currently a heavy emphasis on ensuring all students pursue higher
education, in reality, that is not likely to occur. Students have different goals. Do
we design a system that is a “one size fits all plan,” he asked, do we focus
on minimal competency across the board, or do we design a system that
attends to the specific needs of the individual?
Several workshop participants spoke of the types of research needed
in order to move forward with assessments of these skills. Deirdre Knapp
pointed out many assessments are “pushing the envelope” as far as
psychometric capabilities. For example, how does one evaluate the reliability
of assessments such as those used by Art Graesser’s AutoTutor? Greg
Duncan called for research in two areas. First, he noted, if we are to
relate these skills to training in school, we need to know what it takes to
change these skills. That is, how malleable are they and what is involved
in improving them? Second, he called for more in-depth study of the
predictive power of the various skills, noting that what is needed is not
simply correlations among the variables but well-controlled analyses to
demonstrate that improvement in these skills results in improvement in
academic and labor market outcomes. Finally, Juan Sanchez, professor of
management and international business at Florida International
University, called for increased levels of cross-disciplinary efforts, stressing that
successfully tackling these issues will require the collaboration of
expertise from many disciplines including measurement, cognitive psychology,
and information technology.