2
Assessing Cognitive Skills
As described in Chapter 1, the steering committee grouped the five
skills identified by previous efforts (National Research Council,
2008, 2010) into the broad clusters of cognitive skills, interpersonal
skills, and intrapersonal skills. Based on this grouping, two of the identified
skills fell within the cognitive cluster: nonroutine problem solving
and systems thinking. The definition of each, as provided in the previous
report (National Research Council, 2010, p. 3), appears below:
Nonroutine problem solving: A skilled problem solver uses expert
thinking to examine a broad span of information, recognize patterns,
and narrow the information to reach a diagnosis of the problem. Moving
beyond diagnosis to a solution requires knowledge of how the informa-
tion is linked conceptually and involves metacognition—the ability to
reflect on whether a problem-solving strategy is working and to switch
to another strategy if it is not working (Levy and Murnane, 2004). It
includes creativity to generate new and innovative solutions, integrating
seemingly unrelated information, and entertaining possibilities that
others may miss (Houston, 2007).
Systems thinking: The ability to understand how an entire system
works; how an action, change, or malfunction in one part of the sys-
tem affects the rest of the system; adopting a “big picture” perspective
on work (Houston, 2007). It includes judgment and decision making,
systems analysis, and systems evaluation as well as abstract reasoning
about how the different elements of a work process interact (Peterson et
al., 1999).
16 ASSESSING 21ST CENTURY SKILLS
After considering these definitions, the committee decided a third
cognitive skill, critical thinking, was not fully represented. The committee
added critical thinking to the list of cognitive skills, since competence in
critical thinking is usually judged to be an important component of both
skills (Mayer, 1990). Thus, this chapter focuses on assessments of three
cognitive skills: problem solving, critical thinking, and systems thinking.
DEFINING THE CONSTRUCT
One of the first steps in developing an assessment is to define the
construct and operationalize it in a way that supports the development
of assessment tasks. Defining some of the constructs included within the
scope of 21st century skills is significantly more challenging than defining
more traditional constructs, such as reading comprehension or mathemat-
ics computational skills. One of the challenges is that the definitions tend
to be both broad and general. To be useful for test development, the defi-
nition needs to be specific so that there can be a shared conception of the
construct for use by those writing the assessment questions or preparing
the assessment tasks.
This set of skills also generates debate about whether they are domain
general or domain specific. A predominant view in the past has been
that critical thinking and problem-solving skills are domain general: that
is, that they can be learned without reference to any specific domain
and, further, once they are learned, can be applied in any domain. More
recently, psychologists and learning theorists have argued for a domain-
specific conception of these skills, maintaining that when students think
critically or solve problems, they do not do it in the absence of subject
matter: instead, they think about or solve a problem in relation to some
topic. Under a domain-specific conception, the learner may acquire these
skills in one domain as he or she acquires expertise in that domain, but
acquiring them in one domain does not necessarily mean the learner can
apply them in another.
At the workshop, Nathan Kuncel, professor of psychology with the University
of Minnesota, and Eric Anderman, professor of educational psychology
with Ohio State University, discussed these issues. The sections
below summarize their presentations and include excerpts from their
papers,1 dealing first with the domain-general and domain-specific conceptions
of critical thinking and problem solving and then with the issue
of transferring skills from one domain to another.
1 For Kuncel’s presentation, see http://www7.national-academies.org/bota/21st_Century_Workshop_Kuncel.pdf. For Kuncel’s paper, see http://www7.national-academies.org/bota/21st_Century_Workshop_Kuncel_Paper.pdf. For Anderman’s presentation, see http://www7.national-academies.org/bota/21st_Century_Workshop_Anderman.pdf. For Anderman’s paper, see http://nrc51/xpedio/groups/dbasse/documents/webpage/060387~1.pdf [August 2011].
Critical Thinking: Domain-Specific or Domain-General
It is well established, Kuncel stated, that foundational cognitive
skills in math, reading, and writing are of central importance and that
students need to be as proficient as possible in these areas. Foundational
cognitive abilities, such as verbal comprehension and reasoning, mathematical
knowledge and skill, and writing skills, are clearly important for
success in learning in college as well as in many aspects of life. A recent
study documents this. Kuncel and Hezlett (2007) examined the body of
research on the relationships between traditional measures of verbal and
quantitative skills and a variety of outcomes. The measures of verbal
and quantitative skills included scores on six standardized tests—the
GRE, MCAT, LSAT, GMAT, MAT, and PCAT.2 The outcomes included
performance in graduate school settings ranging from Ph.D. programs
to law school, medical school, business school, and pharmacy programs.
Figure 2-1 shows the correlations between scores on the standardized
tests and the various outcome measures, including (from bottom to
top) first-year graduate GPA (1st GGPA), cumulative graduate GPA
(GGPA), qualifying or comprehensive examination scores, completion
of the degree, estimate of research productivity, research citation counts,
faculty ratings, and performance on the licensing exam for the profession.
For instance, the top bar shows a correlation between performance
on the MCAT and performance on the licensing exam for physicians of
roughly .65, the highest of the correlations reported in this figure. The
next bar indicates the correlation between performance on the LSAT and
performance on the licensing exam for lawyers is roughly .35. Of the 34
correlations shown in the figure, all but 11 are over .30. Kuncel characterized
this information as demonstrating that verbal and quantitative
skills are important predictors of success based on a variety of outcome
measures, including performance on standardized tests, whether or not
people finish their degree program, how their performance is evaluated
by faculty, and their contribution to the field.
2 Respectively, the Graduate Record Exam, Medical College Admission Test, Law School Admission Test, Graduate Management Admission Test, Miller Analogies Test, and Pharmacy College Admission Test.
FIGURE 2-1 Correlations between scores on standardized tests and academic and job outcome measures. Horizontal axis: correlation (0 to 0.7). Vertical axis: career outcome (1st GGPA, GGPA, qualifying exam, degree completion, research productivity, citation count, faculty ratings, licensing exam). Bars: GRE-T, GRE-S, MCAT, LSAT, GMAT, MAT, PCAT.
SOURCE: Kuncel and Hezlett (2007). Reprinted with permission of American Association for the Advancement of Science.
Kuncel has also studied the role that broader abilities have in predicting
future outcomes. A more recent review (Kuncel and Hezlett, 2010)
examined the body of research on the relationships between measures of
general cognitive ability (historically referred to as IQ) and job outcomes,
including performance in high, medium, and low complexity jobs; training
success in civilian and military settings; how well leaders perform on
objective measures; and evaluations of the creativity of people’s work.
Figure 2-2 shows the correlations between performance on a measure of
general cognitive ability and these outcomes. All of the correlations are
above .30, which Kuncel characterized as demonstrating a strong relationship
between general cognitive ability and job performance across a variety
of performance measures. Together, Kuncel said, these two reviews
present a body of evidence documenting that verbal and quantitative
skills along with general cognitive ability are predictive of college and
career performance.
FIGURE 2-2 Correlations between measures of cognitive ability and job performance. Horizontal axis: correlation (0.0 to 0.7). Outcomes shown: job performance in high, medium, and low complexity jobs; training success, civilian and military; objective leader effectiveness; creativity.
SOURCE: Kuncel and Hezlett (2011). Copyright 2010 by Sage Publications. Reprinted with permission of Sage Publications.
Kuncel noted that other broader skills, such as critical thinking or analytical
reasoning, may also be important predictors of performance, but
he characterized this evidence as inconclusive. In his view, the problems
lie both with the conceptualization of the constructs as domain-general
(as opposed to domain-specific) and with the specific definition of
the construct. He found the constructs are not well defined and have not
been properly validated. For instance, a domain-general concept of the
construct of “critical thinking” is often indistinguishable from general
cognitive ability or general reasoning and learning skills. To demonstrate,
Kuncel presented three definitions of critical thinking that commonly
appear in the literature:
1. “[Critical thinking involves] cognitive skills or strategies that
increase the probability of a desirable outcome—in the long run,
critical thinkers will have more desirable outcomes than ‘noncriti-
cal’ thinkers. . . . Critical thinking is purposeful, reasoned, and
goal-directed. It is the kind of thinking involved in solving prob-
lems, formulating inferences, calculating likelihoods, and making
decisions” (Halpern, 1998, pp. 450-451).
2. “Critical thinking is reflective and reasonable thinking that is
focused on deciding what to believe or do” (Ennis, 1985, p. 45).
3. “Critical thinking [is] the ability and willingness to test the validity
of propositions” (Bangert-Drowns and Bankert, 1990, p. 3).
He characterized these definitions as both very general and very broad.
For instance, Halpern’s definition essentially encompasses all of problem
solving, judgment, and cognition, he said. Others are more specific and
focus on a particular class of tasks (e.g., Bangert-Drowns and Bankert,
1990). He questioned the extent to which critical thinking so conceived is
distinct from general cognitive ability (or general intelligence).
Kuncel conducted a review of the literature for empirical evidence of
the validity of the construct of critical thinking. The studies in the review
examined the relationships between various measures of critical thinking
and measures of general intelligence and expert performance. He looked
for two types of evidence—convergent validity evidence3 and discrimi-
nant validity4 evidence.
Kuncel found several analyses of the relationships among different
measures of critical thinking (see Bondy et al., 2001; Facione, 1990; and
Watson and Glaser, 1994). The assessments that were studied included the
Watson-Glaser Critical Thinking Appraisal (WGCTA), the Cornell Critical
Thinking Test (CCTT), the California Critical Thinking Skills Test (CCTST),
and the California Critical Thinking Disposition Inventory (CCTDI). The
average correlation among the measures was .41. Considering that all of
these tests purport to be measures of the same construct, Kuncel judged
this correlation to be low. For comparison, he noted a correlation of .71
between two subtests of the SAT intended to measure critical thinking
(the SAT-critical reading test and the SAT-writing test).
With regard to discriminant validity, Kuncel conducted a literature
search that yielded 19 correlations between critical-thinking skills and
traditional measures of cognitive abilities, such as the Miller Analogies
Test and the SAT (Adams et al., 1999; Bauer and Liang, 2003; Bondy et
al., 2001; Cano and Martinez, 1991; Edwards, 1950; Facione et al., 1995,
1998; Spector et al., 2000; Watson and Glaser, 1994). He separated the
studies into those that measured critical-thinking skills and those that
measured critical-thinking dispositions (i.e., interest and willingness to
use one’s critical-thinking skills). The average correlation between general
cognitive ability measures and critical-thinking skills was .48, and
the average correlation between general cognitive ability measures and
critical-thinking dispositions was .21.
3 Convergent validity indicates the degree to which an operationalized construct is similar to other operationalized constructs that it theoretically should also be similar to. For instance, to show the convergent validity of a test of critical thinking, the scores on the test can be correlated with scores on other tests that are also designed to measure critical thinking. High correlations between the test scores would be evidence of convergent validity.
4 Discriminant validity evaluates the extent to which a measure of an operationalized construct differs from measures of other operationalized constructs that it should differ from. In the present context, the interest is in verifying that critical thinking is a construct distinct from general intelligence and expert performance. Thus, discriminant validity would be examined by evaluating the patterns of correlations between and among scores on tests of critical thinking and scores on tests of the other two constructs (general intelligence and expert performance).
Kuncel summarized these results as demonstrating that different
measures of critical thinking show lower correlations with each other (i.e.,
average of .41) than they do with traditional measures of general cognitive
ability (i.e., average of .48). Kuncel judges that these findings provide little
support for critical thinking as a domain-general construct distinct from
general cognitive ability. Given this relatively weak evidence of conver-
gent and discriminant validity, Kuncel argued, it is important to deter-
mine if critical thinking is correlated differently than cognitive ability with
important outcome variables like grades or job performance. That is, do
measures of critical-thinking skills show incremental validity beyond the
information provided by measures of general cognitive ability?
Kuncel looked at two outcome measures: grades in higher education
and job performance. With regard to higher education, he examined data
from 12 independent samples with 2,876 subjects (Behrens, 1996; Gadzella
et al., 2002, 2004; Kowalski and Taylor, 2004; Taube, 1997; Williams, 2003).
Across these studies, the average correlation between critical-thinking
skills and grades was .27 and between critical-thinking dispositions and
grades was .24. To put these correlations in context, the SAT has an
average correlation with 1st year college GPA of .26 to .33 for the
individual scales and .35 when the SAT scales are combined (Kobrin et
al., 2008).5
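The incremental validity question raised above can be made concrete with the standard two-predictor formula for the squared multiple correlation. The sketch below is illustrative only. It borrows the average correlations quoted in the text (.27 for critical-thinking skills with grades, .48 for critical-thinking skills with cognitive ability) and, as a labeled assumption, lets the combined SAT correlation of .35 stand in for the correlation between cognitive ability and grades. These figures come from different samples, so the result is a rough illustration, not an estimate Kuncel reported. The function and constant names are hypothetical.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of an outcome on two predictors,
    computed from the three pairwise correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def incremental_r_squared(r_y_base, r_y_new, r_base_new):
    """Variance explained by the new predictor beyond the baseline predictor."""
    return r_squared_two_predictors(r_y_base, r_y_new, r_base_new) - r_y_base**2


# Assumed inputs (see the paragraph above); not a single-sample data set.
R_GRADES_ABILITY = 0.35  # stand-in: combined SAT with first-year GPA
R_GRADES_CT = 0.27       # critical-thinking skills with grades
R_ABILITY_CT = 0.48      # critical-thinking skills with cognitive ability

delta = incremental_r_squared(R_GRADES_ABILITY, R_GRADES_CT, R_ABILITY_CT)
print(f"Incremental R^2 from adding critical thinking: {delta:.3f}")
```

Under these assumed inputs the gain is small (on the order of one percentage point of explained variance), which is consistent with the cautious reading of the evidence given in the text.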
There are very limited data that quantify the relationship between
critical-thinking measures and subsequent job performance. Kuncel
located three studies with the Watson-Glaser Appraisal (Facione and
Facione, 1996, 1997; Giancarlo, 1996). They yielded an average correlation
of .32 with supervisory ratings of job performance (N = 293).
Kuncel described these results as “mixed” but not supporting a conclusion
that assessments of critical thinking are better predictors of college
and job performance than other available measures. Taken together with
the convergent and discriminant validity results, the evidence to support
critical thinking as an independent construct distinct from general cognitive
ability is weak.
Kuncel believes these correlational results do not tell the whole story,
however. First, he noted, a number of artifactual issues may have contributed
to the relatively low correlation among different assessments of critical
thinking, such as low reliability of the measures themselves, restriction
in range, different underlying definitions of critical thinking, overly broad
definitions that are operationalized in different ways, different kinds of
assessment tasks, and different levels of motivation in test takers.
5 It is important to note that when corrected for restriction in range, these coefficients increase to .47 to .51 for individual scores and .51 for the combined score.
Second, he pointed out, even though two tests correlate highly with
each other, they may not measure the same thing. That is, although the
critical-thinking tests correlate .48, on average, with cognitive ability mea-
sures, it does not mean that they measure the same thing. For example,
a recent study (Kuncel and Grossbach, 2007) showed that ACT and SAT
scores are highly predictive of nursing knowledge. But, obviously, individuals
who score highly on a college admissions test do not have all the
knowledge needed to be a nurse. The constructs may be related but not
overlap entirely.
Kuncel explained that one issue with these studies is they all con-
ceived of critical thinking in its broadest sense and as a domain-general
construct. He said this conception is not useful, and he summarized his
meta-analysis findings as demonstrating little evidence that critical think-
ing exists as a domain-general construct distinct from general cognitive
ability. He highlighted the fact that some may view critical thinking as a
specific skill that, once learned, can be applied in many situations. For
instance, many in his field of psychology mention the following as specific
critical-thinking skills that students should acquire: understanding the
law of large numbers, understanding what it means to affirm the conse-
quent, being able to make judgments about sample bias, understanding
control groups, and understanding Type I versus Type II errors. However,
Kuncel said many tasks that require critical thinking would not make use
of any of these skills.
In his view, the stronger argument is for critical thinking as a domain-
specific construct that evolves as the person acquires domain-specific
knowledge. For example, imagine teaching general critical-thinking skills
that can be applied across all reasoning situations to students. Is it reasonable,
he asked, to think a person can think critically about arguments
for different national economic policies without understanding macro-
economics or even the current economic state of the country? At one
extreme, he argued, it seems clear that people cannot think critically about
topics for which they have no knowledge, and their reasoning skills are
intimately tied to the knowledge domain. For instance, most people have
no basis for making judgments about how to conduct or even prioritize
different experiments for CERN’s Large Hadron Collider. Few people
understand the topic of particle physics sufficiently to make more than
trivial arguments or decisions. On the other hand, perhaps most people
could try to make a good decision about which among a few medical
treatments would best meet their needs.
Kuncel also talked about the kinds of statistical and methodological
reasoning skills learned in different disciplines. For instance, chemists,
engineers, and physical scientists learn to use these types of skills in
thinking about the laws of thermodynamics that deal with equilibrium,
temperature, work, energy, and entropy. On the other hand, psycholo-
gists learn to use these skills in thinking about topics such as sample bias
and self-selection in evaluating research findings. Psychologists who are
adept at thinking critically in their own discipline would have difficulty
thinking critically about problems in the hard sciences, unless they have
specific subject matter knowledge in the discipline. Likewise, it is difficult
to imagine that a scientist highly trained in chemistry could solve
a complex problem in psychology without knowing some subject matter
in psychology.
Kuncel said it is possible to train specific skills that aid in making
good judgments in some situations, but the literature does not demon-
strate that it is possible to train universally effective critical thinking
skills. He noted, “I think you can give people a nice toolbox with all sorts
of tools they can apply to a variety of tasks, problems, issues, decisions,
citizenship questions, and learning those things will be very valuable, but
I dissent on them being global and trainable as a global skill.”
Transfer from One Context to Another
There is a commonplace assumption, Eric Anderman noted in his
presentation, that learners readily transfer the skills they have learned
in one course or context to situations and problems that arise in another.
Anderman argued research on human learning does not support this
assumption. Research suggests such transfer seldom occurs naturally,
particularly when learners need to transfer complex cognitive strategies
from one domain to another (Salomon and Perkins, 1989). Transfer is only
likely to occur when care is taken to facilitate that transfer: that is, when
students are specifically taught strategies that facilitate the transfer of
skills learned in one domain to another domain (Gick and Holyoak, 1983).
For example, Anderman explained, students in a mathematics class
might be taught how to solve a problem involving the multiplication of
percentages (e.g., 4.79% × 0.25%). The students then might encounter a
problem in their social studies courses that involves calculating com-
pounded interest (such as to solve a problem related to economics or
banking). Although the same basic process of multiplying percentages
might be necessary to solve both problems, it is unlikely that students will
naturally, on their own, transfer the skills learned in the math class to the
problem encountered in the social studies class.
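The shared structure of the two problems can be shown in a few lines. The sketch below is only an illustration of Anderman's example: the percentages 4.79% and 0.25% come from the text, while the principal, interest rate, and number of periods in the compounding example are hypothetical. Both calculations rest on the same step, converting a percentage to a fraction before multiplying.

```python
from decimal import Decimal


def percent_to_fraction(percent):
    """Convert a percentage given as a string (e.g. '4.79') to a fraction."""
    return Decimal(percent) / Decimal(100)


def multiply_percentages(a_percent, b_percent):
    """Math-class problem: the product of two percentages, as a fraction."""
    return percent_to_fraction(a_percent) * percent_to_fraction(b_percent)


def compound(principal, rate_percent, periods):
    """Social-studies problem: balance after compounding once per period."""
    growth = Decimal(1) + percent_to_fraction(rate_percent)
    return Decimal(principal) * growth**periods


product = multiply_percentages("4.79", "0.25")
balance = compound("1000", "5", 2)  # hypothetical: 1000 at 5% for 2 periods
print(product)  # fraction; multiply by 100 to express it as a percentage
print(balance)
```

Seen side by side, the two functions differ only in how the converted fraction is used, which is the structural similarity students are unlikely to notice without explicit instruction.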
In the past, Anderman said, there had been some notion that critical-thinking
and problem-solving skills could be taught independent of context.
For example, teaching students a complex language such as Latin,
a computer programming language such as LOGO, or other topics that
require complex thinking might result in an overall increase in their ability
to think critically and problem solve.
Both Kuncel and Anderman maintained that the research does not
support this idea. Instead, the literature better supports a narrower
definition in which critical thinking is considered a finite set of specific
skills. These skills are useful for effective decision making for many, but
by no means all, tasks or situations. Their utility is further curtailed by
task-specific knowledge demands. That is, a decision maker often has
to have specific knowledge to make more than trivial progress with a
problem or decision.
Anderman highlighted four important messages emerging from
recent research. First, research documents that it is critical that students
learn basic skills (such as basic arithmetic skills like times tables) so the
skills become automatic. Mastery of these skills is required for the successful
learning of more complex cognitive skills. Second, the use of
general practices intended to improve students’ thinking is not usually
successful as a means of improving their overall cognitive abilities. The
research suggests students may become more adept in the specific skill
taught, but this does not transfer to an overall increase in cognitive ability.
Third, when general problem-solving strategies are taught, they should be
taught within meaningful contexts and not as simply rote algorithms to be
memorized. Finally, educators need to actively teach students to transfer
skills from one context to another by helping students to recognize that
the solution to one type of problem may be useful in solving a problem
with similar structural features (Mayer and Wittrock, 1996).
He noted that instructing students in general problem-solving skills
can be useful but more elaborate scaffolding and domain-specific appli-
cations of these skills are often necessary. Whereas general problem-
solving and critical-thinking strategies can be taught, research indicates
these skills will not automatically or naturally transfer to other domains.
Anderman stressed that educators and trainers must recognize that 21st
century skills should be taught within specific domains; if they are taught
as general skills, he cautioned, then extreme care must be taken to facilitate
the transfer of these skills from one domain to another.
ASSESSMENT EXAMPLES
The workshop included examples of four different types of assessments
of critical-thinking and problem-solving skills: one that will be
used to make international comparisons of achievement, one used to
license lawyers, and two used for formative purposes (i.e., intended to
support instructional decision making). The first example was the computerized
problem-solving component of the Programme for International
Student Assessment (PISA). This assessment is still under development
but is scheduled for operational administration in 2012.6 Joachim Funke,
professor of cognitive, experimental, and theoretical psychology with the
Heidelberg University in Germany, discussed this assessment.
The second example was the Multistate Bar Exam, a paper-and-pencil
test that consists of both multiple-choice and extended-response components.
This test is used to qualify law students for practice in the legal
profession. Susan Case, director of testing with the National Conference
of Bar Examiners, made this presentation.
The two formative assessments both make use of intelligent tutors,
with assessments embedded into instruction modules. The “Auto Tutor”
described by Art Graesser, professor of psychology with the University
of Memphis, is used in instructing high school and higher education students
in critical thinking skills in science. The Auto Tutor is part of a system
Graesser has developed called Operation ARIES! (Acquiring Research
Investigative and Evaluative Skills). The “Packet Tracer,” described by
John Behrens, director of networking academy learning systems development
with Cisco, is intended for individuals learning computer networking
skills.
Problem Solving on PISA
For the workshop, Joachim Funke supplied the committee with the
draft framework for PISA (see Organisation for Economic Co-operation
and Development, 20107) and summarized this information in his presen-
tation.8 The summary below is based on both documents.
PISA, Funke explained, defines problem solving as an individual’s
capacity to engage in cognitive processing to understand and resolve
problem situations where a solution is not immediately obvious. The
definition includes the willingness to engage with such situations in
order to achieve one’s potential as a constructive and reflective citizen
(Organisation for Economic Co-operation and Development, 2010, p. 12). Further,
the PISA 2012 assessment of problem-solving competency will not test
simple reproduction of domain-based knowledge, but will focus on the
cognitive skills required to solve unfamiliar problems encountered in life
and lying outside traditional curricular domains. While prior knowledge
6 For a full description of the PISA program, see http://www.oecd.org/pages/0,3417,en_32252351_32235731_1_1_1_1_1,00.html [August 2011].
7 Available at http://www.oecd.org/dataoecd/8/42/46962005.pdf [August 2011].
8 Available at http://www7.national-academies.org/bota/21st_Century_Workshop_Funke.pdf [August 2011].
For model building, full credit is awarded if the generated model is
correct. If one or two errors are present in the model, partial credit is given.
If more than two errors are present, then no credit is awarded.
For forecasting, full credit is given if the target goals are reached.
Partial credit is given if some progress toward the target goals can be
registered, and no credit is given if there is no progress toward target
goals at all.
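The two scoring rules just described can be written as simple decision functions. The sketch below is a paraphrase of the rules in the text, not PISA's scoring code: the `Credit` labels, function names, and the boolean flags for forecasting are hypothetical, and the text does not say how "some progress" toward the target goals is measured.

```python
from enum import Enum


class Credit(Enum):
    """Credit levels named in the text; the labels are illustrative."""
    NONE = "no credit"
    PARTIAL = "partial credit"
    FULL = "full credit"


def score_model_building(error_count):
    """Full credit for a correct model, partial for one or two errors,
    none for more than two."""
    if error_count < 0:
        raise ValueError("error_count cannot be negative")
    if error_count == 0:
        return Credit.FULL
    if error_count <= 2:
        return Credit.PARTIAL
    return Credit.NONE


def score_forecasting(goals_reached, progress_made):
    """Full credit if the target goals are reached, partial if some
    progress toward them is registered, otherwise none."""
    if goals_reached:
        return Credit.FULL
    if progress_made:
        return Credit.PARTIAL
    return Credit.NONE


print(score_model_building(1).value)
print(score_forecasting(goals_reached=False, progress_made=False).value)
```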
PISA items are classified as static versus interactive. In static problems,
all the information the test taker needs to solve the problem is
presented at the outset. In contrast, interactive problems require the test
taker to explore the problem to uncover important relevant information
(Organisation for Economic Co-operation and Development, 2010, p. 15).
Two sample PISA items appear in Box 2-1.
Funke and his colleagues have conducted analyses to evaluate the con-
struct validity of the assessment. They have examined the internal structure
of the assessment using structural equation modeling, which evaluates
BOX 2-1
Sample Problem-Solving Items for PISA 2012
Digital Watch–interactive:
A simulation of a digital watch is presented. The watch is controlled by four
buttons, the functions of which are unknown to the student at the outset of the
problems. The student is required to (Q1) determine through guided exploration
how the buttons work in TIME mode, (Q2) complete a diagram showing how to
cycle through the various modes, and (Q3) use this knowledge to control the watch
(set the time).
Q1 is intended to measure exploring and understanding, Q2 measures repre-
senting and formulating, Q3 measures planning and executing.
Basketball–static
The rules for a basketball tournament relating to the way in which match time
should be distributed between players are given. There are two more players than
required (5) and each player must be on court for at least 25 of the 40 minutes
playing time. Students are required to (Q1) create a schedule for team members
that satisfies the tournament rules, and (Q2) reflect on the rules by critiquing an
existing schedule.
Q1 is designed to measure planning and executing, Q2 measures monitoring
and reflecting.
SOURCE: Organisation for Economic Co-operation and Development (2010, p. 28). Reprinted
with permission of Organisation for Economic Co-operation and Development.
OCR for page 29
29
ASSESSING COGNITIVE SKILLS
the extent to which the items measure the dimensions they are intended
to measure. The results indicate the three dimensions are correlated with
each other. Model Building and Forecasting correlate at .77; Forecasting
and Information Retrieval correlate at .71; and Information Retrieval and
Model Building correlate at .75. Funke said that the results also document
that the items “load on” the three dimensions in the way the test developers
hypothesized. He indicated some misfit related to the items that measure
Forecasting, and he attributes this to the fact that the Forecasting items have
a skewed distribution. However, the fit of the model does not change when
these items are removed.
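One way to read the reported correlations is to square them, which gives the proportion of variance each pair of dimensions shares. The sketch below uses only the three correlations quoted above. The shared-variance reading is a standard interpretation added here for illustration and is not part of Funke's presentation.

```python
# Correlations among the three dimensions, as reported in the text.
correlations = {
    ("Model Building", "Forecasting"): 0.77,
    ("Forecasting", "Information Retrieval"): 0.71,
    ("Information Retrieval", "Model Building"): 0.75,
}


def shared_variance(r: float) -> float:
    """Return the proportion of variance two measures share (r squared)."""
    return r * r


for (first, second), r in correlations.items():
    print(f"{first} / {second}: r = {r:.2f}, "
          f"shared variance = {shared_variance(r):.2f}")
```

Under this reading, each pair of dimensions shares roughly half to three-fifths of its variance, which is consistent with dimensions that are related but not interchangeable.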
Funke reported results from studies of the relationship between test
performance and other variables, including school achievement and two
measures of problem solving on the PISA German National Extension
on Complex Problem Solving. The latter assessment, called HEIFI, measures
knowledge about a system and the control of the system separately.
Scores on the PISA Model Building dimension are significantly related
(p < .05) to school achievement (r = .64) and to scores on the HEIFI
knowledge component (r = .48). Forecasting is significantly related
(p < .05) to both of the HEIFI scores (r = .48 for HEIFI knowledge
and r = .36 for HEIFI control). Information Retrieval is significantly
related (p < .05) to HEIFI control (r = .38). The studies also show that
HEIFI scores are not related to school achievement.
Funke closed by discussing the costs associated with the assessment.
He noted it is not easy to specify the costs because in a German
university setting, many costs are absorbed by the department and its
equipment. Funke estimated that development costs run about $13 per
unit,9 plus $6.50 for the Cognitive Labs used to pilot test and refine the
items.10 The license for the Computer Based Assessment (CBA) Itembuilder
and the execution environment is provided free for scientific use
by DIPF11 in Frankfurt.
The Bar Examination for Lawyers12
The Bar examination is administered by each jurisdiction in the
United States as one step in the process to license lawyers. The National
Conference of Bar Examiners (NCBE) develops a series of three exams for
use by the jurisdictions. Jurisdictions may use any or all of these three
9 A unit consists of stimulus materials, instructions, and the associated questions.
10 Costs are in American dollars.
11 DIPF stands for the Deutsches Institut für Internationale Pädagogische Forschung, which
translates to the German Institute for International Educational Research.
12 The summary is based on a presentation by Susan Case; see
http://www7.national-academies.org/bota/21st_Century_Workshop_Case.pdf [August 2011].
exams or may administer locally developed exam components if they
wish. The three major components developed by the NCBE are the
Multistate Bar Examination (MBE), the Multistate Essay Examination (MEE), and
the Multistate Performance Test (MPT). All are paper-and-pencil tests.
Examinees pay to take the test, and the costs are $54 for the MBE, $20
for the MEE, and $20 for the MPT.
Susan Case, who has spent her career working on licensing exams—
first the medical licensing exam for physicians and then the bar exam
for lawyers—noted the Bar examination is like other tests used to award
professional licensure. The focus of the test is on the extent to which the
test taker has the knowledge and skills necessary to be licensed in the
profession on the day of the test. The test is intended to ensure the newly
licensed professional knows what he or she needs to know to practice law.
The test is not designed to measure the curriculum taught in law schools,
but what licensed professionals need to know. When they receive the credential,
lawyers are licensed to practice in all fields of law. This is analogous
to medical licensing, in which the licensed professional is eligible to
practice any kind of medicine.
The Bar exam includes both multiple-choice and constructed-response
components. Both require examinees to be able to gather and synthesize
information and apply their knowledge to the given situation. The questions
generally follow a vignette that describes a case or problem and
asks the examinee to determine the issues to resolve before advising the
client or to determine other information needed in order to proceed. For
instance, what questions should be asked next? What is the best strategy
to implement? What is the best defense? What is the biggest obstacle to
relief? The questions may require the examinee to synthesize the law and
the facts to predict outcomes. For instance, is the ordinance constitutional?
Should a conviction be overturned?
The MBE
The purpose of the MBE is to assess the extent to which an examinee
can apply fundamental legal principles and legal reasoning to analyze a
given pattern of facts. The questions focus on the understanding of legal
principles rather than memorization of local case or statutory law. The
MBE consists of 60 multiple-choice questions and lasts a full day.
A sample question follows:
A woman was told by her neighbor that he planned to build a new fence
on his land near the property line between their properties. The woman
said that, although she had little money, she would contribute something
toward the cost. The neighbor spent $2,000 in materials and a day of his
time to construct the fence. The neighbor now wants her to pay half the
cost of the materials. Is she liable for this amount?
The MEE
The purpose of the MEE is to assess the examinee’s ability to (1)
identify legal issues raised by a hypothetical factual situation; (2) separate
material that is relevant from that which is not; (3) present a reasoned
analysis of the relevant issues in a clear, concise, and well-organized
composition; and (4) demonstrate an understanding of the fundamental
legal principles relevant to the probable resolution of the issues raised by
the factual situation.
The MEE lasts for 6 hours and consists of nine 30-minute questions.
An excerpt from a sample question follows:
The CEO/chairman of the 12-member board of directors (the Board) of
a company plus three other members of the Board are senior officers
of the company. The remaining eight members of the Board are wholly
independent directors.
Recently, the Board decided to hire a consulting firm to market a new
product . . .
The CEO disclosed to the Board that he had a 25% partnership interest
in the consulting firm. The CEO stated that he would not be involved
in any work to be performed by the consulting firm. He knew but did
not disclose to the Board that the consulting firm’s proposed fee for this
consulting assignment was substantially higher than it normally charged
for comparable work . . .
The Board discussed the relative merits of the two proposals for 10 minutes.
The Board then voted unanimously (CEO abstaining) to hire the
consulting firm . . .
1. Did the CEO violate his duty of loyalty to his company? Explain.
2. Assuming the CEO breached his duty of loyalty to his company, does
he have any defense to liability? Explain.
3. Did the other directors violate their duty of care? Explain.
The MPT
The purpose of the MPT is to assess fundamental lawyering skills in
realistic situations by asking the candidate to complete a task that a beginning
lawyer should be able to accomplish. The MPT requires applicants to
sort detailed factual materials; separate relevant from irrelevant facts; analyze
statutory, case, and administrative materials for relevant principles of
law; apply relevant law to the facts in a manner likely to resolve a client’s
problem; identify and resolve ethical dilemmas; communicate effectively
in writing; and complete a lawyering task within time constraints.
Each task is completely self-contained and includes a file, a library,
and a task to complete. The task might deal with a car accident, for
example, and therefore might include a file with pictures of the accident
scene and depositions from the various witnesses, as well as a library with
relevant case law. Examinees are given 90 minutes to complete each task.
For example, in a case involving a slip and fall in a store, the task
might be to prepare an initial draft of an early dispute resolution for a
judge. The draft should candidly discuss the strengths and weaknesses of
the client’s case. The file would contain the instructional memo from the
supervising attorney, the local rule, the complaint, an investigator’s report,
and excerpts of the depositions of the plaintiff and a store employee. The
library would include a jury instruction concerning premises liability
with commentary on contributory negligence.
Scoring
The MBE is a multiple-choice test and thus scored by machine. However,
the other two components require human scoring. The NCBE produces
the questions and the grading guidelines for the MEE and MPT,
but the essays and performance tests are scored by the jurisdictions themselves.
The scorers are typically lawyers who are trained during grading
seminars held at the NCBE offices, after the exam is administered. At this
time, they review sample papers and receive training on how to apply the
scoring guidelines in a consistent fashion.
Each component of the Bar examination (MBE, MEE, MPT) is intended
to assess different skills. The MBE focuses on breadth of knowledge, the
MEE focuses on depth of knowledge, and the MPT focuses on the ability
to demonstrate practical skills. Together, the three formats cover the different
types of tasks that a new lawyer needs to do.
Determinations about weighting the three components are left to the
jurisdictions; however, the NCBE urges them to weight the MBE score by
50 percent and the MEE and MPT by 25 percent each. The recommendation
is an attempt to balance a number of concerns, including authenticity,
psychometric considerations, logistical issues, and economic concerns.
The recommendation is to award the highest weight to the MBE because
it is the most psychometrically sound. The reliability of scores on the MBE
is generally over .90, much higher than the reliability of scores on the other
portions, and the MBE is scaled and equated across time. The recommended weighting
helps to ensure high decision consistency and comparability of pass/fail
decisions across administrations.
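The recommended weighting can be sketched as a simple weighted sum. The example below assumes, purely for illustration, that all three component scores have already been placed on a common scale. The function name and the example scores are hypothetical and do not come from the NCBE or from any jurisdiction's actual scoring procedure.

```python
# NCBE-recommended weights as reported in the text; jurisdictions may differ.
RECOMMENDED_WEIGHTS = {"MBE": 0.50, "MEE": 0.25, "MPT": 0.25}


def composite_score(scores, weights=RECOMMENDED_WEIGHTS):
    """Combine component scores into one weighted total.

    Assumes every component score is already on the same scale
    (an illustrative assumption, not a documented NCBE procedure).
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same components")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical examinee scores on a shared scale.
example = {"MBE": 140.0, "MEE": 130.0, "MPT": 150.0}
print(composite_score(example))
```

Because the MBE carries half the weight, a given change in the MBE score moves the composite twice as far as the same change in either of the other components, which is how the more reliable component comes to dominate the pass/fail decision.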
Currently the MBE is used by all but three jurisdictions (Louisiana,
Washington, and Puerto Rico). The essay exam is used by 27 jurisdictions,
and the performance test is used by 34 jurisdictions.
Test Development
Standing test development committees that include practicing
lawyers, judges, and lawyers on staff with law schools write the test
questions. The questions are reviewed by outside experts, pretested on
appropriate populations, analyzed and revised, and professionally edited
before operational use. Case said the test development procedures for the
Bar exam are analogous to those used for the medical licensure exams.
Operation ARIES! (Acquiring Research
Investigative and Evaluative Skills)
The summary below is based on materials provided by Art Graesser,
including his presentation13 and two background papers he supplied to
the committee (Graesser et al., 2010; Millis et al., in press).
Operation ARIES! is a tutorial system with a formative assessment
component intended for high school and higher education students,
Graesser explained. It is designed to teach and assess critical thinking
about science. The program operates in a game environment intended
to be engaging to students. The system includes an “Auto Tutor,” which
makes use of animated characters that converse with students. The Auto
Tutor is able to hold conversations with students in natural language,
interpret the student’s response, and respond in a way that is adaptive to
the student’s response. The designers have created a science fiction setting
in which the game and exercises operate. In the game, alien creatures
called “Fuaths” are disguised as humans. The Fuaths disseminate bad
science through various media outlets in an attempt to confuse humans
about the appropriate use of the scientific method. The goal for the student
is to become a “special agent of the Federal Bureau of Science (FBS),
an agency with a mission to identify the Fuaths and save the planet”
(Graesser et al., 2010, p. 328).
The system addresses scientific inquiry skills, developing research
ideas, independent and dependent variables, experimental control, the
sample, experimenter bias, and relation of data to theory. The focus is on
use of these skills in the domains of biology, chemistry, and psychology.
The system helps students to learn to evaluate evidence intended to support
claims. Some examples of the kinds of research questions/claims that
are evaluated include the following:
13 For Graesser’s presentation, see http://nrc51/xpedio/groups/dbasse/documents/webpage/060267~1.pdf [August 2011].
From Biology:
• Do chemical and organic pesticides have different effects on food
quality?
• Does milk consumption increase bone density?
From Chemistry:
• Does a new product for winter roads prevent water from freezing?
• Does eating fish increase blood mercury levels?
From Psychology:
• Does using cell phones hurt driving?
• Is a new cure for autism effective?
The system includes items in real-life formats, such as articles, advertisements,
blogs, and letters to the editor, and makes use of different types of
media where it is common to see faulty claims.
Through the system, the student encounters a story told by video,
combined with communications received by e-mail, text message, and
updates. The student is engaged through the Auto Tutor, which involves
a “tutor agent” that serves as a narrator, and a “student agent” that serves
in different roles, depending on the skill level of the student.
The system makes use of three kinds of modules: interactive training,
case studies, and interrogations. The interactive training exchanges
begin with the student reading an e-book, which provides the requisite
information used in later modules. After each chapter, the student
responds to a set of multiple-choice questions intended to assess the
targeted skills. The text is interactive in that it involves “trialogs” (three-
way conversations) between the primary agent, the student agent, and the
actual (human) student. It is adaptive in that the strategy used is geared
to the student’s performance. If the student is doing poorly, the two Auto Tutor
agents carry on a conversation that promotes vicarious learning: that
is, the tutor agent and the student agent interact with each other, and the
human student observes. If the student is performing at an intermediate
level, normal tutoring occurs in which the student carries on a conversational
exchange with the tutor agent. If the student is doing very well, he
or she may be asked to teach the student agent, under the notion that the
act of teaching can help to perfect one’s skills.
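The adaptive rule described above can be sketched as a three-way branch on the student's recent performance. The cutoff values, the use of proportion correct as the performance measure, and all names below are assumptions made for illustration. The source does not state how Operation ARIES! decides between levels.

```python
from enum import Enum


class TutoringMode(Enum):
    VICARIOUS = "tutor agent and student agent converse; human observes"
    STANDARD = "human student converses with the tutor agent"
    TEACHING = "human student teaches the student agent"


def select_mode(proportion_correct, low_cutoff=0.4, high_cutoff=0.8):
    """Pick a trialog mode from performance on the chapter questions.

    The cutoffs are hypothetical placeholders, not values from the system.
    """
    if not 0.0 <= proportion_correct <= 1.0:
        raise ValueError("proportion_correct must be between 0 and 1")
    if proportion_correct < low_cutoff:
        return TutoringMode.VICARIOUS   # doing poorly: learn by observing
    if proportion_correct < high_cutoff:
        return TutoringMode.STANDARD    # intermediate: normal tutoring
    return TutoringMode.TEACHING        # doing very well: learn by teaching


for score in (0.2, 0.6, 0.9):
    print(score, select_mode(score).name)
```

The text says a strong student "may be asked" to teach the student agent, so a fuller model would treat the teaching mode as an option the system can offer and not as a fixed outcome.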
In the case study modules, the student is expected to apply what he
or she has learned. The case study modules involve some type of flawed
science, and the student is to identify the flaws by applying information
learned from the interactive text in the first module. The student responds
by verbally articulating the flaws, and the system makes use of advances
in computational linguistics to analyze the meaning of the response. The
researchers adopted the case study approach because it “allows learners
to encode and discover the rich source of constraints and interdependencies
underlying the target elements (flaws) within the cases. [Prior] cases
provide a knowledge base for assessing new cases and help guide reasoning,
problem solving, interpretation and other cognitive processes” (Millis
et al., in press, p. 17).
In the interrogation modules, insufficient information is provided,
so students must ask questions. Research is presented in an abbreviated
fashion, such as through headlines, advertisements, or abstracts.
The student is expected to identify the relevant questions to ask and to
learn to discriminate good research from flawed research. The storyline is
advanced by e-mails, dialogues, and videos that are interspersed among
the learning activities.
Through the three kinds of modules, the system interweaves a variety
of key principles of learning that Graesser said have been shown to
increase learning. These include the following:
• Self-explanation (where the learner explains the material to
another student, such as the automated student)
• Immediate feedback (through the tutoring system)
• Multimedia effects (which tend to engage the student)
• Active learning (in which students actually participate in solving
a problem)
• Dialog interactivity (in which students learn by engaging in conversations
and tutorial dialogs)
• Multiple, real-life examples (intended to help students transfer
what they learn in one context to another context and to real-world
situations)
Graesser closed by saying that he and his colleagues are beginning
to collect data from evaluation studies to examine the effects of the Auto
Tutor. Research has focused on estimating changes in achievement before
and after use of the system, and, to date, the results are promising.
Packet Tracer
The summary below is based on materials provided by John Behrens,
including his presentation14 and a background paper he forwarded in
preparation for the workshop (Behrens et al., in press).
To help countries around the world train their populations in networking
skills, Cisco created the Networking Academy. The academy is
a public/private partnership through which Cisco provides free online
14 For Behrens’ presentation, see http://www7.national-academies.org/bota/21st_Century_Workshop_Behrens.pdf [August 2011].
curricula and assessments. Behrens pointed out that in order to become
adept with networking, students need both a conceptual understanding
of networking and the skills to apply this knowledge to real situations.
Thus, hands-on practice and assessment on real equipment are important
components of the academy’s instructional program. Cisco also wants
to provide students with time for out-of-class practice and opportunities
to explore on their own using online equipment that is not typically
available in the average classroom setting. In the Networking Academy,
students work with an online instructor, and they proceed through an
established curriculum that incorporates numerous interactive activities.
Behrens talked specifically about a new program Cisco has developed
called “Packet Tracer,” a computer package that uses simulations
to provide instruction and includes an interactive and adaptable assessment
component. Cisco has incorporated Packet Tracer activities into the
curricula for training networking professionals. Through this program,
instructors and students can construct their own activities, and students
can explore problems on their own. In Cisco’s Networking Academy,
assessments can be student-initiated or instructor-initiated. Student-initiated
assessments are primarily embedded in the curriculum and
include quizzes, interactive activities, and “challenge labs,” which are a
feature of Packet Tracer. The student-initiated assessments are designed
to provide feedback to the student to help his or her learning. They use
a variety of technologies ranging from multiple-choice questions (in the
quizzes) to complex simulations (in the challenge labs). Before the development
of Packet Tracer, the instructor-initiated assessments consisted
either of hands-on exams with real networking equipment or multiple-choice
exams in the online assessment system. Packet Tracer provides
more simulation-based options, and also includes detailed reporting and
grade-book integration features.
Each assessment consists of one extensive network configuration or
troubleshooting activity that may require up to 90 minutes to complete.
Access to the assessment is associated with a particular curricular unit,
and it may be re-accessed repeatedly based on instructor authorization.
The system provides simulations of a broad range of networking devices
and networking protocols, including feature sets built around Cisco IOS
(Internetwork Operating System). Instructions for tasks can be presented
through HTML-formatted text boxes that can be preauthored, stored,
and made accessible by the instructor at the appropriate time.
Behrens presented an example of a simulated networking problem
in which the student needs to obtain the appropriate cable. To complete
this task, the student must determine what kind of cable is needed,
where on the computer to plug it in, and how to connect it. The student’s
performance is scored, and his or her interactions with the problem are
tracked in a log. The goal is not to simply assign a score to the student’s
performance but to provide detailed feedback to enhance learning and to
correct any misinterpretations. The instructors can receive and view the
log in order to evaluate how well the student understands the tasks and
what needs to be done.
Packet Tracer can simulate a broad range of devices and networking
protocols, including a wide range of PC facilities covering communication
cards, power functionality, web browsers, and operating system
configurations. The particular devices, configurations, and problem states
are determined by the author of the task (e.g., the instructor) in order to
address whatever proficiencies the chapter, course, or instruction targets.
When icons of the devices are touched in the simulator, more detailed pictures
are presented with which the student can interact. The task author
can program scoring rules into the system. Students can be observed
trying and discarding potential solutions based on feedback from the
game, resulting in new understandings. The game encourages students
to engage in problem-solving steps (such as problem identification, solution
generation, and solution testing). Common incorrect strategies can
be seen across recordings.
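The idea of author-defined scoring rules that yield feedback as well as a score can be sketched as follows. This is not Packet Tracer's actual rule format or API. The `ScoringRule` class, the dictionary used to represent the simulated network state, and the cable and port names are all hypothetical, chosen to echo the cable-selection example above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScoringRule:
    """One author-defined check against the final simulated state."""
    description: str
    check: Callable[[dict], bool]
    points: int


def score_task(state, rules):
    """Apply every rule; return points earned plus per-rule feedback."""
    earned = 0
    feedback = []
    for rule in rules:
        passed = rule.check(state)
        if passed:
            earned += rule.points
        status = "PASS" if passed else "FAIL"
        feedback.append(f"{status}: {rule.description}")
    return earned, feedback


# Hypothetical rules for a cable-selection task.
rules = [
    ScoringRule("Correct cable type selected",
                lambda s: s.get("cable") == "crossover", 2),
    ScoringRule("Cable plugged into the Ethernet port",
                lambda s: s.get("port") == "ethernet0", 1),
]

# Hypothetical final state produced by a student's actions.
state = {"cable": "straight-through", "port": "ethernet0"}
earned, feedback = score_task(state, rules)
print(earned)
for line in feedback:
    print(line)
```

Returning per-rule feedback alongside the total reflects the stated goal of giving the student detailed feedback and not only a score. A log of the student's intermediate actions, as described in the text, would be a separate record that this sketch does not model.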