Assessment to Guide Teaching and Learning
This chapter summarizes presentations and discussions related to promising practices in assessment, including the use of concept inventories and an example of how research and assessment can inform instructional improvements.
CONCEPT INVENTORIES IN THE SCIENCES: EXAMPLES FROM THE GEOSCIENCES CONCEPT INVENTORY
Julie Libarkin (Michigan State University) discussed concept inventories in the sciences. She explained that concept inventories are multiple-choice assessments that are designed to diagnose areas of conceptual difficulty prior to instruction and evaluate changes in conceptual understanding related to a specific intervention (Libarkin, 2008). Incorrect response options for each question often are written to reflect students’ misconceptions.
Libarkin said she views concept inventories as a valuable and necessary first step to investigate science learning across institutions. She remarked on their proliferation, noting that she found 23 inventories in various science domains as she was preparing for the workshop.
Using the geosciences concept inventory (GCI) as an example, Libarkin described the development cycle for concept inventories. She and her colleagues began the development process by reviewing textbooks to identify the most important geosciences concepts to cover. Although most inventories target a specific concept in the sciences (e.g., force or natural selection), the GCI covers the geosciences as a whole; it is a bank of 69 questions that are related through a psychometric technique called item-response theory.
Libarkin explained that it is possible to create subinstruments from the GCI to focus on specific topics. The GCI is unique among concept inventories because each subinstrument is statistically related to the others and to the whole.
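The chapter does not detail how item-response theory links the GCI items, but the basic idea can be illustrated with a one-parameter (Rasch) model, the simplest form of item-response theory. In the sketch below the item difficulties are hypothetical values on a common logit scale; because every item is calibrated on that shared scale, the expected score on any subset of items (a subinstrument) remains comparable to scores on the full bank.

```python
import math

def p_correct(ability: float, difficulty: float) -> float:
    """Rasch (1PL) model: probability that a student with the given
    ability answers an item of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical item difficulties on a common logit scale; any subset
# (a subinstrument) stays comparable to the full bank because all
# items are calibrated on the same scale.
item_difficulties = {"q1": -1.2, "q2": 0.0, "q3": 1.5}

def expected_score(ability: float, items: dict) -> float:
    """Expected number correct for a student on a set of items."""
    return sum(p_correct(ability, d) for d in items.values())

# A student of average ability (0.0 on the logit scale):
print(round(expected_score(0.0, item_difficulties), 2))  # → 1.45
```

This is an illustration of the general technique only, not the GCI's actual calibration, which involved field-testing with real student response data.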
The next step was to collect data on students’ alternate conceptions through interviews and open-ended surveys. After that, an external team of science educators, psychometricians, and geologists reviewed the instrument. Using information from students and the external reviewers, the developers created and field-tested a pilot concept inventory.
Faculty members whose students were involved in the pilot test also reviewed the instrument. Libarkin described a situation in which this review resulted in changes to the inventory. One question asked about the coexistence of humans and dinosaurs. The 30th person to review the instrument, a biology professor, pointed out that birds are dinosaurs. Because students who know that birds are classified as dinosaurs might respond that humans and dinosaurs coexisted, the GCI development team reworded that question.
After pilot-testing the inventory, the development team performed statistical analyses on the items, conducted interviews with students to better understand their responses to the questions, and revised the instrument. In all, Libarkin said the development of the GCI took two and one-half years.
Cautioning that data are only as good as the tools used to gather them, Libarkin identified some of the considerations that are involved in developing concept inventories. First, she reviewed the terminology related to multiple-choice questions. The question itself is called the stem, and incorrect response options are called distractors.
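The stem/distractor terminology can be made concrete with a small sketch. The data structure and the sample item below are hypothetical, written only to show how a concept inventory question is composed of a stem, a correct response, and distractors that reflect common alternate conceptions.

```python
from dataclasses import dataclass

@dataclass
class InventoryItem:
    stem: str                # the question itself
    key: str                 # the correct response option
    distractors: list[str]   # incorrect options, written to reflect misconceptions

# Hypothetical geosciences item, for illustration only:
item = InventoryItem(
    stem="What drives the motion of tectonic plates?",
    key="Convection in the mantle",
    distractors=[
        "The rotation of Earth",                  # a common alternate conception
        "Ocean currents pushing on continents",   # another alternate conception
    ],
)
print(len(item.distractors))  # → 2
```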
Libarkin then provided a checklist for developing multiple-choice assessment questions. The checklist began with guiding questions, such as “Is the topic covered by this question important for geosciences understanding?” “From the perspective of an expert geoscientist, does the question actually measure some aspect of geosciences understanding?” “Would a test-taker interpret this question, including both the stem and the response options, in the same way as intended by the test developer?”
The checklist also included several rules for creating sound multiple-choice questions. Using those rules as a guideline, Libarkin analyzed a question from the first version of the GCI. She noted that the question violated several of the rules and explained how the development team revised it to be more consistent with the rules.
Observing that concept inventories serve several purposes, Libarkin explained that the importance of question quality varies with the purpose. For example, if the purpose is to document alternative conceptions to “wake up” faculty, the style of the questions might not matter. The question format matters more if the purpose is to evaluate learning for instruction, and it is very important if the purpose of the concept inventory is to assess learning for research.
CONCEPT INVENTORIES IN ENGINEERING
Teri Reed-Rhoads (Purdue University) observed that although engineering lags behind science in terms of developing concept inventories, the few engineering concept inventories available are increasingly being used for such purposes as accreditation, grant proposals, and grant project accountability. In addition, she explained that engineering faculty members are beginning to use concept inventories to facilitate changes in pedagogy aimed at increasing student learning.
Reed-Rhoads defined engineering concept inventories as those developed by engineers, either on their own or in collaboration with others. Using this definition, she identified 21 engineering concept inventories, 6 of which she labeled science, technology, engineering, and mathematics (STEM) concept inventories: instruments developed by or in conjunction with engineers but focused on nonengineering subjects.1
Discussing the relative maturity of engineering concept inventories, Reed-Rhoads pointed out that many more examinees have taken the statics concept inventory than the other engineering-related concept inventories, and that its growth has been exponential. For example, between year 2 and year 3 of its existence, the cumulative number of examinees for the statics inventory jumped from about 300 to about 1,700, further increasing to 2,700 in year 4 (Reed-Rhoads and Imbrie, 2008). In contrast, the cumulative number of examinees for the systems and signals inventory steadily grew from about 300 in year 1 to about 500 in year 2 to slightly more than 800 in year 3. She also explained that, because concept inventories take years to develop (as noted by Libarkin), there is often a significant lag time between their development and a discernible effect on instructional practices.
In engineering, concept inventory developers initially were slow to analyze the psychometric properties of their instruments, said Reed-Rhoads. She observed, however, that developers are increasingly collaborating with psychometricians to analyze and validate their instruments. She also noted that the research base on students’ engineering misconceptions lags behind those in some of the other sciences. This lag complicates the development of the concept inventories: in other disciplines, inventory developers draw on existing research about misconceptions, whereas in engineering, the concept inventories drive the definitions of the misconceptions (Reed-Rhoads and Imbrie, 2008).

1The specific concept inventories are listed in the workshop paper by Reed-Rhoads (see http://www.nationalacademies.org/bose/Reed_Rhoads_CommissionedPaper.pdf).
Reed-Rhoads identified gaps in the research related to engineering concept inventories. First, she explained that concept inventories so far have been used only in basic engineering courses, which means that upper-division courses and subject areas are sparsely represented. In addition, although some research indicates that examinees’ attitudes and beliefs about a field of study might influence assessment results in that field (Gal and Ginsburg, 1994), few of the engineering concept inventories have related instruments that measure the affective and cognitive domains.
Another gap in the research is that engineering concept inventories have not been extensively studied for the various types of bias that might be included in the questions (Reed-Rhoads and Imbrie, 2008). These biases include how gender, race/ethnicity, native language, and culture might affect student scores on the inventories. The understanding of bias in engineering concept inventories is limited because not enough students from different subpopulations have used the instruments; with such low sample numbers, the statistics for each subgroup are not reliable. However, Reed-Rhoads noted that although women are the most underrepresented population in engineering, enough women have used the concept inventories to allow for some statistical testing related to gender bias.
Reed-Rhoads also observed that the relationships among concept inventories are important but not well understood. She emphasized the need to track students’ conceptual development, which requires greater knowledge of how the concept inventories fit together. She argued that this need is becoming increasingly important as concept inventories proliferate.
The final gap relates to helping faculty members use concept inventories to change their practices. To this end, Reed-Rhoads and her colleagues created a community of inventory developers, faculty members, and students called ciHUB (short for concept inventory hub) to provide access to resources that can facilitate collaboration and the use of research-based tools to improve instruction.
IDENTIFYING AND ADDRESSING STUDENT DIFFICULTIES IN PHYSICS
Karen Cummings delivered a presentation by Paula Heron (University of Washington) on work by Heron and her colleagues in the University of Washington’s Physics Education Group.2 This group conducts a coordinated program in which research, curriculum development, and instruction are tightly linked in an iterative cycle. One of the group’s major curriculum development projects, Tutorials in Introductory Physics (McDermott, Shaffer, and the Physics Education Group at the University of Washington, 2002), was the focus of the presentation.

2See the workshop paper by Heron (http://www.nationalacademies.org/bose/Heron_CommissionedPaper.pdf).
Cummings explained that the Physics Education Group developed the tutorials to supplement instruction in an introductory, calculus-based physics course at the University of Washington that is required for all physics majors. Approximately 1,000 students are enrolled in the course at any time. The course meets for three 50-minute classes and one 3-hour laboratory each week. Each course also has a 50-minute tutorial each week, and students have weekly online homework that is linked to the lecture material. They also are assessed through three mid-term exams and a final exam that contain material from the lectures, labs, and tutorials. Because the course is similar in structure and content to many others in colleges and universities throughout the United States, the setting is well suited for the development and assessment of instructional materials that can be adopted at other institutions.
In the weekly tutorials, students work through carefully structured worksheets in small groups, and instructors question them in a semi-Socratic manner. Designed to fit within the constraints imposed by large lecture-based courses, the research-based tutorials foster the development of reasoning skills and conceptual understanding.
Tutorial development depends on systematic investigations of student learning before, during, and after instruction, including ongoing individual student interviews that probe students’ understanding in depth (Heron, Shaffer, and McDermott, 2008). Based on those interviews, the researchers write open-ended questions to ascertain the prevalence of specific difficulties. They also conduct descriptive studies in the classroom to further inform the development of their curriculum materials.
These tutorials have been assessed extensively at the University of Washington and at many of the dozens of institutions that have adopted them. At the University of Washington, students who completed the tutorial were given a posttest with questions that could not be answered by memorization. Eighty percent of the students gave a correct or nearly correct answer (compared with 20 percent without the tutorial) (Heron, Shaffer, and McDermott, 2008). Results from other institutions that have used the University of Washington tutorials include the following:
Learning gains in introductory physics courses that used tutorials at the University of Colorado were much higher than is typical in introductory courses (Finkelstein and Pollock, 2005).
At Montana State University, a longitudinal study showed that nonmajors retained gains they made in understanding force—as measured by the Force Concept Inventory (FCI)—up to 3 years after completing an introductory physics course that used the tutorials (Francis, Adams, and Noonan, 1998).
In Harvard University physics classes that used a variety of interactive strategies—including the University of Washington tutorials—the gender gap between the FCI scores of male and female students disappeared (Lorenzo, Crouch, and Mazur, 2006).
After a large introductory physics course at the University of Colorado that used tutorials, Finkelstein and Pollock (2005) did not observe the shift toward unfavorable attitudes about physics that typically occurs in such courses.
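Learning gains of the kind cited in these studies are conventionally reported in physics education research as Hake’s normalized gain: the fraction of the possible pretest-to-posttest improvement that a class actually achieves. The chapter does not state which metric the cited studies used, so the sketch below, with made-up scores, is only an illustration of the standard convention.

```python
def normalized_gain(pre: float, post: float) -> float:
    """Hake's normalized gain: (post - pre) / (100 - pre), the achieved
    fraction of the possible improvement. Scores are class-average
    percentages on an instrument such as the FCI."""
    return (post - pre) / (100.0 - pre)

# Hypothetical class averages: 45% on the pretest, 78% on the posttest.
g = normalized_gain(45.0, 78.0)
print(round(g, 2))  # → 0.6
```

Dividing by the room left for improvement, rather than using the raw difference, lets courses with different pretest averages be compared on the same scale.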
Based on these results, Heron, Shaffer, and McDermott (2008) posited that additional assessments would be valuable in the areas of student reasoning skills, student ability to transfer conceptual knowledge to quantitative problems, and student ability to apply concepts and principles in subsequent courses.
At the workshop, Cummings characterized Tutorials in Introductory Physics as an example of how research can guide the improvement of instruction within the practical constraints of courses with large enrollments. She explained that the tutorials and other research-based instructional materials are most successful when the developers invest sustained effort in their continuous improvement and in supporting adopters. She ended by noting that the growth in STEM departments of groups and individuals who devote their scholarly effort to conducting research on teaching and learning in the science disciplines is the truly promising practice in STEM education (Heron, Shaffer, and McDermott, 2008).
Before taking questions from the audience, the panelists reflected on each other’s presentations. Cummings remarked on the dearth of published concept inventories in chemistry and noted that researchers in all disciplines would benefit from the information Libarkin and Reed-Rhoads presented about the process of developing concept inventories. Reed-Rhoads agreed that disseminating information about the development and appropriate use of concept inventories is important. She stressed the need for a “Good Housekeeping seal of approval” for concept inventories. She and Libarkin also discussed the need to warehouse and analyze the data collected from concept inventories. Libarkin added that she would like to see the disciplinary communities be trained to use and improve the tools.
David Mogk and William Wood expressed concerns about the inappropriate dissemination and use of concept inventories. In response, Libarkin explained her view that concept inventories are useful as a snapshot of students’ understanding of one or more targeted concepts, and that other assessment methods provide a deeper look at students’ mental models. She agreed that it is important for concept inventories to be aligned with the assessment purpose. Reed-Rhoads expressed the view that widespread dissemination is beneficial as long as the authors of the concept inventories have access to the resulting data so they can improve the instrument.
Kenneth Heller pointed out that the FCI is not about forces and is not a concept inventory. Rather, it is an instrument about misconceptions that is based on the misconception research. Although the instrument is reliable, Heller stressed that it is not a predictor of students’ success in introductory physics. He asked the presenters whether they are trying to replicate the success of the FCI or develop concept inventories that may or may not have the same properties as the FCI. Libarkin and Reed-Rhoads said their respective communities (geosciences and engineering) are trying to do both. Cummings agreed with Heller’s assessment of the FCI and emphasized the importance of being clear about what these instruments measure.
Heidi Schweingruber asked the concept inventory developers to elaborate on the link between concept inventories and instructional change. Cummings responded that the University of Washington Physics Education Research Group gets feedback on strategies that work and do not work to foster conceptual understanding and uses that feedback to develop curriculum materials. The Physics Education Research Group works with professors who adopt the materials to ensure that they have the support they need to implement the materials effectively.
Responding to a question from Jay Labov (National Research Council), Libarkin and Cummings said that concept inventories do not measure whether students will have enough knowledge of science to make informed decisions later in their lives. Cummings added that this gap suggests a need for additional research and instrumentation.