Libarkin explained that it is possible to create subinstruments from the GCI to focus on specific topics, and that the GCI is unique among concept inventories because each subinstrument is statistically related to the others and to the whole.
The next step was to collect data on students’ alternate conceptions through interviews and open-ended surveys. After that, an external team of science educators, psychometricians, and geologists reviewed the instrument. Using information from students and the external reviewers, the developers created and field-tested a pilot concept inventory.
Faculty members whose students were involved in the pilot test also reviewed the instrument. Libarkin described a situation in which this review resulted in changes to the inventory. One question asked about the coexistence of humans and dinosaurs. The 30th person to review the instrument, a biology professor, pointed out that birds are dinosaurs. Because students who know that birds are classified as dinosaurs might respond that humans and dinosaurs coexisted, the GCI development team reworded that question.
After pilot-testing the inventory, the development team performed statistical analyses on the items, conducted interviews with students to better understand their responses to the questions, and revised the instrument. In all, Libarkin said the development of the GCI took two and one-half years.
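The statistical item analyses mentioned above typically include classical indices such as item difficulty and item discrimination. The source does not specify which analyses the GCI team performed, so the following is only an illustrative sketch of these two common indices, using a small hypothetical response matrix:

```python
import numpy as np

# Hypothetical data: rows are students, columns are items (1 = correct, 0 = incorrect).
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
])

# Each student's total score across all items.
total_scores = responses.sum(axis=1)

# Item difficulty: the proportion of students answering each item correctly.
# (Higher values mean easier items.)
difficulty = responses.mean(axis=0)

# Item discrimination: correlation between each item and the total score with
# that item removed (the corrected item-total correlation). Items that
# high-scoring students tend to get right have higher discrimination.
def corrected_item_total(resp, totals):
    n_items = resp.shape[1]
    out = np.empty(n_items)
    for j in range(n_items):
        rest = totals - resp[:, j]  # total score excluding item j
        out[j] = np.corrcoef(resp[:, j], rest)[0, 1]
    return out

discrimination = corrected_item_total(responses, total_scores)
```

In practice, items with very high or very low difficulty, or with low (or negative) discrimination, are candidates for revision or removal, which is consistent with the revise-and-retest cycle Libarkin described.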
Cautioning that data are only as good as the tools used to gather them, Libarkin identified some of the considerations that are involved in developing concept inventories. First, she reviewed the terminology related to multiple-choice questions. The question itself is called the stem, and incorrect response options are called distractors.
Libarkin then provided a checklist for developing multiple-choice assessment questions. The checklist began with guiding questions, such as “Is the topic covered by this question important for geosciences understanding?” “From the perspective of an expert geoscientist, does the question actually measure some aspect of geosciences understanding?” “Would a test-taker interpret this question, including both the stem and the response options, in the same way as intended by the test developer?”
The checklist also included several rules for creating sound multiple-choice questions. Using those rules as a guideline, Libarkin analyzed a question from the first version of the GCI. She noted that the question violated several of the rules and explained how the development team revised it to be more consistent with the rules.
Observing that concept inventories serve several purposes, Libarkin explained that the importance of question quality varies with the purpose. For example, if the purpose is to document alternative conceptions to “wake up” faculty, the style of the questions might not matter. The question