6 Testing and Grading
In addition to the informal assessments described in Chapter 5, more formal assessments of student progress provide important gauges of student learning. At most institutions, testing students and assigning them grades provide the bases for such evaluations. Grading practices in a course can both motivate students and define the goals of a course. Grades may influence students' decisions to select a field as a major or a career (Seymour and Hewitt, 1994). Although a course may offer many activities and learning opportunities, faculty members declare what is important in their courses by their decisions of what to grade, how to test, and how much a particular score counts toward the final grade. Students who measure their learning by the grade they receive tend to invest time only in aspects of a course that clearly affect their grade. Aspects of a course that may be part of a final grade include tests, weekly problem assignments, oral or written reports, library research projects, essays, group projects, and laboratory performance. For each activity, students deserve to know how they will be graded, and they deserve careful, detailed, and timely comments on their performance. For each aspect of your course, try to identify the important skills, learning, and accomplishments you hope to measure, then determine how to grade them fairly and equitably.
Laboratory activities involve aspects of reasoning, teamwork, experimental design, data acquisition and recording, data analysis, discussion, interpretation, and reporting. One way of grading labs (Kandel, 1989; Joshi, 1991) is to assess the following:
Rondini and Feighan (1978) describe a chemistry lab in which, at the end of each lab, students receive a numerical score for specific attributes such as product yield, equipment setup, handling of chemicals, purity of product, time to completion, technique, and safety procedures. These scores are added to the grades for their lab reports and notebooks. Thus, students know quickly what aspects of their lab techniques need improvement and can use this information as a catalyst for change. Joshi (1991) asks students to prepare and submit their lab reports on-line. The computer checks and grades the quality of input data; performs and displays the necessary calculations; checks and grades students' calculations and the accuracy of the results; generates a grade report; and displays the grading scheme used.
When assigning essays or written reports as activities for grading, explain to students the important aspects of the assignment and describe how it will be graded. One might include content, research, references, reasoning, data analysis, and clear expression (see sidebar for an example). Another aid to student learning is to grade first drafts and give students a chance to resubmit an improved version. If instructor time is a significant deterrent to this approach, students can exchange draft reports with a partner or gather in a group and critique one another's drafts.
Oral reports and presentations can be difficult to grade, especially when students have little experience with this skill. It can be hard to overlook poor delivery and focus on content. Some faculty members develop a scoring rubric that weights these two components unequally and gives credit for effective use of visuals. When students do more than one presentation in a term, the weight given to delivery is increased to reflect the expectation that they will have improved with experience.
Group activities are difficult to grade on an individual basis.
Most instructors find that a good way to grade a group is to make the entire group responsible for the answers, presentation, and results, by giving each group member the same grade. This encourages stronger students to help less able students. Observing the groups in action will give you an idea of how each participant performs as a partner. Students are also quite cognizant of their contribution and their fellow classmates' contribution. One approach is to ask students to estimate the percentage of the final project that can be attributed to each group member, including themselves. You can use these ratings from all members to construct a participation score, so that there are slight differences when one group member contributes significantly more or less than the others. Some recommend that group activity grades account for only a small portion of a student's overall grade in the class (Johnson et al., 1991). You will need to decide how to address homework problems, if you feel that these are an important aspect of student learning. If you choose not to collect and grade them, many students will interpret that as a signal that you do not consider them important. However, some faculty get around this problem by duplicating some of the assigned problems on their tests. If you choose to make homework a part of the final course grade, you need to make a number of decisions. What percentage of the overall grade should it be? Will students work alone or in groups? Will they submit individual papers or a single answer set for the group? |
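The peer-estimate approach to group grading described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `participation_adjusted_grades`, the 5-point cap on adjustments, and the sample ratings are all hypothetical and do not come from the sources cited. Each student starts from the shared group grade, and the average share attributed to that student by all members (including the student) nudges the grade slightly up or down.

```python
from statistics import mean


def participation_adjusted_grades(group_grade, ratings, max_adjust=5.0):
    """Return per-student grades: the shared group grade plus a small adjustment.

    ratings[rater][member] is the percentage of the project that `rater`
    attributes to `member`; each rater's percentages should sum to 100.
    max_adjust (an assumed cap) limits how far a grade can move in points.
    """
    members = sorted(ratings)
    equal_share = 100.0 / len(members)
    grades = {}
    for member in members:
        # Average the share attributed to this member by every rater.
        avg_share = mean(ratings[rater][member] for rater in members)
        # Relative deviation from an equal split (0.1 = 10% above equal share).
        deviation = (avg_share - equal_share) / equal_share
        # Scale and clamp so that differences between members stay slight.
        adjustment = max(-max_adjust, min(max_adjust, deviation * max_adjust))
        grades[member] = round(group_grade + adjustment, 1)
    return grades


# Hypothetical ratings for a four-person group (names are invented).
ratings = {
    "ana": {"ana": 25, "ben": 25, "cal": 25, "dee": 25},
    "ben": {"ana": 30, "ben": 25, "cal": 25, "dee": 20},
    "cal": {"ana": 30, "ben": 25, "cal": 25, "dee": 20},
    "dee": {"ana": 25, "ben": 25, "cal": 25, "dee": 25},
}
adjusted = participation_adjusted_grades(85, ratings)
print(adjusted)
```

With these sample ratings, the student credited with slightly more than an equal share ends half a point above the group grade and the student credited with slightly less ends half a point below it. How large the cap should be is a judgment call for the instructor.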
The English department at Dickinson College conducts a seminar for faculty teaching in the Freshman Seminar program, to help them learn to teach writing to new students and to evaluate students' assignments. They suggest assigning a percentage to the various categories shown below, with approximately equal weight given to content and presentation. When students hand in a rough draft, they recommend assigning it a nominal percentage, and grading it on the basis of whether the student has made reasonable progress on the assignment. The grade sheet typically occupies a full page, with adequate space left for instructor comments. A sample grading sheet is shown.
Evaluation of Paper #1
Quality of effort on draft = 10%
Ideally, tests measure students' achievement of the educational goals for the course, and the test items sample the content and skills that are most important for students to learn. Tests usually ask students questions about material that is most essential to the discipline. A well-constructed test measures a range of cognitive skills, not just students' recall of facts. However, it is unlikely "that research will ever demonstrate clearly which form of examination, essay or objective, has the more beneficial influence on study and learning" (Ebel and Frisbie, 1986). Your choice of examination form will need to take into account many factors, such as the time available for students to take the test, the amount of time you have available to grade it, and what you wish to measure. Some goals and methods of testing, adapted from Fuhrmann and Grasha (1983), are:
There are a limited number of standard formats for exam questions. Multiple choice questions can measure students' mastery of details and specific knowledge as well as complex concepts. Because multiple choice test items can be answered quickly, you can assess students' grasp of many topics in an hour exam. Although multiple choice test items are easily scored, good multiple choice questions can be challenging to write (see the sidebar on writing effective multiple choice questions). Short answer questions can require one or two sentences or brief paragraphs. They are easier to write than multiple choice tests but take longer to score, and may not be as useful as essay exams to measure the depth of student understanding. Essay questions probe students' understanding of broad issues and general concepts. They can measure how well students are able to organize, integrate, and synthesize material and apply information to new situations. In contrast to multiple choice items, only a few essay questions can be posed in an hour. Further, essay tests are sometimes difficult to grade.
Writing Effective Multiple Choice Questions
One of the best ways to identify useful wrong answers for multiple-choice items is first to ask the question in a free-response format. When the free-response tests are graded, look for common errors or misconceptions and tally them. If what went wrong is not clear from a student's response, ask the student to explain how he or she went about answering the question when the papers are returned. Then use common errors as the wrong answers for multiple-choice questions. After several years of this activity (less if you share items with colleagues), you will have a sizable bank of good multiple-choice questions and understand common misconceptions and errors well enough to construct suitable multiple-choice questions without going through the preliminary step of giving free-response items first.
Herron, 1996
Problem solving forms the core of many science courses, and numerical problems are prominent on many exams in these courses. As noted in Chapter 4, students who successfully answer these test questions do not necessarily grasp the underlying concept (Gabel and Bunce, 1994). Traditional numeric problems can incorporate some sort of conceptual essay section which measures the students' understanding of the concepts involved as well as their ability to use algorithms to solve problems. Nakhleh and Mitchell (1993) offer a sample of multiple choice questions for a limited number of chemistry concepts, in which the answers are pictorial representations of molecular events. Although you may find it difficult to develop an appropriate set of possible answers (see sidebar on multiple choice tests), asking students to draw a picture of the phenomenon described in the numerical problem is a good way to test their conceptual understanding. |
Helping Students Learn from Exams
Students often learn more from their tests if there are detailed written
comments about their errors. Try commenting on individual tests or posting
a key that includes a preferred solution method, alternative solutions,
and commentary on common errors and the flaws in the reasoning behind them.
Words of encouragement on students' papers mean a lot to them, and can motivate
them to study harder for the next test or work harder on the next assignment.
There are two general approaches to assigning grades: criterion-referenced grading and norm-referenced grading. In criterion-referenced grading, students' grades are based on an absolute scale established by the instructor before the exam is graded. If all the students in a class achieve 80 percent or higher on an exam, they will all receive A's or B's. Conversely, if none of the students in a class scores better than 80 percent, then no one in the class receives a grade higher than B- for that test. Criterion-referenced grading meets three important standards: any number of students can earn A's and B's; the focus is on learning and mastery of material; final grades reflect what students know, compared to the teacher's standards. There are various ways to identify the criterion (standard) for each letter grade. Ory and Ryan (1993) describe a strategy that involves determining the number of items on a test that students need to answer correctly to achieve a C (typically those items written at the basic knowledge or comprehension levels), adding to that minimum the number of additional items for a B (questions written at higher levels) and for an A, and then working back to D and F. Criterion-referenced grading requires skill and experience in writing exams and establishing the grading scale. New teachers are advised to consult with experienced colleagues before using this approach. |
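The item-counting strategy attributed above to Ory and Ryan (1993) amounts to simple arithmetic, sketched below in Python. The function names and all of the numbers (a 50-item test, 30 items for a C, and so on) are hypothetical choices made for illustration, not values from the source. The point of the sketch is that the cutoffs are fixed before anyone takes the test, so any number of students can earn a given grade.

```python
def build_cutoffs(c_min, extra_for_b, extra_for_a, fewer_for_d):
    """Build letter-grade cutoffs (minimum items correct), starting from C.

    c_min:       items needed for a C (e.g. the basic-knowledge items)
    extra_for_b: additional higher-level items needed for a B
    extra_for_a: additional items beyond a B needed for an A
    fewer_for_d: how far below the C cutoff a D begins (an assumed parameter)
    """
    b_min = c_min + extra_for_b
    a_min = b_min + extra_for_a
    d_min = c_min - fewer_for_d
    # Ordered from highest to lowest so the first match wins.
    return [("A", a_min), ("B", b_min), ("C", c_min), ("D", d_min)]


def letter_grade(items_correct, cutoffs):
    """Return the first letter whose minimum the student meets, else F."""
    for letter, minimum in cutoffs:
        if items_correct >= minimum:
            return letter
    return "F"


# Hypothetical 50-item test: C at 30 items, B at 38, A at 45, D at 25.
cutoffs = build_cutoffs(c_min=30, extra_for_b=8, extra_for_a=7, fewer_for_d=5)

# Every student in this invented class clears the B cutoff, so all earn A's or B's.
class_scores = [48, 46, 45, 44, 40, 38]
class_grades = [letter_grade(s, cutoffs) for s in class_scores]
print(cutoffs)
print(class_grades)
```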
What Do the Numbers Mean?
Science and mathematics teachers are quantitatively skilled, but how
accurate, objective, and meaningful are their test scores? Despite the apparent
objectivity of the numerical result, it is important to remember that there
is subjectivity in the selecting and weighting of questions and in assigning
numerical values or deducting points for missing parts of answers. The uncertainty
of the numbers depends on how those scores were determined. |
The second approach, norm-referenced grading, often called grading on a curve, measures a student's achievement relative to other students in the class. Faculty members uncomfortable with setting absolute standards or unsure of the difficulty level of their exams may choose to grade on a curve as a way to renormalize the class scores. Many traditional grading systems used in science classes put students in competition with their classmates and limit the number of high grades. Research indicates that normative systems such as grading on the curve can reduce students' motivation to learn and increase the likelihood of academic dishonesty and evaluation anxiety (Crooks, 1988; McKeachie, 1994). Normative grading also serves to discourage effective group studying or other work, because assisting a classmate inherently decreases the value of the work of other students in the class. In addition, a grade assigned on the curve does not indicate how much or how little students have learned, only where they stand in relation to the class. Some faculty members try to compensate for inequities by adjusting the cutoff scores or by assigning a higher percentage of A's than usual if the class is especially good.
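A short Python sketch can make concrete the observation that a curved grade reflects only class standing. It assumes one simple form of curving, fixed quotas by rank, and the quota percentages (20% A, 30% B, 30% C, 20% D) and class data are hypothetical; actual curving schemes vary and this is not presented as any particular instructor's method.

```python
def curve_grades(scores, quotas=(("A", 20), ("B", 30), ("C", 30), ("D", 20))):
    """Assign letter grades by rank using fixed percentage quotas.

    scores maps student name -> score. quotas lists (letter, percent of class)
    from the top grade down and should sum to 100. Ties are not treated
    specially in this sketch; tied students may land on either side of a cutoff.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    grades = {}
    start = 0
    cumulative = 0
    for letter, percent in quotas:
        cumulative += percent
        end = (cumulative * n) // 100  # rank position where this letter stops
        for name in ranked[start:end]:
            grades[name] = letter
        start = end
    # Anyone left over because of rounding receives the lowest listed grade.
    for name in ranked[start:]:
        grades[name] = quotas[-1][0]
    return grades


# Two invented classes of ten: one scores 90 to 99, the other 40 to 49.
strong = {f"s{i}": 99 - i for i in range(10)}
weak = {f"w{i}": 49 - i for i in range(10)}

strong_grades = curve_grades(strong)
weak_grades = curve_grades(weak)
print(strong_grades["s9"], weak_grades["w0"])
```

Under these assumed quotas, the student who scores 90 in the strong class receives the lowest grade, while the student who scores 49 in the weak class receives an A, which is the sense in which the grade reports standing rather than learning.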
Scoring Your Tests
When scoring a test, as when designing one, it is a good idea to decide
whether the objective is to see what students know and what they have learned
or to identify specific things they do not know or cannot do. The objectives
of the test are implicit in its design and grading rubric. In any case,
a specific scoring strategy (giving points for things done or deducting
points for missing items) is recommended. If questions have multiple parts,
plan your scoring strategy so that students who stumble on the first part
do not lose all of the points. Many teachers find it easiest and most uniform
to grade all students on a particular question at the same time. Keeping
the student's identity unknown as you grade the test is also a good practice,
because it helps minimize any bias in grading. |
Another form of norm-referenced grading is to assign grades according to breaks in the distribution. In this model, scores are arranged from highest to lowest, and notable gaps or breaks in the distribution are located. For example, on a midterm totaling 100 points, eight students score 81 or higher and three students score 75; no one scores between 80 and 76. Instructors using this model will assign A's to students who scored 81 and above, and start the B's at 75. One disadvantage of this approach is that these breaks may not represent true differences in achievement, so the magnitude of the gaps in scores should be taken into account. A further disadvantage of this model of grading is that the grade distribution depends on judgments made after students have taken the test rather than on guidelines that are established before testing.
When students' scores are fairly well distributed across a wide range, different approaches often yield similar grades. However, if the overall performance of the class is either low or high, the model used matters a great deal. When many students have done well on an exam, for example, everyone who did well will receive an A or B under criterion-referenced grading. When many students have done poorly, grading on a curve ensures that at least some will receive A's or B's.
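Locating breaks in a score distribution can be sketched as follows. The function name `find_breaks`, the `min_gap` threshold, and the individual scores are hypothetical; the scores were invented only to match the midterm example above (eight students at 81 or higher, three at 75, and no one in between). Requiring a minimum gap is one way of taking the magnitude of a break into account.

```python
def find_breaks(scores, min_gap):
    """Return (upper, lower) pairs of adjacent scores separated by at least min_gap.

    Scores are sorted from highest to lowest, and each pair marks a candidate
    boundary between two letter grades.
    """
    ordered = sorted(scores, reverse=True)
    breaks = []
    for upper, lower in zip(ordered, ordered[1:]):
        if upper - lower >= min_gap:
            breaks.append((upper, lower))
    return breaks


# Invented midterm scores consistent with the example in the text.
midterm = [95, 92, 90, 88, 86, 84, 83, 81, 75, 75, 75, 72, 70]

wide_breaks = find_breaks(midterm, min_gap=5)
narrow_breaks = find_breaks(midterm, min_gap=3)
print(wide_breaks)
print(narrow_breaks)
```

With a threshold of 5 points, only the gap between 81 and 75 qualifies. Lowering the threshold to 3 also flags the gaps between 95 and 92 and between 75 and 72, which shows how sensitive this model is to a judgment made after the scores are in.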
If you want to reward improvement, one way is to give students bonus points at the end of the term to acknowledge steady improvement throughout the semester. Alternatively, some instructors offer students a chance to drop a weak exam grade, replace it with their performance on a comprehensive final exam, or complete some credit-granting exercise which demonstrates an improved understanding of the material covered on the exam. Other faculty members allow students to correct their exams and resubmit their answers for a specified amount of additional credit. |