Read "Science Teaching Reconsidered: A Handbook" at NAP.edu

Suggested Citation:"Chapter 6: Testing and Grading." National Research Council. 1997. Science Teaching Reconsidered: A Handbook. Washington, DC: The National Academies Press. doi: 10.17226/5287.

6
Testing and Grading

Ways to assess student learning
Goals for tests
Suggestions to help students do better on exams
Descriptions of common testing methods
Issues to consider when assigning test grades

In addition to the informal assessments described in Chapter 5, more formal assessments of student progress provide important gauges of student learning. At most institutions, testing students and assigning them grades provide the bases for such evaluations. Grading practices in a course can both motivate students and define the goals of a course. Grades may influence students' decisions to select a field as a major or a career (Seymour and Hewitt, 1994). Although a course may offer many activities and learning opportunities, faculty members declare what is important in their courses by their decisions of what to grade, how to test, and how much a particular score counts toward the final grade. Students who measure their learning by the grade they receive tend to invest time only in aspects of a course that clearly affect their grade.

Aspects of a course that may be part of a final grade include tests, weekly problem assignments, oral or written reports, library research projects, essays, group projects, and laboratory performance. For each activity, students deserve to know how they will be graded, and they deserve careful, detailed, and timely comments on their performance. For each aspect of your course, try to identify the important skills, learning, and accomplishments you hope to measure, then determine how to grade them fairly and equitably.

GRADING SPECIFIC ACTIVITIES

Laboratory activities involve aspects of reasoning, teamwork, experimental design, data acquisition and recording, data analysis, discussion, interpretation, and reporting. One way of grading labs (Kandel, 1989; Joshi, 1991) is to assess the following:

Understanding the results, whether or not they agree with expectations.
Decision-making skills based on results both expected and unanticipated (application of theory).
Method of recording, presenting, and analyzing data; observations and results (the notebook and final report).
Performance of physical manipulations (technique).

Rondini and Feighan (1978) describe a chemistry lab in which, at the end of each lab, they give students a numerical score for specific attributes, such as the product yield, equipment setup, handling of chemicals, purity of product, time to completion, technique, and safety procedures. These scores are added to the grades for their lab reports and notebooks. Thus, students know quickly what aspects of their lab techniques need improvement and can use this information as a catalyst for change. Joshi (1991) asks students to prepare and submit their lab reports on-line. The computer checks and grades the quality of input data; performs and displays the necessary calculations; checks and grades students' calculations and the accuracy of the results; generates a grade report; and displays the grading scheme used.

When assigning essays or written reports as activities for grading, explain to students the important aspects of the assignment and describe how it will be graded. One might include content, research, references, reasoning, data analysis and clear expression (see sidebar for an example). Another aid to student learning is to grade first drafts and give students a chance to resubmit an improved version. If instructor time is a significant deterrent to this approach, students can exchange draft reports with a partner or gather in a group and critique one another's drafts.

Oral reports and presentations can be difficult to grade, especially when students have little experience with this skill. It can be hard to overlook poor delivery and focus on content. Some faculty members develop a scoring rubric that weights these two components unequally and gives credit for effective use of visuals. When students do more than one presentation in a term, the weight given to delivery is increased to reflect the expectation that they will have improved with experience.

Group activities are difficult to grade on an individual basis. Most instructors find that a good way to grade a group is to make the entire group responsible for the answers, presentation, and results, by giving each group member the same grade. This encourages stronger students to help less able students. Observing the groups in action will give you an idea of how each participant performs as a partner. Students are also quite cognizant of their contribution and their fellow classmates' contribution. One approach is to ask students to estimate the percentage of the final project that can be attributed to each group member, including themselves. You can use these ratings from all members to construct a participation score, so that there are slight differences when one group member contributes significantly more or less than the others. Some recommend that group activity grades account for only a small portion of a student's overall grade in the class (Johnson et al., 1991).
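The peer-estimate approach described above can be sketched in code. This is a minimal illustration, assuming every member reports a percentage for each member of the group (including themselves). The function name, the `max_adjust` cap, and the scaling rule are hypothetical choices made for this example, not a method prescribed by the sources cited.

```python
from statistics import mean

def participation_adjusted_grades(group_grade, estimates, max_adjust=5.0):
    """Return per-student grades: the shared group grade plus a small adjustment.

    estimates maps each rater to a dict {member: percent of the project
    attributed to that member}. max_adjust is a hypothetical cap (in grade
    points) that keeps individual differences slight.
    """
    members = sorted(estimates)
    equal_share = 100.0 / len(members)
    grades = {}
    for m in members:
        # Average the share attributed to this member by all raters, self included.
        avg_share = mean(estimates[rater][m] for rater in members)
        # Relative deviation from an equal split (+0.2 means 20% above equal).
        deviation = (avg_share - equal_share) / equal_share
        # Scale the deviation and clamp it to the allowed range.
        adjust = max(-max_adjust, min(max_adjust, deviation * max_adjust))
        grades[m] = round(group_grade + adjust, 1)
    return grades

# Hypothetical three-person group that earned a shared grade of 85.
ratings = {
    "Ana": {"Ana": 40, "Ben": 30, "Cy": 30},
    "Ben": {"Ana": 40, "Ben": 35, "Cy": 25},
    "Cy":  {"Ana": 40, "Ben": 30, "Cy": 30},
}
grades = participation_adjusted_grades(85, ratings)
```

In this example the member credited with the largest share ends up about one point above the group grade, and the others land slightly below it, so the shared grade still dominates.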

You will need to decide how to address homework problems, if you feel that these are an important aspect of student learning. If you choose not to collect and grade them, many students will interpret that as a signal that you do not consider them important. However, some faculty get around this problem by duplicating some of the assigned problems on their tests. If you choose to make homework a part of the final course grade, you need to make a number of decisions. What percentage of the overall grade should it be? Will students work alone or in groups? Will they submit individual papers or a single answer set for the group?

Grading Students' Essays

The English department at Dickinson College conducts a seminar for faculty teaching in the Freshman Seminar program, to help them learn to teach writing to new students and to evaluate students' assignments. They suggest assigning a percentage to the various categories shown below, with approximately equal weight given to content and presentation. When students hand in a rough draft, they recommend assigning it a nominal percentage, and grading it on the basis of whether the student has made reasonable progress on the assignment. The grade sheet typically occupies a full page, with adequate space left for instructor comments. A sample grading sheet is shown.

Evaluation of Paper #1
Quality of effort on draft = 10%	Score: _____
Content = 40%	Score: _____
1. Paper responds to the assignment
2. Focuses on central idea or thesis
3. Thesis supported by evidence
Organization = 25%	Score: _____
1. Paper has an introduction, development, and conclusion
2. Paragraphs coherent and focused on single idea
3. Paragraphs are related to central thesis
4. Transitions between paragraphs are logical, so that the reader can follow the development of the thesis
Mechanics = 15%	Score: _____
1. Sentence structure
2. Word usage
3. Punctuation
4. Spelling
Style = 10%	Score: _____
1. Sentences varied and not awkward
2. Language is uninflated and appropriate for a formal paper (no slang, contractions, etc.)
Paper 1 grade:__________________
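
The arithmetic behind a sheet like this one can be sketched as a weighted sum. The weights below are taken from the sample sheet; scoring each category on a 0 to 100 scale and the name `paper_grade` are assumptions made for illustration.

```python
# Category weights from the sample grading sheet (they sum to 100%).
WEIGHTS = {
    "draft": 0.10,
    "content": 0.40,
    "organization": 0.25,
    "mechanics": 0.15,
    "style": 0.10,
}

def paper_grade(scores):
    """Combine per-category scores (assumed to be on a 0-100 scale) into one grade."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

# Hypothetical scores for one paper.
example = {"draft": 100, "content": 80, "organization": 90,
           "mechanics": 70, "style": 60}
total = paper_grade(example)
```

For these hypothetical scores the total comes to about 81. Because content carries 40 percent, a ten-point change there moves the grade four points, while the same change in style moves it only one.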

THE WHY AND HOW OF TESTS

Ideally, tests measure students' achievement of the educational goals for the course, and the test items sample the content and skills that are most important for students to learn. Tests usually ask students questions about material that is most essential to the discipline. A well-constructed test measures a range of cognitive skills, not just students' recall of facts. However, it is unlikely "that research will ever demonstrate clearly which form of examination, essay or objective, has the more beneficial influence on study and learning" (Ebel and Frisbie, 1986). Your choice of examination form will need to take into account many factors, such as the time available for students to take the test, the amount of time you have available to grade it, and what you wish to measure. Some goals and methods of testing, adapted from Fuhrmann and Grasha (1983), are:

To measure knowledge (recall of common terms, facts, principles, and procedures), ask students to define, describe, identify, list, outline, or select.
To measure application (solving problems, applying concepts and principles to new situations), ask students to demonstrate, modify, prepare, solve, or use.
To measure analysis (recognition of unstated assumptions or logical fallacies, ability to distinguish between facts and inferences), ask students to diagram, differentiate, infer, relate, compare, or select.
To measure comprehension (understanding of facts and principles, interpretation of material), ask students to convert, distinguish, estimate, explain, generalize, define limits for, give examples, infer, predict, or summarize.
To measure synthesis (integration of learning from different areas or solving problems by creative thinking), ask students to categorize, combine, devise, design, explain, or generate.
To measure evaluation (judging and assessing), ask students to appraise, compare, conclude, discriminate, explain, justify, or interpret.

There are a limited number of standard formats for exam questions. Multiple choice questions can measure students' mastery of details and specific knowledge as well as complex concepts. Because multiple choice test items can be answered quickly, you can assess students' grasp of many topics in an hour exam. Although multiple choice test items are easily scored, good multiple choice questions can be challenging to write (see the sidebar on writing effective multiple choice questions). Short answer questions can require one or two sentences or brief paragraphs. They are easier to write than multiple choice tests but take longer to score, and may not be as useful as essay exams to measure the depth of student understanding. Essay questions probe students' understanding of broad issues and general concepts. They can measure how well students are able to organize, integrate, and synthesize material and apply information to new situations. Unlike multiple choice tests, essay tests let you pose only a few questions in an hour. Further, essay tests are sometimes difficult to grade.

Problem solving forms the core of many science courses, and numerical problems are prominent on many exams in these courses. As noted in Chapter 4, students who successfully answer these test questions do not necessarily grasp the underlying concept (Gabel and Bunce, 1994). Traditional numeric problems can incorporate some sort of conceptual essay section which measures the students' understanding of the concepts involved as well as their ability to use algorithms to solve problems. Nakhleh and Mitchell (1993) offer a sample of multiple choice questions for a limited number of chemistry concepts, in which the answers are pictorial representations of molecular events. Although you may find it difficult to develop an appropriate set of possible answers (see sidebar on multiple choice tests), asking students to draw a picture of the phenomenon described in the numerical problem is a good way to test their conceptual understanding.

Keep in mind that novice problem solvers take longer to locate appropriate strategies than experienced problem solvers. As a rule of thumb, it could take students ten minutes to solve a problem you might do in two minutes, so plan your test length accordingly. There are several resources to help faculty members develop, administer, and grade exams (Jacobs and Chase, 1992; Davis, 1993; Ory and Ryan, 1993).
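
The rule of thumb above lends itself to a quick planning check. This is a rough sketch: the factor of 5 comes from the ten-minutes-versus-two estimate, while the function name, the sample timings, and the 50-minute period are assumptions.

```python
def estimated_student_minutes(instructor_minutes, slowdown=5.0):
    """Estimate total student time from the instructor's own solving times.

    slowdown=5.0 reflects the ten-minutes-versus-two rule of thumb; it is
    a rough planning factor, not a measured constant.
    """
    return sum(instructor_minutes) * slowdown

my_times = [2, 3, 1.5, 4]       # minutes per problem (hypothetical)
needed = estimated_student_minutes(my_times)
fits_in_period = needed <= 50   # assumes a 50-minute class period
```

Here the four problems take the instructor 10.5 minutes in total, so the estimate is 52.5 student minutes, slightly too long for the assumed period.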

Writing Effective Multiple Choice Questions

One of the best ways to identify useful wrong answers for multiple choice items is first to ask the question in a free-response format. When the free-response tests are graded, look for common errors or misconceptions and tally them. If what went wrong is not clear from a student's response, ask the student to explain how he or she went about answering the question when the papers are returned. Then use common errors as the wrong answers for multiple choice questions.

After several years of this activity—less if you share items with colleagues—you will have a sizable bank of good multiple choice questions and understand common misconceptions and errors well enough to construct suitable multiple choice questions without going through the preliminary step of giving free-response items first.

Herron, 1996
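
The tallying step in this sidebar can be sketched with a simple counter. The example assumes answers have already been transcribed into comparable strings; the function name and the sample responses are hypothetical.

```python
from collections import Counter

def pick_distractors(free_responses, correct_answer, n=3):
    """Return the n most common wrong answers, most frequent first."""
    wrong = Counter(r for r in free_responses if r != correct_answer)
    return [answer for answer, _count in wrong.most_common(n)]

# Hypothetical transcribed answers to one free-response question.
responses = ["9.8", "4.9", "19.6", "4.9", "9.8", "4.9", "0", "19.6", "9.8"]
distractors = pick_distractors(responses, correct_answer="9.8")
```

The tally only ranks candidates. Deciding which misconception lies behind each wrong answer still requires reading the students' work or asking them, as the sidebar suggests.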

What About Take-Home Tests?

With take-home tests, students generally work at their own pace with access to books and materials and the Internet. At institutions with a strong honor code, some faculty members provide strict guidelines about the time limit and the resources students can use on take-home tests. Take-home questions can be longer and more involved than in-class questions. Problem sets, short answer questions, and essays are the most appropriate for take-home exams. Some suggestions for giving take-home tests include:

Limit the number of words that students write.
Give students explicit instructions on what they can and cannot do, such as: Are they allowed to talk to other students about their answers? Can they work in groups? Be explicit about the consequences of violating these rules.

An alternative to a take-home test is to give out the questions in advance but ask the students to write their answers in class.

Are There Advantages to Open Book Tests?

Some instructors feel that open-book tests are inappropriate for introductory courses in which very basic concepts or facts are being learned. On an open-book test, students who lack basic knowledge may waste too much time consulting references and searching for information. Although open-book tests tend to reduce student anxiety, some research has shown that students do not necessarily perform significantly better on open-book tests, and that open-book tests seem to reduce students' motivation to study (Clift and Imrie, 1981; Crooks, 1988). A compromise between open- and closed-book testing is to include with the closed-book test any appropriate reference material such as equations, formulas, constants, or unit conversions.

HELPING YOUR STUDENTS PREPARE FOR EXAMS

How can you help your students do better on exams? Distributing practice exams, scheduling extra office hours before a test, arranging for review sessions before major exams, and encouraging students to study in groups (particularly groups in which they share solution strategies, not just answers) are all excellent ways to allay students' anxieties and enhance their performance. Early success in a course may also increase students' motivation and confidence. It is a good idea to advise students carefully before the first exam, as it often sets the tone for the rest of the course. Here are some tips for helping students prepare for tests:

Distribute sample questions and old exams to give an idea of the types of questions used.
Review with students the thought processes involved in answering test questions.
Review lists of questions and show students how to sort them by the type of reasoning or the type of solution required.
Use quizzes and midterm exams to indicate the types of questions that will appear on the final exam.

Tips to Students on How to Solve Exam Problems

read the problem carefully and identify the information that is specifically requested
list all "givens," both explicit and implicit
break the problem into smaller parts
do the easiest parts or steps first
make a rough approximation of what the solution should look like

TESTING STUDENTS THROUGHOUT THE TERM

Although many students dislike frequent tests, periodic testing during the term has been shown to improve students' performance on the final exam (Lowman, 1995). Giving two or more midterm exams also spreads out the pressure, allows students to concentrate on one chunk of material at a time, and permits students and instructors to monitor student progress more carefully. By giving students many opportunities to show what they know, faculty members can acquire a more accurate picture of students' abilities and avoid penalizing students who have an off day. For first- and second-year courses, it is common for an instructor to schedule two midterms; several shorter tests, quizzes, or writing assignments; and a final exam.

After a test, most students are anxious to see how they have done. It is a good idea to discuss the overall results in class. Returning work to students as quickly as possible encourages them to learn from their mistakes. One way to encourage this is to require students to resubmit a corrected exam. The section below on ways to encourage improvement suggests a number of ways to reflect this effort in a student's course grade.

APPROACHES TO ASSIGNING GRADES

Many faculty members, especially new instructors, feel uneasy about assigning course grades. According to Erickson and Strommer (1991), how faculty members view grades depends a great deal on their values, assumptions, and educational philosophy. For example, some faculty members consider their introductory courses for science and engineering majors to be "weeder" classes designed to separate out students who lack potential for future success in the field, and they assign grades accordingly. A problem with this philosophy is that students who are weeded out leave the course with a very poor perception of science and scientists. It is important to keep in mind that even in courses intended for students who will continue in the major, the majority of students are not planning to major in that field; physics courses taken by chemistry majors and chemistry courses taken by biology majors are but two examples. Courses for non-scientists generally fall into this category. Although most faculty members see grades as a measure of how well a student has mastered information, skills, and the ability to reason scientifically, some faculty members include other factors such as classroom participation, effort, or attendance.

There are two general approaches to assigning grades: criterion-referenced grading and norm-referenced grading. In criterion-referenced grading, students' grades are based on an absolute scale established by the instructor before the exam is graded. If all the students in a class achieve 80 percent or higher on an exam, they will all receive A's or B's. Conversely, if none of the students in a class scores better than 80 percent, then no one in the class receives a grade higher than B- for that test. Criterion-referenced grading meets three important standards: any number of students can earn A's and B's; the focus is on learning and mastery of material; final grades reflect what students know, compared to the teacher's standards. There are various ways to identify the criterion (standard) for each letter grade. Ory and Ryan (1993) describe a strategy that involves determining the number of items on a test that students need to answer correctly to achieve a C (typically those items written at the basic knowledge or comprehension levels), adding to that minimum the number of additional items for a B (questions written at higher levels) and for an A, and then working back to D and F. Criterion-referenced grading requires skill and experience in writing exams and establishing the grading scale. New teachers are advised to consult with experienced colleagues before using this approach.
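
The item-counting strategy attributed to Ory and Ryan can be sketched as follows. The specific counts are invented, and the `step_down_d` parameter is a hypothetical way to "work back" to D and F, since the text does not say how far below C those cutoffs should sit.

```python
def build_cutoffs(c_items, extra_for_b, extra_for_a, step_down_d):
    """Return minimum correct-item counts for each letter grade.

    c_items: items a student must answer correctly for a C.
    extra_for_b / extra_for_a: additional higher-level items for B and A.
    step_down_d: how far below the C cutoff the D cutoff sits
    (hypothetical; the text only says to work back to D and F).
    """
    c = c_items
    b = c + extra_for_b
    a = b + extra_for_a
    d = max(0, c - step_down_d)
    return [("A", a), ("B", b), ("C", c), ("D", d)]

def letter_grade(items_correct, cutoffs):
    """Map a raw count to a letter; any number of students can earn each grade."""
    for letter, minimum in cutoffs:
        if items_correct >= minimum:
            return letter
    return "F"

# Hypothetical 45-item test: 28 basic items for a C, 6 more each for B and A.
cutoffs = build_cutoffs(c_items=28, extra_for_b=6, extra_for_a=6, step_down_d=6)
class_counts = [41, 36, 34, 29, 22, 18]
grades = [letter_grade(n, cutoffs) for n in class_counts]
```

Because the cutoffs are fixed before the exam is graded, each student's letter depends only on his or her own count, not on how classmates performed.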

Helping Students Learn from Exams

Students often learn more from their tests if there are detailed written comments about their errors. Try commenting on individual tests or posting a key that includes a preferred solution method, alternative solutions, and commentary on common errors and the flaws in the reasoning behind them. Words of encouragement on students' papers mean a lot to them, and can motivate them to study harder for the next test or work harder on the next assignment.

What Do the Numbers Mean?

Science and mathematics teachers are quantitatively skilled, but how accurate, objective, and meaningful are their test scores? Despite the apparent objectivity of the numerical result, it is important to remember that there is subjectivity in the selecting and weighting of questions and in assigning numerical values or deducting points for missing parts of answers. The uncertainty of the numbers depends on how those scores were determined.

The second approach, norm-referenced grading, often called grading on a curve, measures a student's achievement relative to other students in the class. Faculty members uncomfortable with setting absolute standards or unsure of the difficulty level of their exams may choose to grade on a curve as a way to renormalize the class scores. Many traditional grading systems used in science classes put students in competition with their classmates and limit the number of high grades. Research indicates that normative systems such as grading on the curve can reduce students' motivation to learn and increase the likelihood of academic dishonesty and evaluation anxiety (Crooks, 1988; McKeachie, 1994). Normative grading also serves to discourage effective group studying or other work, because assisting a classmate inherently decreases the value of the work of other students in the class. In addition, a grade assigned on the curve does not indicate how much or how little students have learned, only where they stand in relation to the class. Some faculty members try to compensate for inequities by adjusting the cutoff scores or by assigning a higher percentage of A's than usual if the class is especially good.

Another form of norm-referenced grading is to assign grades according to breaks in the distribution. In this model, scores are arranged from highest to lowest, and notable gaps or breaks in the distribution are located. For example, on a midterm totaling 100 points, eight students score 81 or higher and three students score 75; no one scores between 76 and 80. Instructors using this model will assign A's to students who scored 81 and above, and start the B's at 75. One disadvantage of this approach is that these breaks may not represent true differences in achievement, so the magnitude of the gaps in scores should be taken into account. A further disadvantage of this model of grading is that the grade distribution depends on judgments made after students have taken the test rather than on guidelines that are established before testing.
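
Locating breaks in a score distribution can be sketched in a few lines. The scores below are invented to match the example in the text (eight students at 81 or above, three at 75); the `min_gap` threshold is a hypothetical parameter, since the text says only that the size of a gap should be taken into account.

```python
def find_breaks(scores, min_gap=3):
    """Return (upper, lower, gap) for each gap of at least min_gap points
    between adjacent distinct scores, listed from the top of the distribution.

    min_gap is a hypothetical threshold for what counts as a notable break.
    """
    ordered = sorted(set(scores), reverse=True)
    return [(hi, lo, hi - lo)
            for hi, lo in zip(ordered, ordered[1:])
            if hi - lo >= min_gap]

# Invented midterm scores consistent with the example in the text.
midterm = [95, 92, 90, 88, 86, 84, 82, 81, 75, 75, 75, 73, 70, 62]
breaks = find_breaks(midterm, min_gap=5)
```

With a threshold of five points, this sample shows two breaks: one between 81 and 75, and one between 70 and 62. The code only finds the gaps; whether a gap reflects a real difference in achievement remains the instructor's judgment.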

When students' scores are fairly well distributed across a wide range, different approaches often yield similar grades. However, if the overall performance of the class is either low or high, the model used matters a great deal. When many students have done well on an exam, for example, everyone who did well will receive an A or B under criterion-referenced grading. When many students have done poorly, grading on a curve ensures that at least some will receive A's or B's.
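
The contrast can be made concrete with a small comparison on a low-scoring class. This is a sketch under stated assumptions: the 90/80/70/60 scale, the 20/30/30/10/10 percent curve, and the scores are all hypothetical, and the curve function does not treat tied scores specially.

```python
def criterion_grades(scores, cutoffs=((90, "A"), (80, "B"), (70, "C"), (60, "D"))):
    """Grade against a fixed scale set before the exam (cutoffs are hypothetical)."""
    def grade(score):
        for minimum, letter in cutoffs:
            if score >= minimum:
                return letter
        return "F"
    return [grade(s) for s in scores]

def curve_grades(scores, shares=(("A", 20), ("B", 30), ("C", 30), ("D", 10), ("F", 10))):
    """Grade by rank: each letter goes to a fixed percentage of the class.

    shares are hypothetical percentages summing to 100. Tied scores are
    not handled specially and may receive different letters.
    """
    n = len(scores)
    # Indices of students ordered from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    result = [None] * n
    start = 0
    cumulative = 0
    for letter, percent in shares:
        cumulative += percent
        end = round(cumulative * n / 100)
        for i in order[start:end]:
            result[i] = letter
        start = end
    return result

low_class = [72, 68, 65, 61, 58, 55, 51, 47, 40, 33]
by_criterion = criterion_grades(low_class)
by_curve = curve_grades(low_class)
```

On these scores the fixed scale awards no grade above a C, while the curve awards two A's and three B's to the same papers, which is the divergence the paragraph above describes.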

Scoring Your Tests

When scoring a test, as when designing one, it is a good idea to decide whether the objective is to see what students know and what they have learned or to identify specific things they do not know or cannot do. The objectives of the test are implicit in its design and grading rubric. In any case, a specific scoring strategy (giving points for things done or deducting points for missing items) is recommended. If questions have multiple parts, plan your scoring strategy so that students who stumble on the first part do not lose all of the points. Many teachers find it easiest and most uniform to grade all students on a particular question at the same time. Keeping the student's identity unknown as you grade the test is also a good practice, because it helps minimize any bias in grading.

Ways to Encourage Improvement

If you want to reward improvement, one way is to give students bonus points at the end of the term to acknowledge steady improvement throughout the semester. Alternatively, some instructors offer students a chance to drop a weak exam grade, replace it with their performance on a comprehensive final exam, or complete some credit-granting exercise which demonstrates an improved understanding of the material covered on the exam. Other faculty members allow students to correct their exams and resubmit their answers for a specified amount of additional credit.
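
One of these options, replacing a weak exam grade with the final exam score, can be sketched as below. Equal weighting of all exams and the rule that the replacement applies only when the final is higher are assumptions made for this example.

```python
def term_exam_average(midterms, final, replace_lowest=True):
    """Average the exam scores, optionally replacing the weakest midterm
    with the final exam score when the final is higher.

    All exams are weighted equally here, which is an assumption.
    """
    scores = list(midterms)
    if replace_lowest and scores:
        # Index of the weakest midterm.
        lowest = min(range(len(scores)), key=scores.__getitem__)
        if final > scores[lowest]:
            scores[lowest] = final
    scores.append(final)
    return sum(scores) / len(scores)

# Hypothetical student: one weak midterm, then a strong final.
with_replacement = term_exam_average([55, 78], final=84)
without_replacement = term_exam_average([55, 78], final=84, replace_lowest=False)
```

For this hypothetical student the replacement rule raises the exam average from roughly 72 to 82, so the policy rewards improvement substantially and should be announced at the start of the term.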
