Developing Metrics for Assessing Engineering Instruction: What Gets Measured is What Gets Improved

5 Measuring Teaching Performance

Up to this point, engaging faculty in the development of the value system, defining the fundamental elements of teaching excellence in engineering education, determining appropriate sources of information for the evaluation of teaching, and weighting the information from those sources have been addressed in operational terms. In this chapter, the subject changes to how information should be gathered, assembled, measured, and used, both as part of the institution's reward system and to improve teaching performance.

MEASURING PERFORMANCE ELEMENTS

As noted earlier, content expertise, although necessary, does not guarantee effective teaching. Faculty must be able to design and deliver instructional experiences in such a way that there is some assurance that learning will occur when students engage with them. The subject matter must be presented in a way that piques students' interest and encourages them to learn. The course design and implementation must also provide students with meaningful feedback on their progress in mastering the material.

In addition, teachers must handle the myriad routine tasks involved in managing a course: laboratory supplies must be ordered and inventories maintained, arrangements for guest lecturers must be made, library materials must be put on reserve, field trips must be arranged and coordinated, and drop/add slips and, later, grades must be turned in on time.

Thus effective teaching has many components. Instructors must interact with students in a way that (1) provides opportunities for them to learn; (2) creates conditions that support and facilitate learning; and (3) uses techniques and methods that create an environment with a high probability that students will learn.
At least five basic skills are necessary for effective teaching (Arreola, Theall, & Aleamoni, 2003):

- content expertise
- instructional design skills
- instructional delivery skills
- instructional assessment skills
- course management skills

These are the five performance components that were outlined in Table 4.2. When the total "act" of teaching is defined in terms of these five broad components, it becomes clear that the evaluation of teaching cannot be accomplished by using a single measurement tool or by basing it on the judgment of one administrator or peer committee who
has made a few classroom visits. No one person or group has a detailed, complete view of the entire teaching process. A more accurate and more valid assessment of teaching performance of necessity involves gathering information on all five dimensions of teaching performance. This might include (1) students' perceptions of and reactions to various aspects of the instructor's delivery, course design, and assessment methods; (2) information from peers, and perhaps informed experts, on the quality of the instructor's design and assessment skills; (3) information from peers and department heads or supervisors on content expertise (primarily in terms of the level, currency, and appropriateness of the material in the course design and supporting materials); and (4) information from the department head or supervisor on the instructor's course management.

Data provided by students would most likely be gathered by a well-designed student-rating form that elicits students' perceptions of the effectiveness of the instructional design, delivery, and assessment aspects of the course. Data provided by peers may include reviews of the course syllabus to judge whether (1) the content is current, (2) the design includes experiences that will advance students' mastery of the material, (3) the delivery mechanisms (e.g., slides, web pages, lectures) are well executed, and (4) the assessment tools and procedures are valid and reliable. The peers used for such evaluation activities should be experienced and capable of making the assessments being asked of them; this requires individuals who have some level of knowledge and expertise in instructional practice.
Data provided by the department chair or supervisor may include (1) external evidence of the content expertise of the instructor, (2) evidence that the instructor is complying with all instructional assessment policies and procedures, and (3) evidence that the instructor complies with internal policies and procedures (e.g., reporting grades, keeping attendance records, supervising laboratory activities, etc.).

Finally, the instructor himself/herself may maintain a portfolio of evidence and/or informal or qualitative evidence on all aspects of teaching performance. Although peers and the department head or supervisor may want to use the portfolio to augment their interpretation, we do not recommend that self-rating data be used in combination with data from other sources, because self-rating data may then have a greater impact than intended. In any case, determining how much self-rating data should "count" is an issue that should have been resolved at the faculty-engagement stage (Chapter 4). The key to an effective evaluation of teaching is putting the parts of this mosaic together in a way that accurately reflects the instructor's overall teaching competence.

A DATA-GATHERING RUBRIC

Different units may decide to measure only a subset of the performance components of teaching. Table 5.1, an expanded version of Table 4.2, provides a rubric for gathering measurement data on all of the components of teaching performance. In Table 5.1, the type and source of data are described within the appropriate cell.
TABLE 5.1 Sources and Weights of Measurement Data

Tables 5.2 through 5.5 provide strategies for gathering data from various sources.
TABLE 5.2 Strategy for Student Ratings

Description: Students rate an instructor's performance using a structured questionnaire, an unstructured questionnaire, or an interview.

Strengths: Produces extremely reliable, valid information on faculty classroom performance, because students observe the teacher every day (Aleamoni, 1981). Instructors are often motivated to change their behavior as a result of student feedback. If a professionally designed student-rating form is used, results show a high correlation with ratings by peers and supervisors; in addition, these assessments are not affected by grades.

Weaknesses: If a professionally developed form is not used, external factors, such as class size and gender, may influence student ratings. In addition, students tend to be generous in their ratings.

Conditions for Effective Use: Student anonymity and the instructor's willingness to accept student feedback. Instruments must be carefully developed and validated through appropriate and documented reliability and validity studies.

Nature of the Evidence: Student perceptions of organization, difficulty, and course impact (e.g., how they have changed as a result of taking the course); how various teaching techniques affect them; reactions to the instructor's actions; what students like and dislike about an instructor.

TABLE 5.3 Strategy for Peer Ratings

Description: Other faculty or peers rate an instructor's performance in terms of (1) course design, (2) appropriateness and effectiveness of instructional materials, and (3) appropriateness of instructional assessment strategies and tools. Peer reviewers are usually from outside the university but may include some faculty from within the university. This process is analogous to the peer evaluation of research contributions.
Strengths: Raters are familiar with institutional, departmental, and division goals, priorities, and values, as well as the specific problems that affect teaching. Peer review encourages professional behavior (e.g., a desire to improve one's own profession). Raters with expertise in the instructor's subject area may be able to give content-specific suggestions and recommendations.

Weaknesses: Assumes that peers have expertise in instructional design, delivery, and assessment. Bias may be introduced because of previous personal knowledge, personal relationships, or personal pressure to influence the evaluation; a reviewer's preference for his/her own teaching method may also introduce bias. Relationships among peers may suffer.

Conditions for Effective Use: A high degree of professional ethics and objectivity. Multiple reviewers.

Nature of the Evidence: Comments on relations between the instructor's actions and students' behavior. Comparisons with instructional methods peers may consider superior or more appropriate. Suggestions for instructors on methods to use.
TABLE 5.4 Strategy for Review by Department Head or Supervisor

Description: The administrator evaluates the instructor's performance relative to the policies and procedures of the college and the objectives of the department.

Strengths: Evaluators familiar with college and community goals, priorities, and values often provide additional insights because they can compare the instructor's performance with other performances in the college, school, division, or department.

Weaknesses: Bias may be introduced because of extraneous data, personal relationships, and the evaluator's values and favored teaching methods.

Conditions for Effective Use: Requires knowledge of institutional, college, and departmental policies and procedures as they relate to teaching courses in the engineering curriculum and the maintenance of student information (e.g., FERPA, approved grading scale, etc.). Requires maintenance of records relating to the instructor's compliance with relevant policies and regulations.

Nature of Evidence Produced: Comments on the relationship between the instructor's actions and the achievement of departmental goals and objectives.

TABLE 5.5 Strategy for Self-rating (Portfolio)

Description: The instructor gathers information to assess his/her own performance relative to personal needs, goals, and objectives.

Strengths: May be part of a program of continuous assessment. Instructors are likely to act on data they collect themselves. Data are closely related to personal goals and needs. Necessary to facilitate the review of the syllabus by peers.

Weaknesses: Results may be inconsistent with ratings by others. Possible unwillingness to collect and/or consider data relative to one's own performance. Tendency to rate one's own performance higher than students do.

Conditions for Effective Use: Requires that the instructor be self-confident and secure and have the skills to identify goals and collect appropriate data.
Data cannot be heavily weighted in personnel decisions (e.g., promotion, tenure, merit pay, etc.).

Nature of Evidence Produced: Information on progress toward personal goals.
MEASUREMENT TOOLS

Constructing valid and reliable forms, questionnaires, or other tools for gathering data is a complex task that requires expertise in psychometrics. We must always keep in mind that what are being developed are tools to measure—in a valid and reliable way—complex psychological phenomena such as opinions, reactions, observations, and rankings. Even selecting appropriate forms and tools from published, commercially available products requires fairly sophisticated psychometric skills; however, resources to assist in locating instruments can often be found on campus (in the educational development office or within the social sciences departments). Each of these products must be assessed for appropriateness and utility in the faculty evaluation system that has been designed for the specific situation.

No standardized forms for peer or department chair ratings are commercially available; however, a search of the internet turns up ad hoc checklists, rating forms, and other resources that can provide useful guidance in constructing such tools (University of Texas). Therefore, institutions may have to develop their own forms or adapt appropriate forms that have been used at other institutions. Before either of these can be done, however, it is imperative that the performance elements to be measured have been clearly and completely specified. If new forms must be developed, experts in psychometrics should be consulted. Training for observers is also important, in that it helps to focus their observations on the items listed on checklists or rating forms (Braskamp & Ory, 1994). Such expertise may be available in other colleges on the campus, especially in departments that focus on educational research, instructional-systems design, or psychological measurement.

All of the tools for the evaluation of teaching must use the same scale of measurement.
That is, whether data are gathered via a student rating form, a peer review form, or a department chair review form, all measures must be on a common scale. Most student rating forms use either a 4-point or a 5-point scale; thus student ratings are represented by a number between 1 and 4 or between 1 and 5, with, in most cases, the highest number indicating the most positive rating. Whatever scale is adopted, the forms used to gather information from all sources should use the same number scale in reporting results.

COMPUTING AN OVERALL EVALUATION

Once measurement tools have been selected and/or developed for all input sources (Table 5.1), the systematic evaluation of teaching can proceed. After data have been gathered, the task becomes combining them into a usable form. The examples below use a common 1-to-4 scale, with 4 as the highest rating and 1 as the lowest. All forms, including questionnaires, interview schedules, and any other measurement tools used to collect student ratings, peer ratings, and department head ratings, report results on that scale. The same would be true if the 5-point scale or another measurement scale had been selected. Whichever scale is used, it must be consistent throughout the evaluation system.

Having determined the information to be provided by each source and specified the weights assigned to that information, it is now possible to compute an overall rating that reflects the collective values of the faculty. Each source provides information on teaching performance elements as previously determined by the faculty. The information from each source concerning
each component of each teaching role is weighted in ways that reflect the consensus value system developed in collaboration with the faculty. In other words, the overall rating or evaluation is based on the principle of controlled subjectivity (discussed in Chapter 4). Table 5.6 shows how data gathered by the various tools used to measure the performance elements of teaching for one faculty member might be assembled into an overall evaluation of teaching.

TABLE 5.6 Weighting Measurement Data to Produce an Evaluation of Teaching
TEACHING (Minimum 20%, Maximum 60%)

Performance Component      Students (25%)   Peers (45%)   Dept. Chair/Supervisor (20%)   Self (10%)
Content expertise                –               4                   –                       4
Instructional design             3               4                   –                       4
Instructional delivery           4               3                   –                       4
Instructional assessment         2               3                   4                       3
Course management                –               –                   2                       3
AVERAGE                         3.0             3.5                 3.0                     3.6
WEIGHTED AVERAGE                0.75            1.575               0.6                     0.36
Weighted Sum: 3.3

The weighted sum shown in the table's final row is the final evaluation of teaching for the instructor in this case. Ratings from each source of the various teaching performance elements were averaged, and those averages were weighted in accordance with the values determined during the development of the evaluation system. Finally, the weighted averages were added together to produce the final evaluation. Using the principle of controlled subjectivity ensures an approximation of objectivity (i.e., consistency of conclusions based on the same data): the weights assigned to the subjective values for each source of information are controlled in that they are applied consistently to each individual.
The weighted sum of 3.3 in Table 5.6 indicates a favorable teaching evaluation.1

1 For an in-depth discussion of this method of computing an evaluation of teaching, or an evaluation of faculty performance in any other role, see Developing a Comprehensive Faculty Evaluation System, 3rd ed., by R.A. Arreola (Jossey-Bass, 2007).
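To make the arithmetic concrete, the weighting scheme behind Table 5.6 can be sketched in a few lines of Python. This is an illustrative sketch, not part of the evaluation system itself; the ratings and source weights below are the sample figures from the table, and each source rates only the components it can observe.

```python
# Illustrative computation for Table 5.6: average each source's ratings,
# then combine the source averages using the agreed-upon source weights.
# All numbers are the sample figures from the table, not prescribed values.

SOURCE_WEIGHTS = {"students": 0.25, "peers": 0.45, "chair": 0.20, "self": 0.10}

ratings = {
    "students": {"design": 3, "delivery": 4, "assessment": 2},
    "peers":    {"content": 4, "design": 4, "delivery": 3, "assessment": 3},
    "chair":    {"assessment": 4, "management": 2},
    "self":     {"content": 4, "design": 4, "delivery": 4,
                 "assessment": 3, "management": 3},
}

def teaching_evaluation(ratings, weights):
    """Average each source's ratings, then form the weighted sum."""
    return sum(
        weights[source] * (sum(scores.values()) / len(scores))
        for source, scores in ratings.items()
    )

print(f"{teaching_evaluation(ratings, SOURCE_WEIGHTS):.3f}")  # 3.285, reported as 3.3
```

The per-source averages computed here (3.0, 3.5, 3.0, and 3.6) match the AVERAGE row of Table 5.6, and the weighted sum of 3.285 is the value reported as 3.3.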
Finally, as the heading of Table 5.6 indicates, the weight of the teaching role in the overall evaluation ranges from 20 percent to 60 percent. This range was determined by the faculty in accordance with the faculty role model (see Table 4.1) developed for the larger institution. Thus, in the overall evaluation of an instructor for decisions involving promotion, tenure, or merit pay, the evaluative outcome for teaching may be given a different weight than the weights assigned to scholarly and creative activities and service—the other components of an overall faculty role.

For example, suppose the faculty member whose data are shown in Table 5.6 had a professional assignment that included not only teaching but also various forms of scholarship and service. Suppose, then, that the roles relative to his or her specific professional responsibilities were weighted as follows:

- teaching (45 percent)
- scholarly/creative activities (45 percent)
- service (10 percent)

If all faculty evaluations for the institution used the same 4-point scale, the evaluations of the scholarly/creative activities and service components would result in values comparable to the value for teaching. For example, in addition to the teaching evaluation of 3.3 shown in Table 5.6, suppose the faculty member received an evaluation of 3.6 for scholarly/creative activities and 2.7 for service. The overall evaluation could then be computed as shown in Table 5.7.
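The overall computation follows the same weighted-sum pattern. The sketch below, again with the illustrative figures from the text, carries teaching at its unrounded value of 3.285; this is what reproduces the 1.478 weighted entry in Table 5.7 (0.45 × 3.3 would instead give 1.485).

```python
# Overall faculty evaluation: weighted sum of the three role evaluations.
# Role weights and evaluation values are the illustrative figures from the
# text; teaching is carried at its unrounded value of 3.285 (shown as 3.3).

ROLE_WEIGHTS = {"teaching": 0.45, "scholarly/creative": 0.45, "service": 0.10}
role_evals   = {"teaching": 3.285, "scholarly/creative": 3.6, "service": 2.7}

weighted = {role: ROLE_WEIGHTS[role] * value for role, value in role_evals.items()}
overall = sum(weighted.values())

print(f"{weighted['teaching']:.3f}")  # 1.478
print(f"{overall:.2f}")               # 3.37, reported as 3.4
```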
TABLE 5.7 Computation of an Overall Faculty Evaluation

Role                            Weight   Evaluation   Weighted Evaluation
Teaching                          45%       3.3             1.478
Scholarly/creative activities     45%       3.6             1.620
Service                           10%       2.7             0.270
Weighted sum (overall evaluation)                           3.4

The development of the metric for the evaluation of teaching, as shown in Tables 4.2, 5.1, and 5.6, as well as the metric for the overall evaluation shown in Table 5.7, provides a consistent mechanism for using faculty evaluation data in promotion and tenure decisions, as well as for determining the allocation of merit-pay dollars. The standards for awarding a promotion could be set in terms of a specific overall evaluation value for a certain number of years.
The standards for qualifying for tenure could be set in those same terms plus the achievement of a specific evaluation value in any of the roles, including, of course, teaching. However, in the awarding of tenure, faculty performance is usually only one of a number of factors taken into account. The standards for determining merit pay could be set in terms of achieving a specified minimum value in both the overall evaluation and specific role evaluations.

LINKING EVALUATION AND PROFESSIONAL ENRICHMENT

Faculty evaluation and professional enrichment are two sides of the same coin. Ideally, faculty evaluation programs and professional enrichment programs should work hand in hand. For example, if a particular aspect of faculty performance is being evaluated, faculty should have access to resources or opportunities to gain or improve the skills necessary for that aspect of performance. For maximal self-improvement effect, faculty evaluation systems should be linked to, but operationally separate from, professional enrichment programs.

As a rule of thumb, if a specific aspect of faculty performance is to be evaluated, resources should be available to enable faculty members to gain expertise and proficiency in the skills required for that performance component—especially if that performance area is outside the area of engineering. Professional enrichment programs in educational psychology, instructional technology, conflict management, public speaking, and organizational management, for example, may assist faculty in achieving excellence in the full range of their professional performance.

The experience of the committee members indicates that no matter how well faculty evaluation systems are designed, if they are implemented without reference to opportunities for professional enrichment, they are inevitably considered primarily punitive.
In addition, professional enrichment programs that are implemented without reference to the information generated by faculty evaluations tend to have disappointing results, no matter how well the programs are designed and funded. This situation is neatly summarized by Theall's (2007) statement that “Evaluation without development is punitive. Development without evaluation is guesswork.”

The reason is simple, if not always obvious. Unless professional enrichment programs are linked to evaluation systems, they tend to attract primarily faculty who are already motivated to seek out resources and opportunities to improve their skills. In short, the “good” seek out ways to get better—which is the quality that tends to make them good in the first place. However, individuals who are not thus motivated, and who, accordingly, are probably in greatest need of professional enrichment opportunities, generally tend to be the last ones to seek them out. Leadership from deans and department chairs can create an atmosphere of continuous improvement in teaching effectiveness by engaging the faculty in ongoing discussions about teaching (and related activities) as a pursuit of excellence.

When the elements of faculty evaluation are carefully coordinated with a professional enrichment program, the institution is more likely to obtain a valuable benefit from both. Thus, if an instructor's skill in assessing student learning is going to be evaluated, the institution should provide resources and training opportunities for him or her to become proficient in that skill. If a faculty member's ability to deliver a well-organized, exciting lecture is going to be evaluated,
resources should be available for him or her to become proficient in the requisite public speaking and presentation skills. We must keep in mind that most instructors have had little or no formal training in the complex, sophisticated skills involved in designing and delivering instruction or assessing student learning outcomes. Most tend to teach the way they were taught and test the way they were tested. Thus, if faculty performance is evaluated, especially performance in teaching, the institution should provide resources for educators to develop, support, and enhance their teaching performance.

In summary, a successful faculty evaluation system must provide (1) meaningful feedback to guide professional growth and enrichment and (2) evaluative information on which to base personnel decisions. The key to a system that serves both of these purposes is in the policies that determine the distribution of the information gathered for evaluations. As a general principle, detailed information from questionnaires or other evaluation tools should be provided exclusively to the faculty member being evaluated, as a guide to professional enrichment and growth in certain areas. However, aggregate data that summarize and reflect the overall pattern of an individual's performance over time should be used for personnel decisions, such as promotion, tenure, continuation, and merit raises.

It is important that everyone, both faculty and administrators, understand that evaluation data will be used both to provide faculty with diagnostic information to encourage their professional growth and to provide administrators with information that will be used in making personnel decisions (promotion, tenure, pay raises, etc.). An institution may emphasize one use over another, but it would be a mistake to pretend that faculty evaluation data will only be used for professional enrichment purposes.
And, even if the primary intent is to use evaluations for professional enrichment, they should be designed so they can also be used for personnel decisions.